CN111611371A - Method, device, equipment and storage medium for matching FAQ based on wide and deep network - Google Patents

Info

Publication number
CN111611371A
Authority
CN
China
Prior art keywords
similarity
candidate
target
text
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010555479.0A
Other languages
Chinese (zh)
Other versions
CN111611371B (en)
Inventor
胡哲杨
肖龙源
李稀敏
刘晓葳
廖斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd filed Critical Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202010555479.0A priority Critical patent/CN111611371B/en
Publication of CN111611371A publication Critical patent/CN111611371A/en
Application granted granted Critical
Publication of CN111611371B publication Critical patent/CN111611371B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Abstract

The invention discloses a method, an apparatus, a device and a storage medium for FAQ matching based on a wide and deep network, wherein the method comprises the following steps: respectively obtaining text representations of a target question and a candidate question, and respectively mapping them into a first mapping vector of the target question and a second mapping vector of the candidate question; calculating a first similarity of the first mapping vector and the second mapping vector, a second similarity based on the sentence length difference, a third similarity based on the shared-word proportion, and a fourth similarity based on the inversion ratio of the shared words of the target question and the candidate question; and obtaining a text similarity from the first, second, third and fourth similarities, and performing FAQ matching according to the text similarity. By extracting features such as the context information, the sentence length difference, the number of shared words and the order of shared words of the target question and the candidate question, the method obtains a more comprehensive similarity for FAQ matching.

Description

Method, device, equipment and storage medium for matching FAQ based on wide and deep network
Technical Field
The invention relates to the field of natural language processing, and in particular to a method, an apparatus, a device and a computer storage medium for FAQ matching based on a wide and deep network.
Background
The FAQ matching task is: given a set of frequently asked question-answer pairs and a new target question, determine whether any question in the pairs matches the target question; if such a candidate question exists, its answer is returned as the answer to the new question. At present, the most common approach to FAQ matching is to build a twin (Siamese) network that produces context-aware vector representations of the candidate question and the target question, compute the similarity of the two vectors, and select the candidate question with the highest similarity as the match. However, this approach ignores factors such as the sentence length difference, the number of shared words and the order of shared words, so its prediction performance is limited.
Disclosure of Invention
In view of the foregoing problems, an object of the present invention is to provide a method, an apparatus, a device and a storage medium for FAQ matching based on a wide and deep network, which extract features such as the context information, the sentence length difference, the number of shared words and the order of shared words of the target question and the candidate question, so as to obtain a more comprehensive similarity for FAQ matching.
An embodiment of the invention provides an FAQ matching method based on a wide and deep network, comprising the following steps:
respectively obtaining text representations of a target question and a candidate question, and respectively mapping them into a first mapping vector of the target question and a second mapping vector of the candidate question;
calculating a first similarity of the first mapping vector and the second mapping vector;
calculating a second similarity based on the sentence length difference between the target question and the candidate question;
calculating a third similarity based on the shared-word proportion of the target question and the candidate question;
calculating a fourth similarity based on the inversion ratio of the shared words of the target question and the candidate question;
obtaining a text similarity from the first similarity, the second similarity, the third similarity and the fourth similarity, and performing FAQ matching according to the text similarity.
Preferably, the obtaining of the text representations of the target question and the candidate question and the mapping thereof into a first mapping vector of the target question and a second mapping vector of the candidate question specifically comprise:
respectively obtaining the text representations of the target question and the candidate question;
performing word segmentation on the text representations of the target question and the candidate question, and mapping them into the first mapping vector of the target question and the second mapping vector of the candidate question based on a Word2Vec mapping and a TextCNN network.
Preferably, the calculating of the second similarity based on the sentence length difference between the target question and the candidate question specifically comprises:
calculating the absolute value of the word-count difference between each candidate question and the corresponding target question;
dividing the absolute value by the sum of the word counts of the candidate question and the target question to obtain the sentence length difference rate of the candidate question and the target question;
calculating the second similarity based on the sentence length difference rate.
Preferably, the calculating of the third similarity based on the shared-word proportion of the target question and the candidate question specifically comprises:
calculating the number of words that each candidate question shares with the corresponding target question;
dividing this number by the total word count of the target question to obtain the third similarity of the candidate question based on the shared-word proportion.
Preferably, the calculating of the fourth similarity based on the inversion ratio of the shared words of the target question and the candidate question specifically comprises:
calculating the set of words that each candidate question shares with the corresponding target question;
taking the order of the shared words in the target question as the positive order, and calculating the inversion count of the shared words in the candidate question;
dividing the inversion count by the maximum possible inversion count to obtain the inversion ratio of the candidate question;
calculating the fourth similarity based on the inversion ratio.
Preferably, the obtaining of the text similarity from the first similarity, the second similarity, the third similarity and the fourth similarity and the performing of FAQ matching according to the text similarity specifically comprise:
concatenating the first similarity, the second similarity, the third similarity and the fourth similarity into a vector;
feeding the vector into a fully connected layer with one node and a sigmoid activation function to obtain the final text similarity;
performing FAQ matching according to the text similarity.
In a second aspect, the present invention provides an FAQ matching apparatus based on a wide and deep network, comprising:
a vector mapping unit, configured to respectively obtain text representations of a target question and a candidate question, and map them into a first mapping vector of the target question and a second mapping vector of the candidate question;
a first similarity obtaining unit, configured to calculate a first similarity of the first mapping vector and the second mapping vector;
a second similarity obtaining unit, configured to calculate a second similarity based on the sentence length difference between the target question and the candidate question;
a third similarity obtaining unit, configured to calculate a third similarity based on the shared-word proportion of the target question and the candidate question;
a fourth similarity obtaining unit, configured to calculate a fourth similarity based on the inversion ratio of the shared words of the target question and the candidate question;
a text similarity obtaining unit, configured to obtain a text similarity from the first similarity, the second similarity, the third similarity and the fourth similarity, and perform FAQ matching according to the text similarity.
Preferably, the vector mapping unit comprises:
a text representation obtaining module, configured to respectively obtain the text representations of the target question and the candidate question;
a vector mapping module, configured to perform word segmentation on the text representations of the target question and the candidate question, and map them into the first mapping vector of the target question and the second mapping vector of the candidate question based on a Word2Vec mapping and a TextCNN network.
Preferably, the second similarity obtaining unit comprises:
an absolute value calculation module, configured to calculate the absolute value of the word-count difference between each candidate question and the corresponding target question;
a sentence length difference rate obtaining module, configured to divide the absolute value by the sum of the word counts of the candidate question and the target question to obtain the sentence length difference rate of the candidate question and the target question;
a second similarity obtaining module, configured to calculate the second similarity based on the sentence length difference rate.
Preferably, the third similarity obtaining unit comprises:
a shared-word count calculation module, configured to calculate the number of words that each candidate question shares with the corresponding target question;
a third similarity obtaining module, configured to divide this number by the total word count of the target question to obtain the third similarity of the candidate question based on the shared-word proportion.
Preferably, the fourth similarity obtaining unit comprises:
a shared-word set calculation module, configured to calculate the set of words that each candidate question shares with the corresponding target question;
an inversion count calculation module, configured to take the order of the shared words in the target question as the positive order and calculate the inversion count of the shared words in the candidate question;
an inversion ratio obtaining module, configured to divide the inversion count by the maximum possible inversion count to obtain the inversion ratio of the candidate question;
a fourth similarity calculation module, configured to calculate the fourth similarity based on the inversion ratio.
Preferably, the text similarity obtaining unit comprises:
a vector obtaining module, configured to concatenate the first similarity, the second similarity, the third similarity and the fourth similarity into a vector;
a text similarity obtaining module, configured to feed the vector into a fully connected layer with one node and a sigmoid activation function to obtain the final text similarity;
an FAQ matching module, configured to perform FAQ matching according to the text similarity.
An embodiment of the invention also provides an FAQ matching device based on a wide and deep network, comprising a processor, a memory and a computer program stored in the memory, wherein the computer program can be executed by the processor to implement the FAQ matching method based on the wide and deep network described above.
An embodiment of the present invention further provides a computer-readable storage medium comprising a stored computer program, wherein when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the FAQ matching method based on the wide and deep network according to the above embodiment.
In the above embodiments, on the basis of a twin network that extracts the context semantic information of the target question and the candidate question, three further types of features are extracted, namely the sentence length difference, the number of shared words and the order of shared words of the target question and the candidate question, so that a more effective text similarity comparison method is obtained and the prediction performance of the model is improved.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating an FAQ matching method based on a wide and deep network according to a first embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an FAQ matching apparatus based on the wide and deep network according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, a first embodiment of the present invention provides an FAQ matching method based on a wide and deep network, which can be performed by an FAQ matching device based on a wide and deep network, in particular by one or more processors in the device, and which comprises at least the following steps:
S101, respectively obtaining text representations of the target question and the candidate question, and respectively mapping them into a first mapping vector of the target question and a second mapping vector of the candidate question.
In this embodiment, all question sentences in the FAQ are extracted and segmented into words, Word2Vec is then used to map each question into a matrix, a network such as TextCNN is selected to extract features so that each question is finally represented as a vector, and the cosine similarity of the candidate question vector and the target question vector is calculated. Specifically, the text representations of the target question and the candidate question are respectively obtained; the text representations are segmented into words and mapped into the first mapping vector of the target question and the second mapping vector of the candidate question based on a Word2Vec mapping and a TextCNN network.
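As a toy illustration of this step and the first similarity, the sketch below maps segmented questions to vectors and computes their cosine similarity. The random embedding table and the mean-pooling `encode` function are stand-ins (assumptions of this sketch) for the trained Word2Vec embeddings and the TextCNN feature extractor described above.

```python
import math
import random

# Stand-in embedding table; a real system would load trained Word2Vec vectors.
random.seed(0)
VOCAB = ["how", "to", "issue", "an", "invoice", "apply", "for"]
DIM = 8
EMBEDDINGS = {w: [random.gauss(0.0, 1.0) for _ in range(DIM)] for w in VOCAB}

def encode(tokens):
    """Map a segmented question to one vector (mean pooling stands in for TextCNN)."""
    return [sum(EMBEDDINGS[t][i] for t in tokens) / len(tokens) for i in range(DIM)]

def cosine_similarity(u, v):
    """First similarity: cosine of the two mapping vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

target_vec = encode(["how", "to", "issue", "an", "invoice"])
candidate_vec = encode(["how", "to", "apply", "for", "an", "invoice"])
first_similarity = cosine_similarity(target_vec, candidate_vec)
```

The cosine value always lies in [-1, 1], and an identical pair of questions scores 1.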
S102, calculating a first similarity of the first mapping vector and the second mapping vector.
S103, calculating a second similarity based on the sentence length difference between the target question and the candidate question.
Step S103 comprises:
S1031, calculating the absolute value of the word-count difference between each candidate question and the corresponding target question;
S1032, dividing the absolute value by the sum of the word counts of the candidate question and the target question to obtain the sentence length difference rate of the candidate question and the target question;
S1033, calculating the second similarity based on the sentence length difference rate.
Specifically, the second similarity based on the sentence length difference is 1 minus the sentence length difference rate.
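Under the formula stated above (second similarity = 1 minus the sentence length difference rate), step S103 reduces to a short function; the function name is illustrative:

```python
def length_difference_similarity(target_tokens, candidate_tokens):
    """Second similarity: 1 minus the sentence length difference rate."""
    # Absolute word-count difference, divided by the sum of the two word counts.
    diff = abs(len(target_tokens) - len(candidate_tokens))
    rate = diff / (len(target_tokens) + len(candidate_tokens))
    return 1.0 - rate

# A 4-word target against a 6-word candidate: rate = 2 / 10, similarity = 0.8.
sim2 = length_difference_similarity(["a", "b", "c", "d"],
                                    ["a", "b", "c", "d", "e", "f"])
```

Questions of equal length score exactly 1, and the score decreases as the length gap grows.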
S104, calculating a third similarity based on the shared-word proportion of the target question and the candidate question.
Step S104 comprises:
S1041, calculating the number of words that each candidate question shares with the corresponding target question;
S1042, dividing this number by the total word count of the target question to obtain the third similarity of the candidate question based on the shared-word proportion.
S105, calculating a fourth similarity based on the inversion ratio of the shared words of the target question and the candidate question.
Step S105 comprises:
S1051, calculating the set of words that each candidate question shares with the corresponding target question;
S1052, taking the order of the shared words in the target question as the positive order, and calculating the inversion count of the shared words in the candidate question;
S1053, dividing the inversion count by the maximum possible inversion count to obtain the inversion ratio of the candidate question;
S1054, calculating the fourth similarity based on the inversion ratio.
Specifically, the fourth similarity based on the inversion ratio of the shared words is 1 minus the inversion ratio.
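Step S105 can be sketched as below. The order of the shared words in the target question defines the positive order, and the similarity is 1 minus the inversion ratio; the handling of duplicate words and of fewer than two shared words is an assumption of this sketch:

```python
def inversion_similarity(target_tokens, candidate_tokens):
    """Fourth similarity: 1 minus the inversion ratio of the shared words."""
    shared = set(target_tokens) & set(candidate_tokens)
    # Positive order: index of each shared word as it appears in the target.
    rank = {w: i for i, w in enumerate(t for t in target_tokens if t in shared)}
    seq = [rank[w] for w in candidate_tokens if w in shared]
    n = len(seq)
    if n < 2:
        return 1.0  # fewer than two shared words: no inversion is possible
    # Count pairs of shared words appearing in the opposite order vs. the target.
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j]
    )
    max_inversions = n * (n - 1) // 2  # maximum possible inversion count
    return 1.0 - inversions / max_inversions

# Fully reversed shared words give similarity 0; identical order gives 1.
sim4 = inversion_similarity(["a", "b", "c"], ["c", "b", "a"])
```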
S106, obtaining a text similarity from the first similarity, the second similarity, the third similarity and the fourth similarity, and performing FAQ matching according to the text similarity.
In this embodiment, step S106 comprises:
S1061, concatenating the first similarity, the second similarity, the third similarity and the fourth similarity into a vector;
S1062, feeding the vector into a fully connected layer with one node and a sigmoid activation function to obtain the final text similarity;
S1063, performing FAQ matching according to the text similarity.
Specifically, the wide and deep network is built and then trained with labeled data, and the positive and negative examples of each question-answer pair are balanced during training. When the weights are initialized, the last fully connected layer may assign a larger weight to the semantic similarity and smaller values to the other three similarities; for example, the four similarity weights may be set to 0.7, 0.1, 0.1 and 0.1.
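The final combination layer can be sketched with the initial weights suggested above (0.7 for the semantic similarity, 0.1 for each handcrafted similarity); the zero bias is an assumption, and in the real system the layer's weights are trained on labeled data:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical initial weights per the suggestion above; trained afterwards.
WEIGHTS = [0.7, 0.1, 0.1, 0.1]
BIAS = 0.0  # assumed initial bias

def text_similarity(similarities):
    """One-node fully connected layer with sigmoid over the four similarities."""
    z = sum(w * s for w, s in zip(WEIGHTS, similarities)) + BIAS
    return sigmoid(z)

score = text_similarity([0.9, 0.8, 0.5, 1.0])  # first..fourth similarity
```

The candidate question with the highest final score is selected as the FAQ match.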
In conclusion, on the basis of a twin network that extracts the context semantic information of the target question and the candidate question, three further types of features are extracted, namely the sentence length difference, the number of shared words and the order of shared words, so that a more effective text similarity comparison method is obtained and the prediction performance of the model is improved.
The second embodiment of the present invention:
referring to fig. 2, a second embodiment of the present invention provides an FAQ matching apparatus based on a wide and deep network, including:
the vector mapping unit 100 is configured to obtain text representations of a target problem and a candidate problem, and map the text representations of the target problem and the candidate problem into a first mapping vector of the target problem and a second mapping vector of the candidate problem;
a first similarity obtaining unit 200, configured to calculate a first similarity between the first mapping vector and the second mapping vector;
a second similarity obtaining unit 300 for calculating a second similarity of sentence length differences of the target question and the candidate question;
a third similarity obtaining unit 400, configured to calculate a third similarity of the target problem and the candidate problem with the same word proportion;
a fourth similarity obtaining unit 500, configured to calculate a fourth similarity of the same word inverse number ratios of the target problem and the candidate problem;
a text similarity obtaining unit 600, configured to obtain text similarity according to the first similarity, the second similarity, the third similarity, and the fourth similarity, and perform FAQ matching according to the text similarity.
On the basis of the above embodiments, in a preferred embodiment of the present invention, the vector mapping unit 100 comprises:
a text representation obtaining module, configured to respectively obtain the text representations of the target question and the candidate question;
a vector mapping module, configured to perform word segmentation on the text representations of the target question and the candidate question, and map them into the first mapping vector of the target question and the second mapping vector of the candidate question based on a Word2Vec mapping and a TextCNN network.
On the basis of the above embodiments, in a preferred embodiment of the present invention, the second similarity obtaining unit 300 comprises:
an absolute value calculation module, configured to calculate the absolute value of the word-count difference between each candidate question and the corresponding target question;
a sentence length difference rate obtaining module, configured to divide the absolute value by the sum of the word counts of the candidate question and the target question to obtain the sentence length difference rate of the candidate question and the target question;
a second similarity obtaining module, configured to calculate the second similarity based on the sentence length difference rate.
On the basis of the above embodiment, in a preferred embodiment of the present invention, the third similarity obtaining unit 400 comprises:
a shared-word count calculation module, configured to calculate the number of words that each candidate question shares with the corresponding target question;
a third similarity obtaining module, configured to divide this number by the total word count of the target question to obtain the third similarity of the candidate question based on the shared-word proportion.
On the basis of the foregoing embodiment, in a preferred embodiment of the present invention, the fourth similarity obtaining unit 500 comprises:
a shared-word set calculation module, configured to calculate the set of words that each candidate question shares with the corresponding target question;
an inversion count calculation module, configured to take the order of the shared words in the target question as the positive order and calculate the inversion count of the shared words in the candidate question;
an inversion ratio obtaining module, configured to divide the inversion count by the maximum possible inversion count to obtain the inversion ratio of the candidate question;
a fourth similarity calculation module, configured to calculate the fourth similarity based on the inversion ratio.
On the basis of the foregoing embodiment, in a preferred embodiment of the present invention, the text similarity obtaining unit 600 comprises:
a vector obtaining module, configured to concatenate the first similarity, the second similarity, the third similarity and the fourth similarity into a vector;
a text similarity obtaining module, configured to feed the vector into a fully connected layer with one node and a sigmoid activation function to obtain the final text similarity;
an FAQ matching module, configured to perform FAQ matching according to the text similarity.
Third embodiment of the invention:
The third embodiment of the invention also provides an FAQ matching device based on a wide and deep network, comprising a processor, a memory and a computer program stored in the memory, wherein the computer program can be executed by the processor to implement the FAQ matching method based on the wide and deep network described in the above embodiment.
The fourth embodiment of the present invention:
The fourth embodiment of the present invention further provides a computer-readable storage medium comprising a stored computer program, wherein when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the FAQ matching method based on the wide and deep network according to the foregoing embodiment.
Illustratively, the computer program may be divided into one or more units, which are stored in the memory and executed by the processor to accomplish the present invention. The one or more units may be a series of instruction segments of a computer program capable of performing specific functions, and the instruction segments are used for describing the execution process of the computer program in the FAQ matching device based on the wide and deep network.
The wide and deep network based FAQ matching device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the schematic diagram is merely an example of the wide and deep network based FAQ matching device and does not constitute a limitation thereof; the device may include more or fewer components than those shown, combine some components, or use different components. For example, the wide and deep network based FAQ matching device may further include an input-output device, a network access device, a bus, etc.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the wide and deep network based FAQ matching device and connects the various parts of the entire device using various interfaces and lines.
The memory may be used for storing the computer programs and/or modules, and the processor implements the various functions of the wide and deep network based FAQ matching device by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to use of the device (such as audio data or a phone book). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the integrated unit of the wide and deep network based FAQ matching device is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the device embodiments provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement this without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A method for FAQ matching based on a wide and deep network, characterized by comprising the following steps:
respectively obtaining text representations of a target question and a candidate question, and respectively mapping the text representations of the target question and the candidate question into a first mapping vector of the target question and a second mapping vector of the candidate question;
calculating a first similarity between the first mapping vector and the second mapping vector;
calculating a second similarity based on the sentence length difference between the target question and the candidate question;
calculating a third similarity based on the proportion of identical words between the target question and the candidate question;
calculating a fourth similarity based on the inversion-count ratio of the identical words of the target question and the candidate question;
and obtaining a text similarity according to the first similarity, the second similarity, the third similarity and the fourth similarity, and performing FAQ matching according to the text similarity.
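Claim 1 does not fix the metric used for the first similarity between the two mapping vectors; cosine similarity is the usual choice for comparing dense text vectors. A minimal sketch under that assumption (the function name and the choice of cosine are illustrative, not taken from the patent):

```python
import math

def first_similarity(u, v):
    """Cosine similarity between the two mapping vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```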
2. The method for FAQ matching based on the wide and deep network of claim 1, wherein text representations of a target question and a candidate question are obtained respectively, and the text representations of the target question and the candidate question are respectively mapped into a first mapping vector of the target question and a second mapping vector of the candidate question, specifically:
respectively acquiring text representations of a target question and a candidate question;
performing word segmentation on the text representation of the target question and the text representation of the candidate question, and mapping them into a first mapping vector of the target question and a second mapping vector of the candidate question based on Word2Vec mapping and a TextCNN network.
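A rough sketch of the mapping step in claim 2, with a random lookup table standing in for trained Word2Vec embeddings and a single untrained filter bank standing in for a TextCNN (all names, dimensions and weights here are illustrative assumptions, not the patent's trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical Word2Vec table: one 8-dim vector per segmented word.
EMB = {w: rng.standard_normal(8) for w in "how do i reset my password".split()}

def text_to_vector(words, kernel, window=2):
    """Map a segmented question to a fixed-length vector:
    embedding lookup (Word2Vec stand-in), 1-D convolution over word
    windows, ReLU, then max-over-time pooling (TextCNN stand-in)."""
    emb = np.stack([EMB[w] for w in words])                        # (n_words, 8)
    spans = [emb[i:i + window].ravel() for i in range(len(words) - window + 1)]
    feats = np.stack(spans) @ kernel                               # convolution as a matrix product
    return np.maximum(feats, 0).max(axis=0)                        # ReLU + max pooling

kernel = rng.standard_normal((16, 4))   # (window * emb_dim, n_filters)
vec = text_to_vector("how do i reset my password".split(), kernel)
```

Both questions would be mapped with the same embeddings and kernel, giving the two fixed-length mapping vectors compared in claim 1.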
3. The method as claimed in claim 2, wherein the second similarity based on the sentence length difference between the target question and the candidate question is calculated as follows:
calculating the absolute value of the word-count difference between each candidate question and the corresponding target question;
dividing the absolute value by the sum of the word counts of the candidate question and the target question to obtain the sentence length difference rate of the candidate question and the target question;
and calculating the second similarity of the target question and the candidate question based on the sentence length difference rate.
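The steps of claim 3 can be sketched as follows. The claim does not state how the difference rate is turned into a similarity, so mapping it as 1 − rate is an assumption, and whitespace tokenization stands in for the word segmentation of claim 2:

```python
def length_difference_similarity(target: str, candidate: str) -> float:
    """Second similarity, derived from the sentence length difference rate."""
    t_words = target.split()
    c_words = candidate.split()
    diff = abs(len(c_words) - len(t_words))      # absolute word-count difference
    rate = diff / (len(c_words) + len(t_words))  # sentence length difference rate
    return 1.0 - rate                            # assumed mapping from rate to similarity
```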
4. The method as claimed in claim 3, wherein the third similarity based on the proportion of identical words between the target question and the candidate question is calculated as follows:
counting the number of words in each candidate question that are the same as words in the corresponding target question;
and dividing that number by the total word count of the target question to obtain the third similarity of the candidate question based on the proportion of identical words.
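Claim 4 maps directly to a few lines; again, whitespace tokenization is an illustrative stand-in for the word segmentation step:

```python
def same_word_proportion(target: str, candidate: str) -> float:
    """Third similarity: target words also found in the candidate,
    divided by the target's total word count."""
    t_words = target.split()
    c_words = set(candidate.split())
    same = sum(1 for w in t_words if w in c_words)  # shared-word count
    return same / len(t_words)
```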
5. The method as claimed in claim 4, wherein the step of calculating the fourth similarity based on the inversion-count ratio of the identical words of the target question and the candidate question comprises:
determining the set of words that each candidate question shares with the corresponding target question;
calculating the inversion count of the shared words in the candidate question, taking their order in the target question as the positive order;
dividing the inversion count by the maximum value it can take, to obtain the inversion-count ratio of the candidate question;
and calculating the fourth similarity of the target question and the candidate question based on the inversion-count ratio.
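A sketch of claim 5, assuming each shared word occurs only once per question (the claim does not say how repeats are handled) and assuming the similarity is 1 − the inversion ratio:

```python
def inversion_ratio_similarity(target: str, candidate: str) -> float:
    """Fourth similarity, from the inversion count of shared words."""
    t_words = target.split()
    c_words = candidate.split()
    shared = [w for w in c_words if w in set(t_words)]  # shared words, in candidate order
    order = {w: i for i, w in enumerate(t_words)}       # target order = positive order
    ranks = [order[w] for w in shared]
    # An inversion is a pair of shared words whose relative order in the
    # candidate is the reverse of their order in the target.
    inversions = sum(1 for i in range(len(ranks))
                       for j in range(i + 1, len(ranks)) if ranks[i] > ranks[j])
    n = len(ranks)
    max_inv = n * (n - 1) // 2                          # maximum possible inversion count
    if max_inv == 0:
        return 1.0                                       # 0 or 1 shared word: no inversions possible
    return 1.0 - inversions / max_inv                    # assumed mapping from ratio to similarity
```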
6. The FAQ matching method based on the wide and deep network of claim 5, wherein the text similarity is obtained according to the first similarity, the second similarity, the third similarity and the fourth similarity, and FAQ matching is performed according to the text similarity, specifically:
concatenating the first similarity, the second similarity, the third similarity and the fourth similarity into a vector;
feeding the vector into a fully connected layer with one node and a sigmoid activation function to obtain the final text similarity;
and performing FAQ matching according to the text similarity.
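At inference time, the single-node fully connected layer of claim 6 reduces to a weighted sum of the four similarities followed by a sigmoid; the weights and bias below are placeholders for values that would come from training:

```python
import math

def text_similarity(sims, weights, bias):
    """Final text similarity: one-node dense layer with sigmoid activation
    over the concatenated (first, second, third, fourth) similarities."""
    z = sum(w * s for w, s in zip(weights, sims)) + bias  # weighted sum + bias
    return 1.0 / (1.0 + math.exp(-z))                     # sigmoid, output in (0, 1)
```

FAQ matching then amounts to ranking candidate questions by this score and returning the answer of the best match.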
7. An FAQ matching device based on a wide and deep network, characterized by comprising:
a vector mapping unit, configured to respectively acquire text representations of a target question and a candidate question and map the text representations of the target question and the candidate question into a first mapping vector of the target question and a second mapping vector of the candidate question;
a first similarity obtaining unit, configured to calculate a first similarity between the first mapping vector and the second mapping vector;
a second similarity obtaining unit, configured to calculate a second similarity based on the sentence length difference between the target question and the candidate question;
a third similarity obtaining unit, configured to calculate a third similarity based on the proportion of identical words between the target question and the candidate question;
a fourth similarity obtaining unit, configured to calculate a fourth similarity based on the inversion-count ratio of the identical words of the target question and the candidate question;
and a text similarity obtaining unit, configured to obtain a text similarity according to the first similarity, the second similarity, the third similarity and the fourth similarity, and perform FAQ matching according to the text similarity.
8. The wide and deep network-based FAQ matching device as claimed in claim 7, wherein the vector mapping unit comprises:
a text representation acquisition module, configured to respectively acquire text representations of the target question and the candidate question;
and a vector mapping module, configured to perform word segmentation on the text representation of the target question and the text representation of the candidate question, and map them into a first mapping vector of the target question and a second mapping vector of the candidate question based on Word2Vec mapping and a TextCNN network.
9. FAQ matching equipment based on a wide and deep network, comprising a processor, a memory and a computer program stored in the memory, wherein the computer program is executable by the processor to implement the method for FAQ matching based on the wide and deep network as claimed in any one of claims 1 to 6.
10. A computer-readable storage medium, comprising a stored computer program, wherein, when the computer program is executed, a device in which the computer-readable storage medium is located is controlled to perform the method for FAQ matching based on the wide and deep network according to any one of claims 1 to 6.
CN202010555479.0A 2020-06-17 2020-06-17 Method, device, equipment and storage medium for matching FAQ based on wide and deep network Active CN111611371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010555479.0A CN111611371B (en) 2020-06-17 2020-06-17 Method, device, equipment and storage medium for matching FAQ based on wide and deep network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010555479.0A CN111611371B (en) 2020-06-17 2020-06-17 Method, device, equipment and storage medium for matching FAQ based on wide and deep network

Publications (2)

Publication Number Publication Date
CN111611371A true CN111611371A (en) 2020-09-01
CN111611371B CN111611371B (en) 2022-08-23

Family

ID=72205479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010555479.0A Active CN111611371B (en) 2020-06-17 2020-06-17 Method, device, equipment and storage medium for matching FAQ based on wide and deep network

Country Status (1)

Country Link
CN (1) CN111611371B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392184A (en) * 2021-06-09 2021-09-14 平安科技(深圳)有限公司 Method and device for determining similar texts, terminal equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536708A (en) * 2017-03-03 2018-09-14 腾讯科技(深圳)有限公司 A kind of automatic question answering processing method and automatically request-answering system
CN110516040A (en) * 2019-08-14 2019-11-29 出门问问(武汉)信息科技有限公司 Semantic Similarity comparative approach, equipment and computer storage medium between text
KR20190133931A (en) * 2018-05-24 2019-12-04 한국과학기술원 Method to response based on sentence paraphrase recognition for a dialog system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536708A (en) * 2017-03-03 2018-09-14 腾讯科技(深圳)有限公司 A kind of automatic question answering processing method and automatically request-answering system
KR20190133931A (en) * 2018-05-24 2019-12-04 한국과학기술원 Method to response based on sentence paraphrase recognition for a dialog system
CN110516040A (en) * 2019-08-14 2019-11-29 出门问问(武汉)信息科技有限公司 Semantic Similarity comparative approach, equipment and computer storage medium between text

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG, Lin et al.: "Sentence similarity calculation for FAQ question answering systems", Journal of Zhengzhou University (Natural Science Edition) *
LI, Jiao: "Research and implementation of a retrieval-based question answering system based on a question-answer database", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392184A (en) * 2021-06-09 2021-09-14 平安科技(深圳)有限公司 Method and device for determining similar texts, terminal equipment and storage medium
WO2022257455A1 (en) * 2021-06-09 2022-12-15 平安科技(深圳)有限公司 Method and apparatus for determining similar texts, terminal device and storage medium

Also Published As

Publication number Publication date
CN111611371B (en) 2022-08-23

Similar Documents

Publication Publication Date Title
WO2018086401A1 (en) Cluster processing method and device for questions in automatic question and answering system
CN107273861A A subjective question scoring method, device and terminal device
CN109558533B (en) Personalized content recommendation method and device based on multiple clustering
CN109117474B (en) Statement similarity calculation method and device and storage medium
CN109271641A A text similarity calculation method, device and electronic device
CN112232346A (en) Semantic segmentation model training method and device and image semantic segmentation method and device
US11550996B2 (en) Method and system for detecting duplicate document using vector quantization
CN113706502B (en) Face image quality assessment method and device
CN109800292A Method, device and equipment for determining question-answer matching degree
CN110969172A (en) Text classification method and related equipment
CN109885831B (en) Keyword extraction method, device, equipment and computer readable storage medium
CN111611371B (en) Method, device, equipment and storage medium for matching FAQ based on wide and deep network
CN110377708B (en) Multi-scene conversation switching method and device
CN114861635A (en) Chinese spelling error correction method, device, equipment and storage medium
CN111062440A (en) Sample selection method, device, equipment and storage medium
CN112040313B (en) Video content structuring method, device, terminal equipment and medium
CN111339778B (en) Text processing method, device, storage medium and processor
WO2022095370A1 (en) Text matching method and apparatus, terminal device, and storage medium
CN114723652A (en) Cell density determination method, cell density determination device, electronic apparatus, and storage medium
CN110264311B (en) Business promotion information accurate recommendation method and system based on deep learning
CN108681490B (en) Vector processing method, device and equipment for RPC information
CN114580354B (en) Information coding method, device, equipment and storage medium based on synonym
CN114757299A (en) Text similarity judgment method and device and storage medium
CN115048531A (en) Knowledge management method, device and system for urban physical examination knowledge
CN113934842A (en) Text clustering method and device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant