CN111382246A - Text matching method, matching device and terminal - Google Patents



Publication number
CN111382246A
CN111382246A
Authority
CN
China
Prior art keywords: text, vector, matching, mapping, mapping function
Legal status: Granted
Application number: CN201811640931.2A
Other languages: Chinese (zh)
Other versions: CN111382246B (en)
Inventor
熊友军
熊为星
廖洪涛
Current Assignee: Ubtech Robotics Corp
Original Assignee: Ubtech Robotics Corp
Application filed by Ubtech Robotics Corp
Priority application: CN201811640931.2A
Publication of CN111382246A; application granted; publication of CN111382246B
Legal status: Active


Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04M — TELEPHONIC COMMUNICATION
    • H04M 3/00 — Automatic or semi-automatic exchanges
    • H04M 3/42 — Systems providing special services or facilities to subscribers
    • H04M 3/50 — Centralised arrangements for answering calls; centralised arrangements for recording messages for absent or busy subscribers
    • H04M 3/527 — Centralised call answering arrangements not requiring operator intervention

Abstract

The invention relates to the technical field of natural language processing and provides a text matching method, a text matching apparatus, a terminal, and a computer-readable storage medium. The matching method comprises the following steps: acquiring a first text and a second text; acquiring a first vector corresponding to the first text and a second vector corresponding to the second text; calculating the vector product of the first vector and the second vector; mapping the vector product to a first feature vector according to a first mapping function; mapping the first vector to a second feature vector according to a second mapping function; and determining the matching degree between the second text and the first text based on the first feature vector and the second feature vector. Applied in an automatic customer-service system, the method can match the question text input by a user to a more accurate matching text, so that the user's question is answered accurately and user experience is improved.

Description

Text matching method, matching device and terminal
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a text matching method, a text matching device, a text matching terminal and a computer-readable storage medium.
Background
Traditional manual customer service is a typical labor-intensive industry: working hours are long, the work is highly repetitive, and it raises an enterprise's labor and management costs. There is therefore a need for an intelligent customer-service system that can automatically answer users' questions in place of human agents.
In the prior art, keywords are extracted from the text of a user's question, an answer related to the question is retrieved by keyword matching, and the answer is returned to the user. However, a keyword captures only one local feature of the question text input by the user, and many candidate texts may match that local feature, so the matching granularity is too coarse, accuracy is low, and user experience suffers.
Disclosure of Invention
In view of the above, the present invention provides a text matching method, a matching apparatus, a terminal, and a computer-readable storage medium, so as to solve the problems of coarse matching granularity and low accuracy in the text matching of existing automatic customer-service systems.
A first aspect of an embodiment of the present invention provides a text matching method, including:
acquiring a first text and a second text;
acquiring a first vector corresponding to the first text and a second vector corresponding to the second text;
calculating a vector product of the first vector and the second vector;
mapping the vector product to a first feature vector according to a first mapping function;
mapping the first vector into a second feature vector according to a second mapping function;
and determining the matching degree of the second text and the first text based on the first feature vector and the second feature vector.
A second aspect of an embodiment of the present invention provides a text matching apparatus, including:
a text acquisition unit for acquiring a first text and a second text;
the vector acquisition unit is used for acquiring a first vector corresponding to the first text and a second vector corresponding to the second text;
a vector calculation unit for calculating a vector product of the first vector and the second vector;
a first mapping unit for mapping the vector product into a first feature vector according to a first mapping function;
a second mapping unit, configured to map the first vector into a second feature vector according to a second mapping function;
and the matching unit is used for determining the matching degree of the second text and the first text based on the first feature vector and the second feature vector.
A third aspect of the embodiments of the present invention provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the text matching method according to any one of the above when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of matching text as described in any one of the above.
Compared with the prior art, the invention has the following beneficial effects:
the first text and the second text are represented by a first vector and a second vector, yielding the text features of the two texts. The vector product of these text features is mapped by the first mapping function, extracting fine-grained matching features of the first text and the second text; the first vector is mapped by the second mapping function, yielding a shallow text feature of the first text. This shallow feature can be used to adjust the weights of the matching features, so the finally determined matching result is more accurate. Applied in an automatic customer-service system, the method can match the question text input by a user to a more accurate matching text, so that the user's question is answered accurately and user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a flowchart of an implementation of a text matching method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an implementation of a text matching method according to another embodiment of the present invention;
fig. 3 is a schematic structural diagram of a text matching apparatus according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following description is made by way of specific embodiments with reference to the accompanying drawings.
Referring to fig. 1, it shows a flowchart of an implementation of the text matching method provided by the embodiment of the present invention, which is detailed as follows:
in step 101, a first text and a second text are obtained.
In the embodiment of the present invention, the first text may be the text of a question input by a user, and the second text may be a text selected from a preset text library. By calculating the matching degree between the first text and the second text, the text in the library with the highest matching degree with the first text can be selected, and a sentence associated with that text (e.g., its corresponding answer sentence) returned to the user, thereby automatically answering the user's question.
In step 102, a first vector corresponding to the first text and a second vector corresponding to the second text are obtained.
In the embodiment of the present invention, the vector corresponding to a text is its feature representation. Specifically, the text is segmented into words; stop words (such as simple connectives and modal particles), punctuation marks, and meaningless words are removed; and the remaining words are mapped to word vectors by a trained word-vector model. The set of these word vectors is the vector corresponding to the text.
Specifically, word-vector generation can be implemented with a Word2vec model. When applied to a customer-service question-answering system, the model can be retrained with the professional vocabulary common in customer-service questions, yielding a word-vector model that meets the system's requirements. The generated word vectors then both fit the customer-service corpus and preserve the generality of common words.
In the embodiment of the present invention, the word segmentation result of the text may be mapped to a word vector with a specified dimension, for example, a word vector with 300 dimensions.
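As a rough illustration of these steps, the sketch below tokenizes a text, drops stop words, and stacks the remaining word vectors. All names are hypothetical, and a tiny random embedding table stands in for the trained Word2vec model (a real system would use e.g. 300 dimensions, as noted above):

```python
import numpy as np

EMBED_DIM = 4  # the text suggests e.g. 300 dimensions; 4 keeps the sketch small

# stand-ins for a real stop-word list and a trained word-vector model
STOP_WORDS = {"the", "a", "an", "is", "how", "to"}
rng = np.random.default_rng(0)
VOCAB = {w: rng.standard_normal(EMBED_DIM) for w in ["robot", "charge", "battery"]}

def text_to_vectors(text: str) -> np.ndarray:
    """Map a text to the matrix of word vectors of its retained content words."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    vecs = [VOCAB[t] for t in tokens if t in VOCAB]
    return np.stack(vecs) if vecs else np.zeros((0, EMBED_DIM))

q1 = text_to_vectors("how to charge the robot")
print(q1.shape)  # one row per retained in-vocabulary word: (2, 4)
```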
In step 103, a vector product of the first vector and the second vector is calculated.
In the embodiment of the invention, let the first vector be q_1 = (x_1, x_2, x_3, ..., x_m) and the second vector be q_2 = (y_1, y_2, y_3, ..., y_n). The vector product of the first vector and the second vector may then be calculated according to the following formula:
z^(0) = q_1 ⊗ q_2
where z^(0) represents the calculated vector product and ⊗ represents the cross product.
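A minimal sketch of this step in NumPy follows. Since the two vectors may have different lengths m and n, ⊗ is read here as the outer product, pairing every component of q_1 with every component of q_2 — an assumption about the intended operation:

```python
import numpy as np

# z(0) = q1 "⊗" q2, sketched as an outer product: entry (i, j) is x_i * y_j
q1 = np.array([1.0, 2.0, 3.0])  # m = 3
q2 = np.array([4.0, 5.0])       # n = 2

z0 = np.outer(q1, q2)           # interaction matrix of shape (m, n)
print(z0.shape)                 # (3, 2)
```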
In step 104, the vector product is mapped to a first feature vector according to a first mapping function.
In the embodiment of the present invention, the vector product obtained above may be mapped to a first feature vector according to a first mapping function. The mapping process may be understood as extracting the top K values of the vector product (for example, K may be 10). In this way, a variable-length input question is converted to a fixed-length representation while the K most important word interactions are attended to.
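The top-K extraction described above can be sketched as follows (K = 3 here for brevity; the text's example value is K = 10):

```python
import numpy as np

def top_k_features(z: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest entries of the interaction matrix, so that
    variable-length inputs yield a fixed-length feature vector."""
    flat = z.ravel()
    return np.sort(flat)[::-1][:k]  # sort descending, keep first k

z = np.array([[0.1, 0.9],
              [0.5, 0.3]])
print(top_k_features(z, 3))  # [0.9 0.5 0.3]
```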
In the embodiment of the present invention, the first mapping function may also be understood as a kernel function of a neural network, that is, the first mapping function corresponds to a neural network, and the first feature vector may be output by inputting the vector product obtained by the above calculation into the neural network. The parameters of the neural network are parameters of the first mapping function, the multilayer mapping relationship of the first mapping function may correspond to hidden layers of the neural network, the parameters of the neural network include weight matrices and bias vectors corresponding to the multilayer mapping relationships, and the parameters may be determined through pre-training.
The first mapping function comprises a multi-layer mapping relationship, and its mathematical representation is as follows:
z^(l) = relu(W^(l) z^(l-1) + b^(l))
where, if the first mapping function comprises L layers of mapping relationships, then l = 1, 2, ..., L; W^(l) represents the weight matrix of the l-th layer mapping, b^(l) the bias vector of the l-th layer mapping, z^(l-1) the input to the l-th layer mapping, z^(l) the output of the l-th layer mapping, and relu the excitation applied to the mapping output.
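The stacked mapping z^(l) = relu(W^(l) z^(l-1) + b^(l)) can be sketched as a small multi-layer perceptron. The random weights below stand in for the pre-trained parameters the text describes:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def first_mapping(z0, weights, biases):
    """Apply L layers of z(l) = relu(W(l) @ z(l-1) + b(l))."""
    z = z0
    for W, b in zip(weights, biases):
        z = relu(W @ z + b)
    return z

rng = np.random.default_rng(1)
dims = [10, 8, 4]  # input dim 10, two mapping layers (L = 2)
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(2)]
bs = [np.zeros(dims[i + 1]) for i in range(2)]

z0 = rng.standard_normal(10)
zL = first_mapping(z0, Ws, bs)
print(zL.shape)  # (4,), and every component is non-negative after relu
```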
In step 105, the first vector is mapped to a second feature vector according to a second mapping function.
In the embodiment of the present invention, the first vector is mapped to the second feature vector using a second mapping function. The second mapping function is a single-layer mapping relationship, which can be understood as a shallow neural network used to acquire shallow features of the first vector. If the first vector was obtained from the question text input by the user, the second mapping function amounts to a shallow mapping of the user's query, and the weights of the matching features can be adjusted based on the result of this shallow mapping, so that the matching features do not deviate too far from the semantics of the first text.
Optionally, the mathematical representation of the second mapping function may be:
h = relu(W_p q_1 + b_p)
where h represents the mapping output of the second mapping function, q_1 its input, W_p its weight matrix, b_p its bias vector, and relu the excitation applied to the mapping output.
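A minimal sketch of this single-layer mapping, with an identity weight matrix and zero bias standing in for the trained W_p and b_p:

```python
import numpy as np

def second_mapping(q1, Wp, bp):
    """h = relu(Wp @ q1 + bp): the shallow feature of the first text."""
    return np.maximum(Wp @ q1 + bp, 0.0)

q1 = np.array([0.5, -1.0, 2.0])
Wp = np.eye(3)       # stand-in for the trained weight matrix
bp = np.zeros(3)     # stand-in for the trained bias vector
h = second_mapping(q1, Wp, bp)
print(h)             # negative components are zeroed by relu
```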
Optionally, the weight matrix and the bias vector corresponding to each layer of mapping relationship of the first mapping function, and the weight matrix and the bias vector of the second mapping function are obtained by training, and the training step includes:
acquiring training samples, wherein the training samples comprise positive samples and negative samples, and the matching degree of text pairs contained in the positive samples is greater than that of text pairs contained in the negative samples;
iteratively calculating the first mapping function and the second mapping function using the training samples;
calculating loss values of output results corresponding to positive samples and output results corresponding to negative samples in the training samples according to a preset loss function, performing gradient updating according to the loss values, and determining a weight matrix and an offset vector corresponding to each layer of mapping relation of the first mapping function and a weight matrix and an offset vector of the second mapping function.
In the embodiment of the present invention, a training sample may be pre-constructed, where the constructed training sample may include three texts, a text pair composed of a first text and a second text is a positive sample, a text pair composed of the first text and a third text is a negative sample, and a relevance of the text pair of the positive sample is greater than a relevance of the text pair of the negative sample.
In the embodiment of the present invention, the same steps as those of the above method for performing text matching are used to perform text matching on a positive sample and a negative sample, so as to obtain an output result (matching degree) of the positive sample and an output result (matching degree) of the negative sample, the two output results are compared, a loss value is calculated by using a preset loss function, gradient updating is performed according to the loss value, and a weight matrix and a bias vector corresponding to each layer of mapping relationship of the first mapping function, and a weight matrix and a bias vector of the second mapping function can be finally determined through multiple iterative computations.
Optionally, the obtaining the training sample includes:
collecting a text for training;
classifying the texts for training, and determining the category of each text;
determining the matching degree of each text according to the category of each text;
constructing text triples (Q_1, Q_2, Q_3) as training samples based on the matching degree between the texts, where Q_1 and Q_2 constitute a positive sample, Q_1 and Q_3 constitute a negative sample, and the matching degree of the positive sample is greater than that of the negative sample.
In the embodiment of the invention, the text data corresponding to the customer service questions and answers can be collected in advance, the text data is divided according to the text category attribute, and each category can comprise a plurality of main questions and similar questions corresponding to the main questions.
In one embodiment, the text corresponding to each main question may carry two levels of category. For a given main question, the relevance between the main question and its similar questions may be set to the maximum; the relevance between the main question and other main questions whose level-1 and level-2 categories are both the same may be set to the next highest; and the relevance between the main question and all other questions may be set to the minimum. In this way, a series of (Q_1, Q_2, Q_3) triples can be constructed in which the correlation of Q_1 and Q_2 is higher than the correlation of Q_1 and Q_3.
Illustratively, suppose Q_1 is the question "how to charge the robot" and Q_2 is its similar question "the robot is charged by usb"; the correlation of Q_1 and Q_2 is then set to 2. Q_3 belongs to the same level-1 and level-2 categories as Q_1 and is the question "how long the robot's battery lasts"; the correlation of Q_1 and Q_3 is set to 1. Q_4 differs from Q_1 in its level-2 category and is the question "what safety hazard the robot has"; the correlation of Q_1 and Q_4 is set to 0. From these four questions, the following four triples satisfying the condition can be constructed: (Q_1, Q_2, Q_3), (Q_1, Q_2, Q_4), (Q_1, Q_3, Q_4), (Q_2, Q_3, Q_4).
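A minimal sketch of this triple construction, under the simplifying assumption that each question carries a single illustrative relevance rank (3, 2, 1, 0 for Q_1 through Q_4) and that a valid triple is any run of three questions in strictly decreasing rank:

```python
from itertools import combinations

# illustrative ranks, not the patent's actual scoring scheme
degrees = {"Q1": 3, "Q2": 2, "Q3": 1, "Q4": 0}

def build_triplets(deg):
    """Emit every triple of question names in strictly decreasing rank order."""
    names = sorted(deg, key=deg.get, reverse=True)
    # with distinct ranks, every 3-combination of the ranked list qualifies
    return list(combinations(names, 3))

print(build_triplets(degrees))
# [('Q1','Q2','Q3'), ('Q1','Q2','Q4'), ('Q1','Q3','Q4'), ('Q2','Q3','Q4')]
```

This reproduces the four triples of the worked example above.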
Optionally, the loss function may be:
L(q_1, q_2, q_3; Θ) = max(0, margin − s(q_1, q_2) + s(q_1, q_3))
where q_1, q_2 and q_3 represent the vectors corresponding to Q_1, Q_2 and Q_3 respectively; margin represents the preset similarity distance between positive and negative samples; s(q_1, q_2) represents the output result for the positive sample; s(q_1, q_3) represents the output result for the negative sample; and L(q_1, q_2, q_3; Θ) represents the loss value when the parameters of the matching and ranking model are Θ and the inputs are (q_1, q_2) and (q_1, q_3).
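The triplet hinge loss above can be sketched directly (margin = 0.5 is an illustrative value, not one given in the text):

```python
def triplet_loss(s_pos: float, s_neg: float, margin: float = 0.5) -> float:
    """L = max(0, margin - s(q1, q2) + s(q1, q3))."""
    return max(0.0, margin - s_pos + s_neg)

# positive pair already scores well above the negative: zero loss
print(triplet_loss(s_pos=0.9, s_neg=0.2))  # 0.0
# negative pair scores too close to the positive: a loss is incurred
print(triplet_loss(s_pos=0.4, s_neg=0.3))
```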
In one implementation, to increase the training speed, the Adam algorithm may also be selected to perform the gradient update.
In step 106, a matching degree of a second text with the first text is determined based on the first feature vector and the second feature vector.
In the embodiment of the present invention, a cross-product operation may be performed on the first feature vector and the second feature vector, and the result taken as the matching degree of the second text with the first text. The calculation formula may be as follows:
s = z^(L) ⊗ h
where s represents the output matching degree, z^(L) the first feature vector, and h the second feature vector.
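Since s is a single matching degree, ⊗ is read in this sketch as an inner product of the two feature vectors — an assumption about the intended operation, with illustrative values:

```python
import numpy as np

# combine the deep feature z(L) with the shallow feature h into one score
zL = np.array([1.0, 2.0, 0.0])   # stand-in first feature vector
h  = np.array([0.5, 0.25, 1.0])  # stand-in second feature vector

s = float(zL @ h)  # scalar matching degree
print(s)           # 1.0
```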
It should be noted that the text matching method provided by the invention is developed on the basis of large-scale knowledge processing and can be applied to various industries. For example, the method can be applied to the technical industries of professional information processing, natural semantic understanding, informatization management, automatic answer of consultation questions, retrieval sequencing and the like. The text matching mode provided by the invention not only focuses on matching among words, but also focuses on weight factors of the words, so that the problem of poor retrieval quality caused by high matching degree of non-core words can be effectively solved, and the semantic relation among the words can be focused more by utilizing word vectors, thereby effectively improving the quality of returned results.
As can be seen from the above, the present invention represents the first text and the second text with a first vector and a second vector to obtain their text features; maps the vector product of these text features with the first mapping function, extracting fine-grained matching features of the two texts; and maps the first vector with the second mapping function to obtain a shallow text feature of the first text, with which the weights of the matching features can be adjusted, so the finally determined matching result is more accurate. Applied in an automatic customer-service system, the method can match the question text input by a user to a more accurate matching text, so that the user's question is answered accurately and user experience is improved.
Fig. 2 shows a flowchart of an implementation of the text matching method according to another embodiment of the present invention, which is detailed as follows:
in step 201, a first text and category information of the first text are obtained.
In step 202, a text matching library corresponding to the first text is determined based on the category information.
In step 203, a second text matching the first text is selected from the text matching library.
In practical application, different text libraries can be created for different product categories. When a question input by a user is received, a question-category option can be presented; after the user selects the category of the input question, the corresponding text library is determined according to the selected category, and the text corresponding to the user's question is matched against the texts in that library.
Optionally, the matching method further includes:
respectively determining the matching degree of each text in the text matching library and the first text;
and taking the text with the highest matching degree with the first text in the matching library as the matching text of the first text.
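Selecting the best match can be sketched as scoring the first text against every candidate in the library and taking the argmax. The scorer below is a hypothetical word-overlap stand-in for the full matching pipeline described above:

```python
def match(first: str, second: str) -> float:
    """Stand-in scorer: Jaccard overlap of word sets, NOT the patent's model."""
    a, b = set(first.split()), set(second.split())
    return len(a & b) / max(len(a | b), 1)

library = [
    "how to charge the robot",
    "robot battery life",
    "safety hazards of the robot",
]
query = "how do i charge the robot"
best = max(library, key=lambda t: match(query, t))
print(best)  # the candidate with the highest matching degree
```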
As can be seen from the above, the present invention represents the first text and the second text with a first vector and a second vector to obtain their text features; maps the vector product of these text features with the first mapping function, extracting fine-grained matching features of the two texts; and maps the first vector with the second mapping function to obtain a shallow text feature of the first text, with which the weights of the matching features can be adjusted, so the finally determined matching result is more accurate. Applied in an automatic customer-service system, the method can match the question text input by a user to a more accurate matching text, so that the user's question is answered accurately and user experience is improved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
The following are embodiments of the apparatus of the invention, reference being made to the corresponding method embodiments described above for details which are not described in detail therein.
Fig. 3 is a schematic structural diagram of a text matching apparatus provided in an embodiment of the present invention, and for convenience of description, only the parts related to the embodiment of the present invention are shown, which are detailed as follows:
as shown in fig. 3, the matching device 3 for text includes: a text acquisition unit 31, a vector acquisition unit 32, a vector calculation unit 33, a first mapping unit 34, a second mapping unit 35 and a matching unit 36.
A text acquisition unit 31 for acquiring a first text and a second text;
a vector obtaining unit 32, configured to obtain a first vector corresponding to the first text and a second vector corresponding to the second text;
a vector calculation unit 33 for calculating a vector product of the first vector and the second vector;
a first mapping unit 34, configured to map the vector product into a first feature vector according to a first mapping function;
a second mapping unit 35, configured to map the first vector into a second eigenvector according to a second mapping function;
a matching unit 36, configured to determine a matching degree of the second text with the first text based on the first feature vector and the second feature vector.
Optionally, the text matching apparatus 3 further includes:
a category information acquisition unit configured to acquire category information of the first text;
a text library determining unit configured to determine a text matching library corresponding to the first text based on the category information;
the text obtaining unit 31 is further configured to select a second text matching the first text from the text matching library.
Optionally, the matching unit 36 is further configured to determine matching degrees of the texts in the text matching library and the first text, respectively, and use a text in the matching library with the highest matching degree with the first text as the matching text of the first text.
Optionally, the first mapping function includes a multi-layer mapping relationship, and its mathematical representation is as follows:
z^(l) = relu(W^(l) z^(l-1) + b^(l))
where, if the first mapping function comprises L layers of mapping relationships, then l = 1, 2, ..., L; W^(l) represents the weight matrix of the l-th layer mapping, b^(l) the bias vector of the l-th layer mapping, z^(l-1) the input to the l-th layer mapping, z^(l) the output of the l-th layer mapping, and relu the excitation applied to the mapping output;
the mathematical representation of the second mapping function is:
h = relu(W_p q_1 + b_p)
where h represents the mapping output of the second mapping function, q_1 its input, W_p its weight matrix, b_p its bias vector, and relu the excitation applied to the mapping output.
Optionally, the weight matrix and the bias vector corresponding to each layer of mapping relationship of the first mapping function, and the weight matrix and the bias vector of the second mapping function are obtained through training, and the text matching device 3 further includes:
the training sample acquisition unit is used for acquiring training samples, wherein the training samples comprise positive samples and negative samples, and the matching degree of text pairs contained in the positive samples is greater than that contained in the negative samples;
a training unit for training the first mapping function and the second mapping function using the training samples; and calculating loss values of output results corresponding to positive samples and output results corresponding to negative samples in the training samples according to a preset loss function, performing gradient updating according to the loss values, and determining a weight matrix and a bias vector of each layer of the first mapping function and a weight matrix and a bias vector of the second mapping function.
Optionally, the text matching apparatus 3 further includes:
the text acquisition unit is used for acquiring texts for training;
a category determining unit, configured to classify the texts for training and determine a category to which each text belongs;
the training unit is specifically configured to determine the matching degree of each text according to the category to which it belongs, and to construct text triples (Q_1, Q_2, Q_3) as training samples based on the matching degree between the texts, where Q_1 and Q_2 constitute a positive sample, Q_1 and Q_3 constitute a negative sample, and the matching degree of the positive sample is greater than that of the negative sample.
Optionally, the loss function is:
L(q_1, q_2, q_3; Θ) = max(0, margin − s(q_1, q_2) + s(q_1, q_3))
where q_1, q_2 and q_3 represent the vectors corresponding to Q_1, Q_2 and Q_3 respectively; margin represents the preset similarity distance between positive and negative samples; s(q_1, q_2) represents the output result for the positive sample; s(q_1, q_3) represents the output result for the negative sample; and L(q_1, q_2, q_3; Θ) represents the loss value when the parameters of the matching and ranking model are Θ and the inputs are (q_1, q_2) and (q_1, q_3).
As can be seen from the above, the present invention represents the first text and the second text with a first vector and a second vector to obtain their text features; maps the vector product of these text features with the first mapping function, extracting fine-grained matching features of the two texts; and maps the first vector with the second mapping function to obtain a shallow text feature of the first text, with which the weights of the matching features can be adjusted, so the finally determined matching result is more accurate. Applied in an automatic customer-service system, the apparatus can match the question text input by a user to a more accurate matching text, so that the user's question is answered accurately and user experience is improved.
Fig. 4 is a schematic diagram of a terminal according to an embodiment of the present invention. As shown in fig. 4, the terminal 4 of this embodiment includes: a processor 40, a memory 41 and a computer program 42 stored in said memory 41 and executable on said processor 40. The processor 40 executes the computer program 42 to implement the steps in the above-mentioned embodiments of the text matching method, such as the steps 101 to 106 shown in fig. 1. Alternatively, the processor 40, when executing the computer program 42, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the units 31 to 36 shown in fig. 3.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 42 in the terminal 4. For example, the computer program 42 may be divided into a text acquisition unit, a vector acquisition unit, a vector calculation unit, a first mapping unit, a second mapping unit and a matching unit, each unit having the following specific functions:
a text acquisition unit for acquiring a first text and a second text;
a vector acquisition unit for acquiring a first vector corresponding to the first text and a second vector corresponding to the second text;
a vector calculation unit for calculating a vector product of the first vector and the second vector;
a first mapping unit for mapping the vector product into a first feature vector according to a first mapping function;
a second mapping unit, configured to map the first vector into a second eigenvector according to a second mapping function;
and the matching unit is used for determining the matching degree of the second text and the first text based on the first feature vector and the second feature vector.
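As a non-limiting sketch of how units such as these could cooperate, the following Python fragment wires the five processing steps together. The "vector product" is assumed here to be an element-wise product, the final combination of the two feature vectors is assumed to be a weighted dot product, and all weights are random stand-ins for the trained parameters; none of these choices is asserted to be the claimed embodiment:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical parameters; in the disclosure these are learned by training.
d = 8
W1, b1 = rng.standard_normal((d, d)), np.zeros(d)   # first mapping function
Wp, bp = rng.standard_normal((d, d)), np.zeros(d)   # second mapping function

def match_degree(q1, q2):
    """Sketch of the unit chain: vector product -> first mapping,
    first vector -> second mapping, then combine the two feature
    vectors into a matching degree."""
    v = q1 * q2                      # vector product (assumed element-wise)
    f1 = relu(W1 @ v + b1)           # first feature vector
    f2 = relu(Wp @ q1 + bp)          # second feature vector (shallow features)
    return float(f1 @ f2)            # matching degree (assumed combination)

q1, q2 = rng.standard_normal(d), rng.standard_normal(d)
print(match_degree(q1, q2))
```

Since both feature vectors pass through relu, the sketched matching degree is always non-negative; the actual combination used by the matching unit is left open by the text above.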
The terminal 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing device. The terminal may include, but is not limited to, a processor 40 and a memory 41. Those skilled in the art will appreciate that fig. 4 is only an example of the terminal 4 and does not constitute a limitation of the terminal 4, which may include more or fewer components than those shown, may combine some components, or may have different components; for example, the terminal may also include input/output devices, network access devices, buses, etc.
The Processor 40 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the terminal 4, such as a hard disk or a memory of the terminal 4. The memory 41 may also be an external storage device of the terminal 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) and the like provided on the terminal 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal 4. The memory 41 is used for storing the computer program and other programs and data required by the terminal. The memory 41 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the above-described apparatus/terminal embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be suitably increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A matching method of a text, the matching method comprising:
acquiring a first text and a second text;
acquiring a first vector corresponding to the first text and a second vector corresponding to the second text;
calculating a vector product of the first vector and the second vector;
mapping the vector product to a first feature vector according to a first mapping function;
mapping the first vector into a second eigenvector according to a second mapping function;
and determining the matching degree of the second text and the first text based on the first feature vector and the second feature vector.
2. The method for matching texts according to claim 1, wherein the obtaining the first text and the second text comprises:
acquiring a first text and category information of the first text;
determining a text matching library corresponding to the first text based on the category information;
and selecting a second text matched with the first text from the text matching library.
3. The method of matching text according to claim 2, further comprising:
respectively determining the matching degree of each text in the text matching library and the first text;
and taking the text with the highest matching degree with the first text in the matching library as the matching text of the first text.
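The selection described in claims 2 and 3 reduces to scoring every candidate in the text matching library and keeping the argmax. A minimal sketch, in which `best_match`, the toy shared-word scorer and the sample library are all hypothetical stand-ins for the trained matching model:

```python
def best_match(first_text, library, match_degree):
    """Score every candidate in the text matching library against the
    first text and return the highest-scoring one.  `match_degree` is
    any callable implementing the claimed matching model."""
    return max(library, key=lambda t: match_degree(first_text, t))

# Toy scorer (shared-word count) standing in for the trained model:
toy = lambda a, b: len(set(a.split()) & set(b.split()))
lib = ["how to reset password", "shipping cost query", "reset my password now"]
print(best_match("reset my password", lib, toy))  # -> "reset my password now"
```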
4. The matching method for text according to any one of claims 1 to 3, wherein the first mapping function includes a multi-layer mapping relationship, and the mathematical representation of the first mapping function is:
z(l) = relu(W(l) z(l-1) + b(l))
wherein, if the number of layers of the mapping relationship included in the first mapping function is L, then l = 1, 2, ..., L; W(l) denotes the weight matrix corresponding to the l-th layer mapping; b(l) denotes the bias vector corresponding to the l-th layer mapping; z(l-1) denotes the input corresponding to the l-th layer mapping; z(l) denotes the mapping output corresponding to the l-th layer mapping; and relu denotes the excitation mode of the mapping output;
the mathematical representation of the second mapping function is:
h = relu(Wp q1 + bp)
where h denotes the mapping output of the second mapping function, q1 denotes the input of the second mapping function, Wp denotes the weight matrix of the second mapping function, bp denotes the bias vector of the second mapping function, and relu denotes the excitation mode of the mapping output.
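The two mapping functions of claim 4 can be sketched directly from their formulas; the layer dimensions and random weights below are illustrative only, since claim 5 obtains the real parameters by training:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def first_mapping(z0, weights, biases):
    """Multi-layer mapping: z(l) = relu(W(l) z(l-1) + b(l)), l = 1..L."""
    z = z0
    for W, b in zip(weights, biases):
        z = relu(W @ z + b)
    return z

def second_mapping(q1, Wp, bp):
    """Single-layer mapping: h = relu(Wp q1 + bp)."""
    return relu(Wp @ q1 + bp)

# Illustrative shapes; real weights come from the training of claim 5.
rng = np.random.default_rng(1)
Ws = [rng.standard_normal((6, 8)), rng.standard_normal((4, 6))]
bs = [np.zeros(6), np.zeros(4)]
z = first_mapping(rng.standard_normal(8), Ws, bs)
print(z.shape)  # (4,)
h = second_mapping(rng.standard_normal(8), rng.standard_normal((4, 8)), np.zeros(4))
```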
5. The matching method of the text according to claim 4, wherein the weight matrix and the bias vector corresponding to the mapping relationship of each layer of the first mapping function, and the weight matrix and the bias vector of the second mapping function are obtained by training, and the training step includes:
acquiring training samples, wherein the training samples comprise positive samples and negative samples, and the matching degree of text pairs contained in the positive samples is greater than that of text pairs contained in the negative samples;
iteratively calculating the first mapping function and the second mapping function using the training samples;
calculating loss values of output results corresponding to positive samples and output results corresponding to negative samples in the training samples according to a preset loss function, performing gradient updating according to the loss values, and determining a weight matrix and an offset vector corresponding to each layer of mapping relation of the first mapping function and a weight matrix and an offset vector of the second mapping function.
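The iterate/compute-loss/update cycle of claim 5 can be illustrated on a deliberately simplified scorer. The sketch below scores a pair as s(qa, qb) = w · (qa * qb), which is an assumption for brevity (the claimed model scores through the two mapping functions), and applies the claim-7 hinge loss with a plain subgradient step:

```python
import numpy as np

def train_step(w, q1, q2, q3, margin=0.2, lr=0.1):
    """One training iteration: score the positive and negative pairs,
    compute the hinge loss, and update the parameters w by a
    subgradient step when the margin is violated."""
    s_pos = w @ (q1 * q2)            # output for the positive sample
    s_neg = w @ (q1 * q3)            # output for the negative sample
    loss = max(0.0, margin - s_pos + s_neg)
    if loss > 0.0:                   # subgradient is zero otherwise
        w = w - lr * (q1 * q3 - q1 * q2)
    return w, loss

# Toy deterministic triple: q1/q2 agree on dimensions where q1/q3 do not.
q1 = np.array([1.0, 1.0, 1.0, 1.0])
q2 = np.array([1.0, 0.0, 1.0, 0.0])
q3 = np.array([0.0, 1.0, 0.0, 0.0])
w = np.zeros(4)
for _ in range(5):
    w, loss = train_step(w, q1, q2, q3)
print(loss)  # 0.0 once the margin is satisfied
```

After the first update the positive pair already outscores the negative pair by the margin, so subsequent iterations leave w unchanged, mirroring how the clipped loss of claim 7 stops pushing well-separated triples apart.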
6. The method for matching texts according to claim 5, wherein the obtaining training samples comprises:
collecting a text for training;
classifying the texts for training, and determining the category of each text;
determining the matching degree of each text according to the category of each text;
constructing text triples (Q1, Q2, Q3) as training samples based on the matching degree between the texts, wherein Q1 and Q2 constitute a positive sample, Q1 and Q3 constitute a negative sample, and the matching degree within the positive sample is greater than that within the negative sample.
7. The method of matching text according to claim 6, wherein the loss function is:
L(q1, q2, q3; Θ) = max(0, margin - s(q1, q2) + s(q1, q3))
where q1, q2 and q3 denote the vectors corresponding to Q1, Q2 and Q3 respectively; margin denotes the preset similarity distance between positive and negative samples; s(q1, q2) denotes the output corresponding to the positive sample; s(q1, q3) denotes the output corresponding to the negative sample; and L(q1, q2, q3; Θ) denotes the loss value produced by the matching and ranking model with parameters Θ for the inputs (q1, q2) and (q1, q3).
8. A text matching apparatus, the apparatus comprising:
a text acquisition unit for acquiring a first text and a second text;
a vector acquisition unit for acquiring a first vector corresponding to the first text and a second vector corresponding to the second text;
a vector calculation unit for calculating a vector product of the first vector and the second vector;
a first mapping unit for mapping the vector product into a first feature vector according to a first mapping function;
a second mapping unit, configured to map the first vector into a second eigenvector according to a second mapping function;
and the matching unit is used for determining the matching degree of the second text and the first text based on the first feature vector and the second feature vector.
9. A terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method for matching text according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of matching text according to any one of claims 1 to 7.
CN201811640931.2A 2018-12-29 2018-12-29 Text matching method, matching device, terminal and computer readable storage medium Active CN111382246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811640931.2A CN111382246B (en) 2018-12-29 2018-12-29 Text matching method, matching device, terminal and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN111382246A true CN111382246A (en) 2020-07-07
CN111382246B CN111382246B (en) 2023-03-14

Family

ID=71215979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811640931.2A Active CN111382246B (en) 2018-12-29 2018-12-29 Text matching method, matching device, terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111382246B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11314897B2 (en) * 2020-07-24 2022-04-26 Alipay (Hangzhou) Information Technology Co., Ltd. Data identification method, apparatus, device, and readable medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100179933A1 (en) * 2009-01-12 2010-07-15 Nec Laboratories America, Inc. Supervised semantic indexing and its extensions
CN107861944A (en) * 2017-10-24 2018-03-30 广东亿迅科技有限公司 A kind of text label extracting method and device based on Word2Vec
CN108170684A (en) * 2018-01-22 2018-06-15 京东方科技集团股份有限公司 Text similarity computing method and system, data query system and computer product
CN108628825A (en) * 2018-04-10 2018-10-09 平安科技(深圳)有限公司 Text message Similarity Match Method, device, computer equipment and storage medium
CN108710613A (en) * 2018-05-22 2018-10-26 平安科技(深圳)有限公司 Acquisition methods, terminal device and the medium of text similarity
CN109033156A (en) * 2018-06-13 2018-12-18 腾讯科技(深圳)有限公司 A kind of information processing method, device and terminal



Also Published As

Publication number Publication date
CN111382246B (en) 2023-03-14

Similar Documents

Publication Publication Date Title
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
CN111078837B (en) Intelligent question-answering information processing method, electronic equipment and computer readable storage medium
TW201909112A (en) Image feature acquisition
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
CN109299280B (en) Short text clustering analysis method and device and terminal equipment
CN111382248B (en) Question replying method and device, storage medium and terminal equipment
CN110688452A (en) Text semantic similarity evaluation method, system, medium and device
WO2020063524A1 (en) Method and system for determining legal instrument
CN112632261A (en) Intelligent question and answer method, device, equipment and storage medium
CN111382243A (en) Text category matching method, text category matching device and terminal
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
CN110969005A (en) Method and device for determining similarity between entity corpora
CN111382246B (en) Text matching method, matching device, terminal and computer readable storage medium
CN116402166B (en) Training method and device of prediction model, electronic equipment and storage medium
CN112307048A (en) Semantic matching model training method, matching device, equipment and storage medium
CN111767710B (en) Indonesia emotion classification method, device, equipment and medium
CN115640378A (en) Work order retrieval method, server, medium and product
CN111400413B (en) Method and system for determining category of knowledge points in knowledge base
CN112597208A (en) Enterprise name retrieval method, enterprise name retrieval device and terminal equipment
CN113127617A (en) Knowledge question answering method of general domain knowledge graph, terminal equipment and storage medium
CN112163090A (en) Case-based classification method and terminal for legal referee documents
CN111611379A (en) Text information classification method, device, equipment and readable storage medium
CN110688472A (en) Method for automatically screening answers to questions, terminal equipment and storage medium
CN113434630B (en) Customer service evaluation method, customer service evaluation device, terminal equipment and medium
CN114706927B (en) Data batch labeling method based on artificial intelligence and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant