CN112287085B - Semantic matching method, system, equipment and storage medium - Google Patents

Semantic matching method, system, equipment and storage medium

Info

Publication number
CN112287085B
CN112287085B (application CN202011230122.1A)
Authority
CN
China
Prior art keywords
preset
question
model
recall
templates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011230122.1A
Other languages
Chinese (zh)
Other versions
CN112287085A (en)
Inventor
许强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202011230122.1A priority Critical patent/CN112287085B/en
Publication of CN112287085A publication Critical patent/CN112287085A/en
Application granted granted Critical
Publication of CN112287085B publication Critical patent/CN112287085B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a semantic matching method, system, equipment and storage medium, wherein the semantic matching method comprises the following steps: receiving question information, and preprocessing the question information to generate a question text; respectively inputting the question text into preset recall models, and recalling a plurality of preset question templates similar to the question text; inputting the recalled preset question templates into a preset language model to generate sentence vectors corresponding to the preset question templates; acquiring the business scene corresponding to the question information, and determining the trained fine-ranking model corresponding to the business scene; and inputting the sentence vectors corresponding to each preset question template into the trained fine-ranking model corresponding to the question information, and matching the preset question template with the highest similarity to the question text to the question information. The application relates to the fields of artificial intelligence and blockchain, and provides a semantic matching method with high operation timeliness. The method is also applicable to fields such as intelligent government affairs and intelligent medical treatment, and can thus further promote the construction of smart cities.

Description

Semantic matching method, system, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a semantic matching method, system, computer device, and storage medium.
Background
In the prior art, man-machine interaction is realized by computer programs. Existing question-answering systems fall into two categories: open-domain question-answering systems and domain-specific question-answering systems. An open-domain question-answering system relies on various ontologies and real-world information so as to handle a wide variety of questions, particularly everyday ones. A domain-specific question-answering system handles only questions in a specific domain, such as questions about music or questions about weather forecasts. Compared with open-domain question answering, a domain-specific question-answering system is easier to train; it generally calculates the similarity between the sentence input by the user and candidate sentences, matches the most similar sentence, and then obtains the answer corresponding to that sentence. However, when a domain-specific question-answering system needs to interface with multiple business scenes, the large number of preset question templates in the database makes the system difficult to train and poor in operation timeliness.
Disclosure of Invention
The application mainly aims to provide a semantic matching method, system, computer equipment and storage medium, so as to solve the technical problem that a question-answering system which needs to interface with multiple business scenes has poor operation timeliness.
In order to achieve the above object, the present application provides a semantic matching method, comprising the steps of:
receiving question information, and preprocessing the question information to generate a question text;
respectively inputting the question text into preset recall models, and recalling a plurality of preset question templates similar to the question text;
inputting the recalled preset question templates into a preset language model to generate sentence vectors corresponding to the preset question templates;
acquiring the business scene corresponding to the question information, and determining the trained fine-ranking model corresponding to the business scene;
and inputting the sentence vectors corresponding to the preset question templates into the trained fine-ranking model corresponding to the question information, so as to sort a plurality of preset question templates similar to the question text by similarity, and matching the preset question template with the highest similarity to the question text to the question information.
Optionally, the step of respectively inputting the question text into preset recall models and recalling a plurality of preset question templates similar to the question text includes:
inputting the question text into at least two preset recall models, each preset recall model outputting a preset number of recall results, wherein the plurality of preset recall models are recall models respectively trained with different preset rules;
and determining a plurality of preset question templates similar to the question text according to the recall results corresponding to the preset recall models.
Optionally, the step of inputting the question text into at least two preset recall models, and outputting a preset number of recall results by each preset recall model includes:
inputting the question text into a first preset recall model to generate a plurality of preset question templates similar to sentence patterns and phrases of the question text;
inputting the question text into a second preset recall model to generate a plurality of preset question templates similar to the semantics of the question text, wherein the first preset recall model and the second preset recall model are recall models respectively trained based on the same full database, and the full database comprises a plurality of preset questions.
Optionally, the step of inputting the recalled preset question templates into a preset language model and generating sentence vectors corresponding to each preset question template includes:
respectively inputting the recalled preset question templates into a preset language model, wherein the preset language model is a trained bert model deployed on the GPU;
the trained bert model outputs the sentence vectors corresponding to each of the preset question templates.
Optionally, before the step of inputting the recalled plurality of preset question templates into the preset language model to generate sentence vectors corresponding to each preset question template, the method further includes:
acquiring first sample data, wherein the first sample data comprises a plurality of question samples corresponding to a plurality of business scenes and training texts corresponding to the question samples;
constructing the question samples and training texts corresponding to the question samples to form positive sample sentence pairs and negative sample sentence pairs, and generating training corpus;
and inputting the training corpus into a to-be-trained bert model for model training, and generating a trained bert model.
Optionally, before the step of acquiring the business scene corresponding to the question information and determining the trained fine-ranking model corresponding to the business scene, the method further includes:
acquiring second sample data, wherein the second sample data is a plurality of question samples corresponding to one business scene;
inputting each question sample of the second sample data into the trained bert model, outputting the sentence vector corresponding to each question sample, and generating training samples according to the sentence vectors;
and inputting the training sample into a fine-ranking model to be trained for model training, and generating a trained fine-ranking model corresponding to the business scene corresponding to the second sample data.
Optionally, after the step of matching the preset question template with the highest similarity to the question text with the question information, the method further includes:
acquiring a preset answer corresponding to a preset question template matched with the question information;
and sending the preset answer to a sending end of the question information.
To achieve the above object, the present application further provides a semantic matching system, the system comprising:
the receiving module is used for receiving the problem information and preprocessing the problem information to generate a problem text;
the recall module is used for respectively inputting the question texts into a preset recall model and recalling a plurality of preset question templates similar to the question texts;
the sentence vector generation module is used for inputting the recalled plurality of preset problem templates into a preset language model to generate sentence vectors corresponding to the preset problem templates;
the fine-ranking model determining module is used for obtaining a service scene corresponding to the problem information and determining a trained fine-ranking model corresponding to the service scene;
the sorting module is used for inputting the sentence vectors corresponding to the preset question templates into the trained fine-ranking model corresponding to the question information, so as to sort a plurality of preset question templates similar to the question text by similarity, and matching the preset question template with the highest similarity to the question text to the question information.
To achieve the above object, the present application also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the semantic matching method as described above.
To achieve the above object, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the semantic matching method as described above.
According to the semantic matching method, system, computer equipment and storage medium, a preset recall model is adopted so that a plurality of preset question templates similar to the question text can be screened out of the database of preset question templates in advance, and only these similar preset question templates are processed further, which reduces the computational load of subsequent processing and improves operation timeliness; the preset question templates corresponding to different business scenes are input into the same preset language model to generate sentence vectors, and the sentence vectors are then input into the fine-ranking model corresponding to each business scene, so that each business scene has a dedicated, simple fine-ranking model while all scenes share one language model, which lowers training difficulty and improves matching accuracy.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a first embodiment of the semantic matching method of the present application;
FIG. 3 is a schematic diagram of a refinement flow of step S200 of the first embodiment of the semantic matching method according to the present application;
FIG. 4 is a schematic diagram of a refinement flow of step S300 in the first embodiment of the semantic matching method according to the present application;
FIG. 5 is a flow chart of a second embodiment of the semantic matching method of the present application;
FIG. 6 is a flow chart of a third embodiment of the semantic matching method of the present application;
FIG. 7 is a schematic diagram of functional blocks of the semantic matching system of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to fig. 1, fig. 1 is a schematic hardware structure of a computer device according to various embodiments of the present application. The computer device comprises a communication module 01, a memory 02, a processor 03 and the like. Those skilled in the art will appreciate that the computer device illustrated in fig. 1 may include more or fewer components than shown, may combine certain components, or may arrange the components differently. The processor 03 is connected to the memory 02 and the communication module 01, respectively; a computer program is stored in the memory 02 and is executed by the processor 03.
The communication module 01 is connectable to an external device via a network. The communication module 01 can receive data sent by external equipment, and can also send data, instructions and information to the external equipment, wherein the external equipment can be electronic equipment such as a data management terminal, a mobile phone, a tablet personal computer, a notebook computer, a desktop computer and the like.
The memory 02 is used for storing software programs and various data. The memory 02 may mainly include a program storage area and a data storage area, wherein the program storage area may store the operating system and the application programs required for at least one function, and the data storage area may store data or information created according to the use of the computer device. In addition, the memory 02 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor 03, as the control center of the computer device, connects the respective parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 02 and calling the data stored in the memory 02, thereby monitoring the computer device as a whole. The processor 03 may include one or more processing units; preferably, the processor 03 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 03. In one embodiment, the processor 03 includes a central processing unit (CPU, Central Processing Unit) and a graphics processing unit (GPU, Graphics Processing Unit); the GPU computes faster, but at higher cost, than the CPU. Deploying the preset language model on the GPU can therefore speed up the calculation of sentence vectors.
Although not shown in fig. 1, the above-mentioned computer device may further include a circuit control module, where the circuit control module is used to connect with a mains supply, implement power control, and ensure normal operation of other components.
Those skilled in the art will appreciate that the computer device structure shown in FIG. 1 is not limiting of the computer device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
According to the above hardware structure, various embodiments of the method of the present application are presented.
Referring to fig. 2, in a first embodiment of the semantic matching method of the present application, the semantic matching method comprises the steps of:
step S100, receiving problem information, and preprocessing the problem information to generate a problem text;
the user inputs the problem information on the user side, and the user side sends the problem information to a server or a computer pre-stored with a computer program for realizing the semantic matching method so as to realize the problem information receiving. Because the problem information edited by the user has uncontrollability, in order to ensure the accuracy of the matching result, the problem information needs to be preprocessed, and the preprocessing specifically can include filtering punctuation marks, english letters, expression signs, exclamation words, titles and the like.
Step S200, respectively inputting the question text into preset recall models, and recalling a plurality of preset question templates similar to the question text;
the preset question templates are preset by a person skilled in the art according to the business scene corresponding to the semantic matching method, and corresponding answers are set for the question templates, so that the computer can automatically reply to the questions raised by the user. For complex business scenes, a data center comprising a plurality of preset problem templates can be constructed by adopting a block chain technology and a knowledge graph technology, specifically, each preset problem template can be used as a node of the data center, a new node is added into the data center, each node in the full data center is required to vote so as to determine whether the node can be connected, and node consensus verification exceeding 51% indicates that the new node can be added into the data center; and matching the entity of the new node with the entity of other nodes one by one, establishing connection of the same entity among different nodes, and forming a unidirectional serial ring by the entities by each node of the data center. A preset recall model may perform a decentralised recall in the data center.
In one implementation of the application, only one preset recall model is set; it is trained on the full database and is used to recall preset question templates from the full database, screening out a preset number of preset question templates similar to the question text, where similarity to the question text can be calculated based on one or more rules such as semantics, sentence pattern and synonyms. Alternatively, the number of preset recall models may be plural, i.e., each preset recall model is trained based on one rule among semantics, sentence pattern, synonyms and the like.
Step S300, inputting a plurality of recalled preset question templates into a preset language model to generate sentence vectors corresponding to the preset question templates;
to increase budget efficiency, a preset language model may be deployed at the GPU. The preset language model is a model obtained by training based on preset problem templates possibly faced by each service scene, and is used for generating corresponding sentence vectors according to each preset problem template, so that the trained fine-ranking model in the step S400 is convenient to use.
Step S400, acquiring the business scene corresponding to the question information, and determining the trained fine-ranking model corresponding to the business scene;
Step S500, inputting the sentence vectors corresponding to the preset question templates into the trained fine-ranking model corresponding to the question information, so as to sort a plurality of preset question templates similar to the question text by similarity, and matching the preset question template with the highest similarity to the question text to the question information.
The fine-ranking model is used to comprehensively rank the input sentence vectors by their similarity to the question text, so as to output the sentence vector with the highest similarity as the result. The business scenes may be housing safety insurance business, bank card safety insurance business, pet safety insurance business and the like, and may also be divided into sub-businesses such as intelligent government affairs and intelligent medical treatment. Because the business scenes are numerous and differ greatly from one another, training a single fine-ranking model uniformly for all businesses is difficult, and the resulting model has a large computational load, low accuracy and poor operation timeliness. To improve the semantic matching accuracy of each business scene and reduce model training difficulty, the application provides one trained fine-ranking model per business scene. Since each fine-ranking model interfaces with only one business scene, its structure is simple and it can be deployed on the CPU, with the plurality of fine-ranking models sharing one preset language model.
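The deployment split described above can be pictured with the following sketch: one shared language model serves every business scene while each scene gets its own lightweight fine-ranking model. The encoder and rankers here are toy stand-ins (assumptions), not the patent's trained models:

```python
from typing import Callable, Dict, List
import numpy as np

def encode(text: str) -> np.ndarray:
    """Stand-in for the shared preset language model (GPU-deployed in the
    patent); this toy just hashes characters into a normalized vector."""
    vec = np.zeros(64)
    for ch in text:
        vec[hash(ch) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def cosine_ranker(q: np.ndarray, t: np.ndarray) -> float:
    """Stand-in for one business scene's fine-ranking model (CPU-side)."""
    return float(q @ t)

# One fine-ranking model per business scene, all sharing the single encoder.
SCENE_RANKERS: Dict[str, Callable[[np.ndarray, np.ndarray], float]] = {
    "housing_insurance": cosine_ranker,
    "bank_card_insurance": cosine_ranker,
    "pet_insurance": cosine_ranker,
}

def match(question_text: str, scene: str, templates: List[str]) -> str:
    """Score each recalled template with the scene's ranker and return the
    template most similar to the question text."""
    q_vec = encode(question_text)
    ranker = SCENE_RANKERS[scene]
    scores = [ranker(q_vec, encode(t)) for t in templates]
    return templates[int(np.argmax(scores))]
```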
In this embodiment, a preset recall model is adopted so that a plurality of preset question templates similar to the question text can be screened out of the database of preset question templates in advance, and only these similar preset question templates are processed further, which reduces the computational load of subsequent processing and improves operation timeliness; the preset question templates corresponding to different business scenes are input into the same preset language model to generate sentence vectors, and the sentence vectors are then input into the fine-ranking model corresponding to each business scene, which lowers training difficulty and improves matching accuracy. The method is also applicable to fields such as intelligent government affairs and intelligent medical treatment, and can thus further promote the construction of smart cities.
Further, referring to fig. 3, fig. 3 is a detailed flowchart of step S200 in the first embodiment of the semantic matching method of the present application. Based on the embodiment shown in fig. 2, in this embodiment, step S200, respectively inputting the question text into preset recall models and recalling a plurality of preset question templates similar to the question text, includes:
Step S210, inputting the question text into at least two preset recall models, each preset recall model outputting a preset number of recall results, wherein the plurality of preset recall models are recall models respectively trained with different preset rules;
Step S220, determining a plurality of preset question templates similar to the question text according to the recall results corresponding to the preset recall models.
In this embodiment, the number of preset recall models is two, and the two preset recall models are trained with different preset rules, so that even if the same samples are used for training, the resulting preset recall models differ; different preset recall models therefore output different recall results for the same question text, and each preset recall model can output recall results from a different angle. By setting a plurality of preset recall models, recall results can be output from different angles, and the fine ranking is then performed on these results, making the matching result accurate.
Further, step S210 includes:
step S211, inputting the question text into a first preset recall model to generate a plurality of preset question templates similar to the sentence patterns and the phrase patterns of the question text;
the first preset recall model may be a ES (elasticsearch) recall model, so that the similarity between the text and each preset question template is obtained through calculation through parameters such as segmentation and weight of each segmentation, and the preset number of preset question templates with higher similarity are obtained through sequencing according to the similarity. The preset number is 20 in this embodiment. The preset problem template is similar to the preset problem in terms of literal meaning, sentence pattern and the like. For example: the words of the mobile phone to be ensured are basically consistent with the words of the mobile phone to be ensured, and only sentence patterns are different.
Step S212, inputting the question text into a second preset recall model, and generating a plurality of preset question templates similar to the semantics of the question text, wherein the first preset recall model and the second preset recall model are recall models trained based on the same full-scale database respectively, and the full-scale database comprises a plurality of preset questions.
The second preset recall model may specifically be a semantic recall model. Similar to the first preset recall model, it calculates the similarity between the question text and each preset question template and then sorts by similarity to obtain the preset number of preset question templates with the highest similarity. Semantic similarity means, for example, that "handling fee" and "commission" are similar in meaning even though they are literally completely different.
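A semantic recall sketch under the same assumption of hypothetical templates; the sentence-transformers package and the checkpoint name are stand-ins, since the embodiment does not name a specific semantic model:

```python
from sentence_transformers import SentenceTransformer, util

templates = ["手续费是怎么收取的", "佣金比例是多少", "如何申请退保"]  # illustrative
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
template_vecs = model.encode(templates, convert_to_tensor=True)

query_vec = model.encode("这项服务要收多少钱", convert_to_tensor=True)
scores = util.cos_sim(query_vec, template_vecs)[0]
# Keep the preset number of most similar templates (20 in this embodiment).
top = scores.topk(min(20, len(templates)))
recalled = [templates[int(i)] for i in top.indices]
```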
By setting the first preset recall model to recall literally similar preset question templates and the second preset recall model to recall semantically similar preset question templates, the preset question templates input into the preset language model are similar to the question text in multiple dimensions, making the matching result more accurate.
Further, referring to fig. 4, fig. 4 is a detailed flowchart of step S300 in the first embodiment of the semantic matching method of the present application. Based on the embodiment shown in fig. 3, in this embodiment, step S300, inputting the recalled preset question templates into a preset language model and generating sentence vectors corresponding to each of the preset question templates, includes:
Step S310, respectively inputting the recalled preset question templates into a preset language model, wherein the preset language model is a trained bert model deployed on the GPU;
Step S320, the trained bert model outputting the sentence vectors corresponding to each of the preset question templates.
The bert (Bidirectional Encoder Representations from Transformers) model is built from the bidirectional encoder of the Transformer. Because the trained bert model is deep and computationally heavy, deploying it on the GPU improves calculation speed.
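A sketch of extracting sentence vectors with a bert model on the GPU via the Hugging Face transformers API; the bert-base-chinese checkpoint and the mean pooling over token states are assumptions, since the embodiment does not specify how sentence vectors are pooled:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
device = "cuda" if torch.cuda.is_available() else "cpu"  # deploy on GPU if present
model = BertModel.from_pretrained("bert-base-chinese").to(device).eval()

def sentence_vectors(texts: list) -> torch.Tensor:
    """Encode each recalled preset question template into one sentence vector."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      return_tensors="pt").to(device)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, tokens, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling
```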
Further, referring to fig. 5, fig. 5 is a flowchart of a second embodiment of the semantic matching method of the present application. Based on the embodiment shown in fig. 4, in this embodiment, before step S300, inputting the recalled preset question templates into a preset language model to generate sentence vectors corresponding to each of the preset question templates, the method further includes:
Step S610, acquiring first sample data, wherein the first sample data comprises a plurality of question samples corresponding to a plurality of business scenes and training texts corresponding to the question samples;
Specifically, a person skilled in the art can set question samples for the questions each business scene may face, and also set a similar training text for each question sample; the set of all question samples and the set of corresponding training texts form the first sample data, in which question samples and training texts each account for half.
Step S620, constructing positive sample sentence pairs and negative sample sentence pairs from the question samples and the training texts corresponding to the question samples, and generating the training corpus;
specifically, each of the question samples and the training text are segmented so that the subsequent can be connected as corresponding sentence pairs. And constructing sentence pairs according to each question sample and training text, wherein the sentence pairs can specifically comprise positive sample sentence pairs and negative sample sentence pairs, the positive sample sentence pairs are similar relations between two sentences, and the negative sample sentence pairs are not similar relations between the two sentences.
Step S630, inputting the training corpus into the to-be-trained bert model for model training, and generating a trained bert model.
The training corpus is input into the to-be-trained bert model for multiple iterations, and the weights in the to-be-trained bert model are updated repeatedly according to the resulting loss function and the like, thereby obtaining the trained bert model.
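A minimal fine-tuning sketch over such a pair corpus; the bert-base-chinese checkpoint, the standard pair-classification head and all hyperparameters are assumptions:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # labels: similar / not similar
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

corpus = [("怎么给手机投保", "手机投保怎么弄", 1),  # illustrative pairs
          ("怎么给手机投保", "如何申请退保", 0)]

model.train()
for epoch in range(3):                              # multiple iterations
    for sent_a, sent_b, label in corpus:
        batch = tokenizer(sent_a, sent_b, truncation=True, return_tensors="pt")
        loss = model(**batch, labels=torch.tensor([label])).loss
        loss.backward()                             # update weights by the loss
        optimizer.step()
        optimizer.zero_grad()
```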
In this embodiment, the trained bert model is obtained by training on question samples corresponding to multiple business scenes, i.e., the trained bert model can process the question samples of every business scene and generate the corresponding sentence vectors.
Further, referring to fig. 6, fig. 6 is a flowchart of a third embodiment of the semantic matching method of the present application. Based on the embodiment shown in fig. 5, in this embodiment, before step S400, acquiring the business scene corresponding to the question information and determining the trained fine-ranking model corresponding to the business scene, the method further includes:
step S710, obtaining second sample data, wherein the second sample data is a plurality of problem samples corresponding to a service scene;
step S720, inputting each problem sample of the second sample data into the trained bert model, outputting sentence vectors corresponding to each problem sample, and generating training samples according to the sentence vectors;
step S730, inputting the training sample into the fine-pitch model to be trained to perform model training, and generating a trained fine-pitch model corresponding to the service scenario corresponding to the second sample data.
In this embodiment, the fine-ranking model to be trained may be an interactive fine-ranking model. The second sample data contains only question samples corresponding to one particular business scene, i.e., the trained fine-ranking model obtained by training targets that business scene only. Since the question samples are first input into the trained bert model to obtain sentence vectors, and the fine ranking is then performed on those sentence vectors, each trained fine-ranking model is highly adapted to the trained bert model, which improves the semantic matching effect.
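One way to sketch such an interactive fine-ranking model over sentence vectors is a small network scoring each (question vector, template vector) pair through their concatenation, difference and product; this particular architecture is an assumption, as the embodiment says only that the model is interactive:

```python
import torch
import torch.nn as nn

class FineRanker(nn.Module):
    """Scores a question/template sentence-vector pair; one such model would
    be trained per business scene on that scene's second sample data."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4 * dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, q: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Interaction features between the two sentence vectors.
        feats = torch.cat([q, t, torch.abs(q - t), q * t], dim=-1)
        return self.mlp(feats).squeeze(-1)  # higher score = more similar

ranker = FineRanker()
q = torch.randn(1, 768)               # question sentence vector
t = torch.randn(5, 768)               # 5 recalled template sentence vectors
scores = ranker(q.expand_as(t), t)
best_template = int(scores.argmax())  # index of the best-matching template
```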
In an embodiment, after step S500, matching the preset question template with the highest similarity to the question text to the question information, the method further includes:
acquiring a preset answer corresponding to a preset question template matched with the question information;
and sending the preset answer to a sending end of the question information.
Each business scene can set answers for its preset questions in advance to generate a mapping relation between preset questions and answers; the corresponding answer can then be looked up via the preset question template matched to the question information, and the answer is fed back to the user to realize automatic question answering.
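A sketch of this mapping and reply step; the storage layout and the send callback are assumptions:

```python
# Mapping from preset question templates to preset answers (illustrative).
ANSWERS = {
    "怎么给手机投保": "您可以在投保页面选择手机保障服务。",
}

def reply(matched_template: str, send) -> None:
    """Look up the answer for the matched template and feed it back to the
    sending end of the question information."""
    answer = ANSWERS.get(matched_template)
    if answer is not None:
        send(answer)

reply("怎么给手机投保", send=print)
```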
Referring to fig. 7, the present application further provides a semantic matching system, including:
the receiving module 10 is used for receiving the problem information and preprocessing the problem information to generate a problem text;
a recall module 20 for inputting the question text into a preset recall model, respectively, and recalling a plurality of preset question templates similar to the question text;
the sentence vector generating module 30 inputs the recalled plurality of preset question templates into a preset language model to generate sentence vectors corresponding to the preset question templates;
the fine-ranking model determining module 40 acquires a service scene corresponding to the problem information and determines a trained fine-ranking model corresponding to the service scene;
the ranking module 50 inputs the sentence vectors corresponding to the preset question templates into the trained fine-ranking model corresponding to the question information, so as to sort a plurality of preset question templates similar to the question text by similarity, and matches the preset question template with the highest similarity to the question text to the question information.
Further, the recall module 20 includes:
the recall unit inputs the problem text into at least two preset recall models, and each preset recall model outputs a preset number of recall results, wherein a plurality of preset recall models are recall models trained by different preset rules respectively;
and the aggregation unit is used for determining a plurality of preset question templates similar to the question text according to recall results corresponding to the preset recall models.
Further, the recall unit includes:
the first preset recall subunit is used for inputting the question text into a first preset recall model to generate a plurality of preset question templates similar to the sentence patterns and the phrase groups of the question text;
and the second preset recall subunit inputs the question text into a second preset recall model to generate a plurality of preset question templates similar to the semantics of the question text, wherein the first preset recall model and the second preset recall model are recall models respectively trained based on the same full database, and the full database comprises a plurality of preset questions.
Further, the sentence vector generating module 30 includes:
the first input unit is used for respectively inputting the recalled preset question templates into a preset language model, wherein the preset language model is a trained bert model deployed on the GPU;
the first generation unit is used for outputting sentence vectors corresponding to the preset problem templates by the trained bert model.
Further, the semantic matching system further comprises a first training module, wherein the first training module is used for:
acquiring first sample data, wherein the first sample data comprises a plurality of question samples corresponding to a plurality of business scenes and training texts corresponding to the question samples;
constructing the question samples and training texts corresponding to the question samples to form positive sample sentence pairs and negative sample sentence pairs, and generating training corpus;
and inputting the training corpus into a to-be-trained bert model for model training, and generating a trained bert model.
Further, the semantic matching system further comprises a second training module, wherein the second training module is used for:
acquiring second sample data, wherein the second sample data are a plurality of problem samples corresponding to a service scene;
inputting each problem sample of the second sample data into a trained bert model, outputting sentence vectors corresponding to each problem sample, and generating training samples according to the sentence vectors;
and inputting the training sample into a fine-ranking model to be trained for model training, and generating a trained fine-ranking model corresponding to the business scene corresponding to the second sample data.
Further, the semantic matching system further comprises a matching module, wherein the matching module is used for:
acquiring a preset answer corresponding to a preset question template matched with the question information;
and sending the preset answer to a sending end of the question information.
The present application also proposes a computer-readable storage medium on which a computer program is stored. The computer-readable storage medium may be at least one of the memory 02 in the terminal of fig. 1, a ROM (Read-Only Memory), a RAM (Random Access Memory), a magnetic disk, and an optical disk, and the computer-readable storage medium stores information for causing the terminal to perform the method according to the embodiments of the present application.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of embodiments, it will be clear to a person skilled in the art that the above embodiment method may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware, but in many cases the former is the preferred embodiment.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures disclosed herein or equivalent processes shown in the accompanying drawings, or any application, directly or indirectly, in other related arts.

Claims (6)

1. A semantic matching method, comprising the steps of:
receiving problem information, and preprocessing the problem information to generate a problem text;
respectively inputting the question text into a preset recall model, and recalling a plurality of preset question templates similar to the question text;
inputting the recalled plurality of preset question templates into a preset language model to generate sentence vectors corresponding to the preset question templates;
acquiring a service scene corresponding to the problem information, and determining a trained fine-ranking model corresponding to the service scene;
inputting the sentence vectors corresponding to the preset question templates into the trained fine-ranking model corresponding to the question information, so as to sort a plurality of preset question templates similar to the question text by similarity, and matching the preset question template with the highest similarity to the question text to the question information;
the step of inputting the question text into a preset recall model respectively, and recalling a plurality of preset question templates similar to the question text comprises the following steps:
inputting the problem text into at least two preset recall models, and outputting a preset number of recall results by each preset recall model, wherein a plurality of preset recall models are recall models trained by different preset rules respectively;
determining a plurality of preset question templates similar to the question text according to recall results corresponding to the preset recall models;
the step of inputting the question text into at least two preset recall models, and outputting a preset number of recall results by each preset recall model comprises the following steps:
inputting the question text into a first preset recall model to generate a plurality of preset question templates similar to sentence patterns and phrases of the question text;
inputting the question text into a second preset recall model to generate a plurality of preset question templates similar to the semantics of the question text, wherein the first preset recall model and the second preset recall model are recall models trained respectively based on the same full database, and the full database comprises a plurality of preset questions;
the step of inputting the recalled plurality of preset question templates into a preset language model, and generating sentence vectors corresponding to the preset question templates comprises the following steps:
respectively inputting the recalled preset question templates into a preset language model, wherein the preset language model is a trained bert model deployed on the GPU;
outputting sentence vectors corresponding to each preset problem template by the trained bert model;
before the step of inputting the recalled plurality of preset question templates into the preset language model to generate sentence vectors corresponding to the preset question templates, the method further comprises the following steps:
acquiring first sample data, wherein the first sample data comprises a plurality of problem samples corresponding to a plurality of business scenes and training texts corresponding to the problem samples;
constructing the question samples and training texts corresponding to the question samples to form positive sample sentence pairs and negative sample sentence pairs, and generating training corpus;
and inputting the training corpus into a to-be-trained bert model for model training, and generating a trained bert model.
2. The semantic matching method according to claim 1, wherein before the step of acquiring a business scene corresponding to the question information and determining the trained fine-ranking model corresponding to the business scene, the method further comprises:
acquiring second sample data, wherein the second sample data are a plurality of problem samples corresponding to a service scene;
inputting each problem sample of the second sample data into a trained bert model, outputting sentence vectors corresponding to each problem sample, and generating training samples according to the sentence vectors;
and inputting the training sample into a fine-ranking model to be trained for model training, and generating a trained fine-ranking model corresponding to the business scene corresponding to the second sample data.
3. The semantic matching method according to claim 2, wherein after the step of matching the preset question template having the highest similarity to the question text with the question information, further comprises:
acquiring a preset answer corresponding to a preset question template matched with the question information;
and sending the preset answer to a sending end of the question information.
4. A semantic matching system for implementing the method of any one of claims 1 to 3, the semantic matching system comprising:
the receiving module is used for receiving the problem information and preprocessing the problem information to generate a problem text;
the recall module is used for respectively inputting the question texts into a preset recall model and recalling a plurality of preset question templates similar to the question texts;
the sentence vector generation module is used for inputting the recalled plurality of preset problem templates into a preset language model to generate sentence vectors corresponding to the preset problem templates;
the fine-ranking model determining module is used for obtaining a service scene corresponding to the problem information and determining a trained fine-ranking model corresponding to the service scene;
the sorting module inputs the sentence vectors corresponding to the preset question templates into the trained fine-ranking model corresponding to the question information, so as to sort a plurality of preset question templates similar to the question text by similarity, and matches the preset question template with the highest similarity to the question text to the question information.
5. A computer device, characterized in that it comprises a memory, a processor and a computer program stored on the memory and executable on the processor, which computer program, when being executed by the processor, implements the steps of the semantic matching method according to any of claims 1 to 3.
6. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the semantic matching method according to any of claims 1 to 3.
CN202011230122.1A 2020-11-06 2020-11-06 Semantic matching method, system, equipment and storage medium Active CN112287085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011230122.1A CN112287085B (en) 2020-11-06 2020-11-06 Semantic matching method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011230122.1A CN112287085B (en) 2020-11-06 2020-11-06 Semantic matching method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112287085A (en) 2021-01-29
CN112287085B (en) 2023-12-05

Family

ID=74352135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011230122.1A Active CN112287085B (en) 2020-11-06 2020-11-06 Semantic matching method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112287085B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688246B (en) * 2021-08-31 2023-09-26 中国平安人寿保险股份有限公司 Historical problem recall method and device based on artificial intelligence and related equipment
CN113850384A (en) * 2021-09-30 2021-12-28 维沃移动通信有限公司 Model training method and device
CN114595697B (en) * 2022-03-14 2024-04-05 京东科技信息技术有限公司 Method, apparatus, server and medium for generating pre-labeled samples
CN115860012B (en) * 2022-05-25 2024-06-11 北京中关村科金技术有限公司 User intention recognition method, device, electronic equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740077A (en) * 2018-12-29 2019-05-10 北京百度网讯科技有限公司 Answer searching method, device and its relevant device based on semantic indexing
CN110263144A (en) * 2019-06-27 2019-09-20 深圳前海微众银行股份有限公司 A kind of answer acquisition methods and device
CN111177349A (en) * 2019-12-20 2020-05-19 厦门快商通科技股份有限公司 Question-answer matching method, device, equipment and storage medium
CN111368042A (en) * 2020-02-13 2020-07-03 平安科技(深圳)有限公司 Intelligent question and answer method and device, computer equipment and computer storage medium
CN111858859A (en) * 2019-04-01 2020-10-30 北京百度网讯科技有限公司 Automatic question-answering processing method, device, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740077A (en) * 2018-12-29 2019-05-10 北京百度网讯科技有限公司 Answer searching method, device and its relevant device based on semantic indexing
CN111858859A (en) * 2019-04-01 2020-10-30 北京百度网讯科技有限公司 Automatic question-answering processing method, device, computer equipment and storage medium
CN110263144A (en) * 2019-06-27 2019-09-20 深圳前海微众银行股份有限公司 A kind of answer acquisition methods and device
CN111177349A (en) * 2019-12-20 2020-05-19 厦门快商通科技股份有限公司 Question-answer matching method, device, equipment and storage medium
CN111368042A (en) * 2020-02-13 2020-07-03 平安科技(深圳)有限公司 Intelligent question and answer method and device, computer equipment and computer storage medium

Also Published As

Publication number Publication date
CN112287085A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN112287085B (en) Semantic matching method, system, equipment and storage medium
CN110301117B (en) Method and apparatus for providing response in session
JP2022153441A (en) Method and device for pre-training models, method and device for generating text, electronic device, storage medium, and computer program
CN113127624B (en) Question-answer model training method and device
CN110347802B (en) Text analysis method and device
CN112579733B (en) Rule matching method, rule matching device, storage medium and electronic equipment
CN113704428B (en) Intelligent inquiry method, intelligent inquiry device, electronic equipment and storage medium
CN110895656B (en) Text similarity calculation method and device, electronic equipment and storage medium
CN116541493A (en) Interactive response method, device, equipment and storage medium based on intention recognition
CN111767394A (en) Abstract extraction method and device based on artificial intelligence expert system
CN110795544A (en) Content search method, device, equipment and storage medium
CN117271736A (en) Question-answer pair generation method and system, electronic equipment and storage medium
Sharma et al. Review on Chatbot Design Techniques in Speech Conversation Systems
CN112307754A (en) Statement acquisition method and device
CN115269828A (en) Method, apparatus, and medium for generating comment reply
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN117422067A (en) Information processing method, information processing device, electronic equipment and storage medium
CN117290478A (en) Knowledge graph question-answering method, device, equipment and storage medium
CN111401070B (en) Word meaning similarity determining method and device, electronic equipment and storage medium
CN116364054A (en) Voice synthesis method, device, equipment and storage medium based on diffusion
CN115795007A (en) Intelligent question-answering method, intelligent question-answering device, electronic equipment and storage medium
CN113821669B (en) Searching method, searching device, electronic equipment and storage medium
Ota et al. Proposal of open-ended dialog system based on topic maps
Jeyanthi et al. AI‐Based Development of Student E‐Learning Framework
CN117891927B (en) Question and answer method and device based on large language model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant