CN113641793B - Retrieval system for long text matching optimization aiming at electric power standard - Google Patents

Retrieval system for long text matching optimization aiming at electric power standard

Info

Publication number
CN113641793B
Authority
CN
China
Prior art keywords
bert
semantic
training
search term
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110937101.1A
Other languages
Chinese (zh)
Other versions
CN113641793A (en)
Inventor
赵常威
钱宇骋
李坚林
潘超
甄超
朱太云
李森林
胡啸宇
吴正阳
吴杰
吴海峰
黄文礼
温招洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Nanrui Jiyuan Power Grid Technology Co ltd
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
State Grid Anhui Electric Power Co Ltd
Original Assignee
Anhui Nanrui Jiyuan Power Grid Technology Co ltd
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
State Grid Anhui Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Nanrui Jiyuan Power Grid Technology Co ltd, Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd, State Grid Anhui Electric Power Co Ltd filed Critical Anhui Nanrui Jiyuan Power Grid Technology Co ltd
Priority to CN202110937101.1A priority Critical patent/CN113641793B/en
Publication of CN113641793A publication Critical patent/CN113641793A/en
Application granted granted Critical
Publication of CN113641793B publication Critical patent/CN113641793B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a retrieval system optimized for long-text matching of electric power standards, which belongs to the field of text retrieval. When the length of each section of a power standard exceeds 512 characters, effectively matching search terms against the long text is the core problem in building document retrieval for power standards. The traditional TF-IDF and BM25 algorithms consider matching only in the word dimension and do not consider the degree of deep semantic matching or relevance, so the resulting matching similarity is limited. To address the problem that the single-character Mask operation in the original BERT cannot learn the context of domain-specific vocabulary, a Mask operation at the level of continuous vocabulary segments is performed on the results of domain word segmentation, forcing the model to learn vocabulary-level context, which has a certain effect on improving Chinese retrieval tasks.

Description

Retrieval system for long text matching optimization aiming at electric power standard
Technical Field
The invention belongs to the field of text retrieval, and particularly relates to a retrieval system optimized for long-text matching of electric power standards.
Background
Electric power standards are the technical regulations and technical management basis that must be jointly observed in electric power construction and in the production, transformation, transmission, sale and use of electric energy. Because the production, transmission and sale of electric energy are all completed instantaneously, and the power system has a great influence on social life and production as a whole, the power system must have high reliability, stability and safety; accordingly, except for a few provisions marked "may be executed by reference", electric power standards are mandatory standards.
When the length of each section in a power standard exceeds 512 characters, effectively matching a search term against such long text is the core problem in building document retrieval for power standards. The traditional TF-IDF and BM25 algorithms consider matching only in the word dimension and do not consider the degree of deep semantic matching or relevance, so the resulting matching similarity is limited.
Disclosure of Invention
In order to solve the above problems, the invention provides a retrieval system optimized for long-text matching of power standards.
The aim of the invention is achieved by the following technical scheme: a retrieval system for long-text matching optimization of electric power standards comprises a vocabulary extraction terminal, a pre-training BERT coding terminal, a vocabulary processing terminal and a semantic long text ordering terminal;
the pre-training BERT encoding terminal adopts two different pre-training BERT codes to encode the paragraph d and the corresponding search term q to obtain different vectors, the different vectors are expressed as d-vecor and q-vecor, and then cosine similarity is calculated for the two vectors to be used as a relevance score of the two vectors;
The vocabulary processing terminal internally comprises two models, namely a domain-adapted BERT pre-trained language model and a domain-adapted unsupervised semantic similarity model.
Preferably, the vocabulary extraction terminal extracts documents and titles of all chapters in the power standard text as paragraphs and corresponding search terms, wherein the paragraphs are marked as d, and the corresponding search terms are marked as q.
Preferably, the pretrained BERT coding terminal comprises an expansion unit, and the search term corresponding to each document is expanded by the expansion unit.
Preferably, the domain-adapted BERT pre-trained language model forces the model to learn vocabulary-level context while eliminating NSP tasks.
Preferably, the domain-adapted unsupervised semantic similarity model adopts an unsupervised method to train two BERT unsupervised semantic similarity representation models, one for the search term q and one for the paragraph d.
Preferably, the semantic long text ordering terminal is used for constructing a deep semantic long text ordering model so as to obtain a BERT representation suited to q-d matching.
Preferably, the positive and negative samples for the q-d matching algorithm are constructed as follows:
Step one, for a search term with a complete semantic relation, the q-d pairs of other chapters are used: the paragraphs d of the other chapters serve as negative examples, and the paragraph d matched with the search term q having the complete semantic relation serves as the positive example;
Step two, for a search term without a complete semantic relation, pairs are constructed using the word-segmentation results of the search term q: the paragraphs d of other chapters serve as negative examples, and the paragraph d matched with the search term q serves as the positive example;
During training, within each batch of size batch_size, the positive document paragraph corresponding to each search term is taken as the positive sample and the document paragraphs of the other batch_size-1 samples are taken as negative examples, so that batch_size² sample pairs are constructed for training;
Preferably, BERT semantic similarity representation models are constructed separately and in a targeted manner for search terms and for document paragraphs, and are called q-BERT and d-BERT respectively.
Preferably, the q-BERT and the d-BERT encode the search term and the document paragraph respectively as initialization representations. The final objective is to learn the encoders d-encoder and q-encoder, which encode the search term q and the document paragraph d into the same vector space, in which a strongly correlated (q, d) pair lies closer together than a weakly correlated one, and the following loss function is designed for this:

L(q_i, d_i^+, d_{i,1}^-, …, d_{i,n}^-) = −log [ exp(sim(q_i, d_i^+)) / ( exp(sim(q_i, d_i^+)) + Σ_{j=1}^{n} exp(sim(q_i, d_{i,j}^-)) ) ]

The loss function is the negative log-likelihood of the positive example, where q_i is the search term, d_i^+ is the strongly correlated positive document paragraph, and the d_{i,j}^- are the negative examples.
Compared with the prior art, the invention has the beneficial effects that:
1. To address the problem that the single-character Mask operation in the original BERT cannot learn the context of domain-specific vocabulary, a Mask operation at the level of continuous vocabulary segments is performed on the results of domain word segmentation, forcing the model to learn vocabulary-level context and having a certain effect on improving Chinese retrieval tasks;
2. For the search term q and the paragraph d, two BERT unsupervised semantic similarity representation models are trained separately using an unsupervised method. A sentence is passed through the encoder to obtain its BERT representation, the BERT representations obtained from other sentences serve as negative examples, and the positive example is obtained by feeding the same sentence into the encoder twice, where different dropout masks yield different BERT representations; this works better than common text-augmentation methods such as clipping and word replacement;
3. In each batch of size batch_size, the positive document paragraph corresponding to each search term is taken as the positive sample and the document paragraphs of the other batch_size-1 samples are taken as negative examples, so that batch_size² sample pairs are constructed for training, thereby effectively training the deep semantic long text ordering model.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic block diagram of the present invention;
FIG. 2 is a schematic block diagram of a similarity model of the present invention.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, a retrieval system for long-text matching optimization of power standards comprises a vocabulary extraction terminal, a pre-training BERT coding terminal, a vocabulary processing terminal and a semantic long text ordering terminal;
The vocabulary extraction terminal output end is electrically connected with the pre-training BERT coding terminal input end, the pre-training BERT coding terminal output end is electrically connected with the vocabulary processing terminal input end, and the vocabulary processing terminal output end is electrically connected with the semantic long text ordering terminal input end;
The vocabulary extraction terminal extracts documents and titles of all chapters in the power standard text as paragraphs and corresponding search words, wherein the paragraphs are marked as d, and the corresponding search words are marked as q;
The pre-training BERT encoding terminal adopts two different pre-trained BERT encoders to encode the paragraph d and the corresponding search term q into different vectors, denoted d-vector and q-vector, and then computes the cosine similarity of the two vectors as their relevance score. An expansion unit is arranged in the pre-training BERT encoding terminal, and through this unit the search term corresponding to each document can be expanded, for example by other topic keywords already present in the terminal (such as a stable winding);
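As an illustrative sketch only (not the patented implementation), the dual-encoder cosine-similarity scoring described above could be realised as follows; the checkpoint name bert-base-chinese and the helper names are assumptions, since the patent does not specify them:

import torch
from transformers import BertModel, BertTokenizerFast

# The two encoders stand in for the separately trained q-BERT and d-BERT;
# here both start from a generic Chinese BERT checkpoint for illustration.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
q_bert = BertModel.from_pretrained("bert-base-chinese")
d_bert = BertModel.from_pretrained("bert-base-chinese")

def encode(text, model):
    # Encode the text and take its [CLS] vector as a fixed-size representation.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).last_hidden_state[:, 0]      # shape (1, 768)

def relevance(q, d):
    q_vec = encode(q, q_bert)                                # q-vector
    d_vec = encode(d, d_bert)                                # d-vector
    return torch.cosine_similarity(q_vec, d_vec).item()     # relevance score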
the vocabulary processing terminal internally comprises two models, namely a domain-adapted BERT pre-trained language model and a domain-adapted unsupervised semantic similarity model;
The domain-adapted BERT pre-trained language model has the following characteristics:
1. To address the problem that the single-character Mask operation in the original BERT cannot learn the context of domain-specific vocabulary, a Mask operation at the level of continuous vocabulary segments is performed on the results of domain word segmentation, forcing the model to learn vocabulary-level context and having a certain effect on improving Chinese retrieval tasks (a sketch of this segment-level masking follows this list);
2. Because the next-sentence-prediction (NSP) task in the original BERT splits the text, the long-text semantic information obtained by modeling is incomplete, which harms long-text retrieval; the NSP task is therefore cancelled to avoid this effect on the long-text semantic information.
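A minimal sketch of this segment-level Mask operation, assuming a jieba-style segmenter pre-loaded with a power-domain dictionary and the Hugging Face tokenizer; the function name mask_segments and the 15% masking rate are illustrative assumptions rather than the patented implementation:

import random
import jieba
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")

def mask_segments(sentence, mask_prob=0.15):
    # Segment the sentence into domain vocabulary units, then mask whole
    # segments rather than single characters, so the model must recover the
    # entire domain term from its surrounding context.
    tokens = []
    for segment in jieba.cut(sentence):
        piece = tokenizer.tokenize(segment)
        if piece and random.random() < mask_prob:
            tokens.extend(["[MASK]"] * len(piece))
        else:
            tokens.extend(piece)
    return tokenizer.convert_tokens_to_ids(tokens)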
The domain-adapted unsupervised semantic similarity model has the following characteristics:
For the search term q and the paragraph d, two BERT unsupervised semantic similarity representation models are trained separately using an unsupervised method;
As shown in fig. 2, a sentence is passed through the encoder to obtain its BERT representation, and the BERT representations obtained from other sentences serve as negative examples, while the positive example is obtained by feeding the same sentence into the encoder twice, where different dropout masks produce different BERT representations. Experiments show that this works better than conventional text-augmentation methods such as clipping and word replacement. In the figure, arrows indicate the matching pairs formed between the encoded feature vectors of the texts (the first with the second, and the first with the third). A sketch of this dropout-based contrastive objective follows below;
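A minimal sketch of this dropout-based unsupervised contrastive objective (in the spirit of SimCSE), assuming PyTorch and the transformers library; the temperature value, checkpoint name and helper names are illustrative assumptions:

import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")
encoder.train()  # keep dropout active so two passes over the same sentence differ

def unsupervised_contrastive_loss(sentences, temperature=0.05):
    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    # Two forward passes: dropout produces two different BERT representations
    # of the same sentence, which form the positive pair; the other sentences
    # in the batch serve as negatives.
    z1 = encoder(**batch).last_hidden_state[:, 0]            # (N, 768)
    z2 = encoder(**batch).last_hidden_state[:, 0]            # (N, 768)
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(sim.size(0))                       # positives on the diagonal
    return F.cross_entropy(sim, labels)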
BERT semantic similarity representation models are constructed separately and in a targeted manner for search terms and for document paragraphs, and are called q-BERT and d-BERT respectively;
Through this training optimization on the power standard text in the semantic-similarity calculation scenario, a BERT representation better suited to q-d matching is obtained; on this basis, a deep semantic long text ordering model needs to be constructed.
The semantic long text ordering terminal is used for constructing the deep semantic long text ordering model, and the positive and negative samples for the q-d matching algorithm are constructed as follows:
Step one, for a search term with a complete semantic relation, the q-d pairs of other chapters are used for a given chapter: the paragraphs d of the other chapters serve as negative examples, and the paragraph d matched with the search term q having the complete semantic relation serves as the positive example;
Step two, for a search term without a complete semantic relation, pairs are constructed using the word-segmentation results of the search term q: the paragraphs d of other chapters serve as negative examples, and the paragraph d matched with the search term q serves as the positive example.
Step three, during training, within each batch of size batch_size, the positive document paragraph corresponding to each search term is taken as the positive sample and the document paragraphs of the other batch_size-1 samples are taken as negative examples, so that batch_size² sample pairs are constructed for training, effectively training the deep semantic long text ordering model (see the training-step sketch following the loss function below).
Application and training are performed through the following steps:
First, the d-encoder is used offline to encode the text paragraphs into fixed-dimensional vectors, the fixed dimension being 768, and indexes are built over these vectors. In actual use, the q-encoder semantically encodes the search term into a fixed-dimensional vector, and the CAISS vector search system is used to find the K most relevant document paragraphs; this is the application flow of the system. The similarity of the encoded vectors of the search term q and the paragraph d is measured by the following formula:

sim(q, d) = (q-vector)^T · (d-vector)

That is, the semantic encoding vectors of a (q, d) pair are taken and their inner product is computed as the correlation metric between the two, where T denotes the transpose of the matrix.
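The offline-indexing and online-search flow can be sketched as follows; a brute-force NumPy inner-product search stands in for the CAISS vector search system, and the checkpoint name and corpus contents are illustrative assumptions:

import numpy as np
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
q_encoder = BertModel.from_pretrained("bert-base-chinese").eval()   # stands in for the q-encoder
d_encoder = BertModel.from_pretrained("bert-base-chinese").eval()   # stands in for the d-encoder

def encode(text, model):
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).last_hidden_state[0, 0].numpy()      # 768-dimensional vector

# Offline: encode every document paragraph with the d-encoder and build the index.
corpus = ["变压器绕组试验段落……", "互感器检测段落……"]
index = np.stack([encode(p, d_encoder) for p in corpus])             # (num_docs, 768)

def search(query, k=2):
    # Online: encode the search term with the q-encoder and rank paragraphs by
    # the inner product sim(q, d) = (q-vector)^T (d-vector).
    q_vec = encode(query, q_encoder)
    scores = index @ q_vec
    top = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in top]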
During training, the q-BERT and d-BERT obtained above are used to encode the search term and the document paragraph respectively as initialization representations. The final objective is to learn the encoders d-encoder and q-encoder, which encode the search term q and the document paragraph d into the same vector space, in which a strongly correlated (q, d) pair lies closer together than a weakly correlated one. Finally, the following loss function is designed for this task:

L(q_i, d_i^+, d_{i,1}^-, …, d_{i,n}^-) = −log [ exp(sim(q_i, d_i^+)) / ( exp(sim(q_i, d_i^+)) + Σ_{j=1}^{n} exp(sim(q_i, d_{i,j}^-)) ) ]

The above formula is the negative log-likelihood of the positive example, where q_i is the search term, d_i^+ is the strongly correlated positive document paragraph, and the d_{i,j}^- are the negative examples, i.e. samples other than the correctly matched one.
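A minimal sketch of one training step with this in-batch negative log-likelihood loss; the checkpoint names, learning rate and helper names are illustrative assumptions rather than the patented implementation:

import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
q_encoder = BertModel.from_pretrained("bert-base-chinese")   # initialised from q-BERT
d_encoder = BertModel.from_pretrained("bert-base-chinese")   # initialised from d-BERT
q_encoder.train()
d_encoder.train()
optimizer = torch.optim.AdamW(
    list(q_encoder.parameters()) + list(d_encoder.parameters()), lr=2e-5)

def cls_vectors(texts, model):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    return model(**batch).last_hidden_state[:, 0]             # (batch_size, 768)

def train_step(queries, positive_paragraphs):
    # queries[i] matches positive_paragraphs[i]; the paragraphs of the other
    # batch_size-1 samples act as in-batch negatives, giving batch_size^2 scored pairs.
    q_vecs = cls_vectors(queries, q_encoder)                   # (B, 768)
    d_vecs = cls_vectors(positive_paragraphs, d_encoder)       # (B, 768)
    scores = q_vecs @ d_vecs.T                                 # sim(q_i, d_j) for all pairs
    labels = torch.arange(scores.size(0))                      # positives sit on the diagonal
    loss = F.cross_entropy(scores, labels)                     # negative log-likelihood of positives
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()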
The above formulas are all dimensionless and operate on calculated numerical values; they are formulas obtained by collecting a large amount of data and performing software simulation so as to approximate the actual situation as closely as possible, and the preset parameters and preset thresholds in the formulas are set by a person skilled in the art according to the actual situation or are obtained by simulation over a large amount of data.
Working principle: to address the problem that the single-character Mask operation in the original BERT cannot learn the context of domain-specific vocabulary, a Mask operation at the level of continuous vocabulary segments is performed on the results of domain word segmentation, forcing the model to learn vocabulary-level context and having a certain effect on improving Chinese retrieval tasks;
For the search term q and the paragraph d, two BERT unsupervised semantic similarity representation models are trained separately using an unsupervised method. A sentence is passed through the encoder to obtain its BERT representation, the BERT representations obtained from other sentences serve as negative examples, and the positive example is obtained by feeding the same sentence into the encoder twice, where different dropout masks yield different BERT representations; this works better than common text-augmentation methods such as clipping and word replacement;
In each batch of size batch_size, the positive document paragraph corresponding to each search term is taken as the positive sample and the document paragraphs of the other batch_size-1 samples are taken as negative examples, so that batch_size² sample pairs are constructed for training, thereby effectively training the deep semantic long text ordering model.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented; the modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the method of this embodiment.
It will also be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims can also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical method of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical method of the present invention may be modified or substituted without departing from the spirit and scope of the technical method of the present invention.

Claims (1)

1. A retrieval system for long-text matching optimization of electric power standards, characterized by comprising a vocabulary extraction terminal, a pre-training BERT coding terminal, a vocabulary processing terminal and a semantic long text ordering terminal;
The vocabulary extraction terminal output end is electrically connected with the pre-training BERT coding terminal input end, the pre-training BERT coding terminal output end is electrically connected with the vocabulary processing terminal input end, and the vocabulary processing terminal output end is electrically connected with the semantic long text ordering terminal input end; the vocabulary extraction terminal extracts documents and titles of all chapters in the power standard text as paragraphs and corresponding search terms, wherein the paragraphs are marked as d and the corresponding search terms are marked as q; the domain-adapted unsupervised semantic similarity model adopts an unsupervised method to train two BERT unsupervised semantic similarity representation models, one for the search term q and one for the paragraph d;
the pre-training BERT encoding terminal adopts two different pre-trained BERT encoders to encode the paragraph d and the corresponding search term q into different vectors, denoted d-vector and q-vector, and then computes the cosine similarity of the two vectors as their relevance score;
the pre-training BERT coding terminal internally comprises an expansion unit, and the search word corresponding to each document is expanded through the expansion unit;
the vocabulary processing terminal internally comprises two models, namely a domain-adapted BERT pre-trained language model and a domain-adapted unsupervised semantic similarity model;
The domain-adapted BERT pre-trained language model forces the model to learn vocabulary-level context, while the NSP task is cancelled;
BERT semantic similarity representation models are constructed separately and in a targeted manner for search terms and for document paragraphs, and are called q-BERT and d-BERT respectively; the semantic long text ordering terminal is used for constructing a deep semantic long text ordering model, so as to obtain a BERT representation suited to q-d matching;
The semantic long text ordering terminal is used for constructing the deep semantic long text ordering model, and the positive and negative samples for the q-d matching algorithm are constructed as follows:
Step one, for a search term with a complete semantic relation, the q-d pairs of other chapters are used: the paragraphs d of the other chapters serve as negative examples, and the paragraph d matched with the search term q having the complete semantic relation serves as the positive example;
Step two, for a search term without a complete semantic relation, pairs are constructed using the word-segmentation results of the search term q: the paragraphs d of other chapters serve as negative examples, and the paragraph d matched with the search term q serves as the positive example;
During training, within each batch of size batch_size, the positive document paragraph corresponding to each search term is taken as the positive sample and the document paragraphs of the other batch_size-1 samples are taken as negative examples, so that batch_size² sample pairs are constructed for training;
The q-BERT and d-BERT obtained above are used to encode the search term and the document paragraph respectively as initialization representations; the goal is to learn the encoders d-encoder and q-encoder, which encode the search term q and the document paragraph d into the same vector space, in which a strongly correlated (q, d) pair lies closer together than a weakly correlated one, and the following loss function is designed for this:

L(q_i, d_i^+, d_{i,1}^-, …, d_{i,n}^-) = −log [ exp(sim(q_i, d_i^+)) / ( exp(sim(q_i, d_i^+)) + Σ_{j=1}^{n} exp(sim(q_i, d_{i,j}^-)) ) ]

The loss function is the negative log-likelihood of the positive example, where q_i is the search term, d_i^+ is the strongly correlated positive document paragraph, and the d_{i,j}^- are the negative examples.
CN202110937101.1A 2021-08-16 2021-08-16 Retrieval system for long text matching optimization aiming at electric power standard Active CN113641793B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110937101.1A CN113641793B (en) 2021-08-16 2021-08-16 Retrieval system for long text matching optimization aiming at electric power standard

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110937101.1A CN113641793B (en) 2021-08-16 2021-08-16 Retrieval system for long text matching optimization aiming at electric power standard

Publications (2)

Publication Number Publication Date
CN113641793A CN113641793A (en) 2021-11-12
CN113641793B true CN113641793B (en) 2024-05-07

Family

ID=78422036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110937101.1A Active CN113641793B (en) 2021-08-16 2021-08-16 Retrieval system for long text matching optimization aiming at electric power standard

Country Status (1)

Country Link
CN (1) CN113641793B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241321A (en) * 2018-07-19 2019-01-18 杭州电子科技大学 The image and model conjoint analysis method adapted to based on depth field
CN110516229A (en) * 2019-07-10 2019-11-29 杭州电子科技大学 A kind of domain-adaptive Chinese word cutting method based on deep learning
CN111931490A (en) * 2020-09-27 2020-11-13 平安科技(深圳)有限公司 Text error correction method, device and storage medium
CN112000805A (en) * 2020-08-24 2020-11-27 平安国际智慧城市科技股份有限公司 Text matching method, device, terminal and storage medium based on pre-training model
CN112527999A (en) * 2020-12-22 2021-03-19 江苏省农业科学院 Extraction type intelligent question and answer method and system introducing agricultural field knowledge
CN112749544A (en) * 2020-12-28 2021-05-04 苏州思必驰信息科技有限公司 Training method and system for paragraph segmentation model
CN113239700A (en) * 2021-04-27 2021-08-10 哈尔滨理工大学 Text semantic matching device, system, method and storage medium for improving BERT
CN113239148A (en) * 2021-05-14 2021-08-10 廖伟智 Scientific and technological resource retrieval method based on machine reading understanding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8838433B2 (en) * 2011-02-08 2014-09-16 Microsoft Corporation Selection of domain-adapted translation subcorpora

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241321A (en) * 2018-07-19 2019-01-18 杭州电子科技大学 The image and model conjoint analysis method adapted to based on depth field
CN110516229A (en) * 2019-07-10 2019-11-29 杭州电子科技大学 A kind of domain-adaptive Chinese word cutting method based on deep learning
CN112000805A (en) * 2020-08-24 2020-11-27 平安国际智慧城市科技股份有限公司 Text matching method, device, terminal and storage medium based on pre-training model
CN111931490A (en) * 2020-09-27 2020-11-13 平安科技(深圳)有限公司 Text error correction method, device and storage medium
CN112527999A (en) * 2020-12-22 2021-03-19 江苏省农业科学院 Extraction type intelligent question and answer method and system introducing agricultural field knowledge
CN112749544A (en) * 2020-12-28 2021-05-04 苏州思必驰信息科技有限公司 Training method and system for paragraph segmentation model
CN113239700A (en) * 2021-04-27 2021-08-10 哈尔滨理工大学 Text semantic matching device, system, method and storage medium for improving BERT
CN113239148A (en) * 2021-05-14 2021-08-10 廖伟智 Scientific and technological resource retrieval method based on machine reading understanding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Survey of Neural-Network-Based Machine Reading Comprehension; 顾迎捷; 桂小林; 李德福; 沈毅; 廖东; Journal of Software (软件学报), No. 07; full text *
Research on Domain Adaptation of Translation Models Based on Semantic Distribution Similarity; 姚亮; 洪宇; 刘昊; 刘乐; 姚建民; Journal of Shandong University (Natural Science) (山东大学学报(理学版)), No. 07; full text *

Also Published As

Publication number Publication date
CN113641793A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
Chung et al. Speech2vec: A sequence-to-sequence framework for learning word embeddings from speech
CN109918666B (en) Chinese punctuation mark adding method based on neural network
CN106502985B (en) neural network modeling method and device for generating titles
CN111259127B (en) Long text answer selection method based on transfer learning sentence vector
CN109344399B (en) Text similarity calculation method based on stacked bidirectional lstm neural network
CN107798140A (en) A kind of conversational system construction method, semantic controlled answer method and device
CN110619043A (en) Automatic text abstract generation method based on dynamic word vector
CN112883171B (en) Document keyword extraction method and device based on BERT model
CN113204611A (en) Method for establishing reading understanding model, reading understanding method and corresponding device
CN113051368B (en) Double-tower model training method, retrieval device and electronic equipment
CN114020906A (en) Chinese medical text information matching method and system based on twin neural network
CN109145946B (en) Intelligent image recognition and description method
CN113239666A (en) Text similarity calculation method and system
CN111221964B (en) Text generation method guided by evolution trends of different facet viewpoints
CN111428518B (en) Low-frequency word translation method and device
CN115033753A (en) Training corpus construction method, text processing method and device
CN116756303A (en) Automatic generation method and system for multi-topic text abstract
CN113392191B (en) Text matching method and device based on multi-dimensional semantic joint learning
CN116226357B (en) Document retrieval method under input containing error information
CN115828931B (en) Chinese and English semantic similarity calculation method for paragraph level text
CN117592563A (en) Power large model training and adjusting method with field knowledge enhancement
CN113641793B (en) Retrieval system for long text matching optimization aiming at electric power standard
CN115860002A (en) Combat task generation method and system based on event extraction
CN116662819A (en) Short text-oriented matching method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant