CN112861543A - Deep semantic matching method and system for matching research and development supply and demand description texts - Google Patents
Deep semantic matching method and system for matching research and development supply and demand description texts
- Publication number
- CN112861543A (application number CN202110156093.7A)
- Authority
- CN
- China
- Prior art keywords
- technical
- result
- text
- matching
- demand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The embodiment of the invention discloses a deep semantic matching method and system for matching research and development supply and demand description texts. The method comprises the following steps: condensing the long texts of technical requirements and technical results into text content abstracts of no more than 512 characters by using an improved TextRank algorithm; constructing two independent BERT pre-trained language models and training them separately on the titles and on the content abstracts of the two types of text (technical requirements and technical results); and linearly splicing the trained results and processing the spliced result with a Softmax function weighted value that considers category weights, which serves as the semantic similarity coefficient. The technical scheme avoids the 512-character input limit of the BERT pre-trained language model without losing the core semantics of the text content, extracts the contextual semantic information of the technical requirements and technical results to the greatest extent, classifies more finely than traditional coarse-grained semantic matching, and realizes automatic high-precision matching of technical requirement and technical result texts.
Description
Technical Field
The embodiment of the invention relates to the field of natural language processing, and in particular to a deep semantic matching method and system for matching research and development supply and demand description texts.
Background
Scientific and technological resource sharing platforms are an important carrier for connecting research and development results with research and development demand. Many such platforms have built libraries of technical results and libraries of technical research and development requirements, which store a large amount of text describing those results and requirements; this patent application refers to such text as research and development supply and demand description text. Traditional online scientific and technological consulting services mostly match these texts manually, which requires heavy staffing and yields low matching efficiency and a low matching success rate. Deep and accurate semantic matching of research and development supply and demand description texts, together with efficient matching of the supply and demand parties, is an important technology for building intelligent scientific and technological consulting services and an important guarantee for accelerating the market transformation of scientific achievements.
Unlike general text semantic matching scenarios, semantic matching of research and development supply and demand description texts has the following particularities: (1) the texts span a wide range of professional fields, contain many technical terms, and are highly technical; (2) each text to be matched has a title and content: the title does not exceed 30 words and its semantics are brief and refined, while the content runs to hundreds of characters and its semantics are complex and divergent.
Existing text semantic similarity matching technology has evolved from shallow machine learning, such as TF-IDF, support vector machines and PageRank, to deep neural network learning that considers context semantics, such as word2vec and the Transformer. One prior work applied a shallow machine learning algorithm to public message texts, performing semantic matching classification on the citizen message data set of a local government inquiry platform. Another combined word2vec word embedding with the AutoLMP model from image recognition to perform question-answer matching on the Quora data set. The algorithms adopted in these works either cannot capture and resolve word ambiguity or require prior domain knowledge as input, so end-to-end automatic matching cannot be realized. Neihao Hao et al. designed an automatic semantic matching model for legal provisions and judicial interpretations based on a BERT pre-trained language model, but did not solve the 512-character input limit of the BERT model.
Technical schemes that perform text similarity semantic matching with word2vec word embedding and a shallow neural network model learn only shallow context semantics. They are better suited to general-domain texts with clear semantics and a single connotation, and are difficult to apply to research and development supply and demand description texts, whose semantics are complex and whose specialized terms are numerous. Existing algorithms that realize deep semantic matching based on the BERT pre-trained language model are limited by the model's 512-character input, and cannot perform deep semantic matching on long research and development supply and demand description texts that exceed that limit.
Disclosure of Invention
The embodiment of the invention aims to provide a deep semantic matching method and system for matching research and development supply and demand description texts, in order to solve the problems that the prior art cannot perform semantic similarity matching on long technical result and research and development demand texts exceeding the 512-character limit, cannot train the text title and the text content abstract separately, and cannot perform fine-grained semantic matching classification.
In order to achieve the above object, the embodiments of the present invention mainly provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a deep semantic matching method for matching research and development supply and demand description texts, including: preprocessing the title and the content of the technical requirement text and of the technical result text respectively to obtain a text title and a text content; extracting an abstract from the preprocessed technical requirement text content by using an abstract extraction algorithm to obtain a technical requirement text content abstract; extracting an abstract from the preprocessed technical result text content by using the abstract extraction algorithm to obtain a technical result text content abstract; inputting the text titles of the technical requirement and the technical result into a first BERT model to obtain a first similarity category result; inputting the text content abstracts of the technical requirement and the technical result into a second BERT model to obtain a second similarity category result; linearly splicing the first similarity category result and the second similarity category result to obtain a third similarity category result; processing the linearly spliced third similarity category result through a logistic regression algorithm to obtain a result-demand semantic similarity coefficient; and outputting the technical result-technical requirement semantic matching results in descending order of the result-demand semantic similarity coefficient.
Further, the method also includes: when the title and the content of the technical requirement text and the technical result text are preprocessed, removing punctuation marks and retaining only Chinese characters.
Further, the method also includes: when the abstract extraction algorithm is used to extract abstracts from the technical requirement text content and the technical result text content, using an improved TextRank algorithm to condense the long text content of the technical requirement and of the technical result into a technical requirement text content abstract and a technical result text content abstract, each of no more than 512 characters.
Further, the method also includes: when the logistic regression algorithm is used to process the linearly spliced third similarity category result, using a Softmax function weighted value that considers category weights as the result-demand semantic similarity coefficient.
In a second aspect, an embodiment of the present invention further provides a deep semantic matching system for matching research and development supply and demand description texts, including: a preprocessing module for processing the titles and contents of the technical requirement and technical result texts; an extraction module for extracting the core content abstracts of the technical requirement and technical result texts; a training module for training separately on the text titles and on the text content abstracts of the technical requirements and technical results to obtain a first similarity category result and a second similarity category result; a splicing module for linearly splicing the first similarity category result and the second similarity category result to obtain a third similarity category result; a classification module for processing the linearly spliced third similarity category result with a logistic regression algorithm to obtain a result-demand semantic similarity coefficient; and a control processing module for controlling the text preprocessing, abstract extraction, model training, linear splicing, classification processing and descending-order output applied to the titles and contents of the long technical requirement and technical result texts.
Further, when the preprocessing module preprocesses the title and the content of the technical requirement text and the technical result text, punctuation marks are removed and only Chinese characters are retained.
Further, when the extraction module extracts abstracts from the technical requirement text content and the technical result text content by using the abstract extraction algorithm, the improved TextRank algorithm is used to condense the long text content of the technical requirement and the technical result into text content abstracts of no more than 512 characters.
Further, when the classification module processes the linearly spliced third similarity category result with the logistic regression algorithm, it uses a Softmax function weighted value that considers category weights as the result-demand semantic similarity coefficient.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is used to store one or more program instructions; the processor is used to run the one or more program instructions to execute the deep semantic matching method for matching research and development supply and demand description texts according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium containing one or more program instructions, where the one or more program instructions are configured to be executed to implement the deep semantic matching method for matching research and development supply and demand description texts according to the first aspect.
The technical scheme provided by the embodiment of the invention at least has the following advantages:
The technical scheme provided by the embodiment of the invention uses the improved TextRank algorithm to condense the long technical requirement and technical result texts into text content abstracts of no more than 512 characters, which avoids the 512-character input limit of the BERT pre-trained language model without losing the core semantics of the text content. It builds two BERT pre-trained language models and trains them separately on the titles and on the content abstracts of the two types of text (technical requirements and technical results), which extracts the contextual semantic information of the technical requirements and technical results to the greatest extent and significantly improves the semantic matching accuracy for the two types of text. It replaces the traditional Softmax classification output with a Softmax function weighted value that considers category weights, which corrects model category prediction errors and improves text semantic matching accuracy. Compared with three reference models (a semantic matching method based on word2vec word embedding, a text title semantic matching method based on BERT word embedding, and a combined title and content semantic matching method based on BERT word embedding), the model realized by this technical scheme achieves the highest F1 value, a marked improvement over all three.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, steps and the like shown in this specification are only intended to accompany the disclosed content so that those skilled in the art can understand and read it; they do not limit the conditions under which the present invention can be implemented and therefore carry no substantive technical significance. Any modification of the structures, change in the relationship of the steps, or adjustment that does not affect the efficacy and purpose achievable by the present invention shall still fall within the scope covered by the technical content disclosed herein.
Fig. 1 is a flowchart of a deep semantic matching method for matching research and development supply and demand description texts according to an embodiment of the present invention.
Fig. 2 is a structural block diagram of a deep semantic matching system for matching research and development supply and demand description texts according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a technical requirement-technical result text deep semantic matching algorithm provided by an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system algorithms, models, techniques, etc. in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, algorithms, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In the description of the present invention, it is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Fig. 1 is a flowchart of a deep semantic matching method for matching research and development supply and demand description texts according to an embodiment of the present invention. As shown in Fig. 1, the method provided by the embodiment of the present invention includes:
S1: Preprocess the title and the content of the technical requirement text and the technical result text respectively to obtain a text title and a text content.
Specifically, after the user inputs the text sets of technical requirements and technical results into the deep semantic matching system, the system preprocesses the text data set. Preprocessing comprises splitting each text into title and content, removing punctuation marks and retaining only Chinese characters, which yields the preprocessed text data set.
In one embodiment of the invention, the preprocessed text data set comprises the preprocessed text titles and text contents.
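The preprocessing in S1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the `{"title", "content"}` record shape, and the choice of the Unicode range U+4E00 to U+9FFF as the definition of "Chinese characters" are all assumptions made for illustration.

```python
import re

# Assumption: "Chinese characters" means the CJK Unified Ideographs
# basic block (U+4E00..U+9FFF); everything else is discarded.
_NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")


def keep_chinese(text: str) -> str:
    """Remove punctuation, digits, Latin letters and whitespace."""
    return _NON_CHINESE.sub("", text)


def preprocess(record: dict) -> dict:
    """Split a record into a cleaned title and cleaned content.

    `record` is a hypothetical input shape: {"title": str, "content": str}.
    """
    return {
        "title": keep_chinese(record["title"]),
        "content": keep_chinese(record["content"]),
    }
```

Because this step deletes sentence-ending punctuation, a pipeline that later needs sentence boundaries for abstract extraction would presumably have to split the content into sentences before applying `keep_chinese` to each one.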
S2: Extract an abstract from the preprocessed technical requirement text content by using an abstract extraction algorithm to obtain a technical requirement text content abstract; extract an abstract from the preprocessed technical result text content by using the abstract extraction algorithm to obtain a technical result text content abstract.
Specifically, when the core content is extracted from the technical requirement and technical result text contents in the preprocessed text data set, an improved TextRank algorithm is used to condense the long text content of the technical requirements and technical results into text content abstracts of no more than 512 characters.
In one embodiment of the present invention, the sentence node weights are adjusted to take into account the similarity between each content sentence and the title. First, the similarity $\omega_{i0}$ between the title sentence $P_0$ and the content sentence $P_i$ is calculated by formula (1):
[formula (1) is missing from the source text]
Secondly, the feature words in each sentence are traversed: if a feature word appears in the title, its word frequency weight is increased; otherwise, the word frequency weight is kept unchanged. The calculation formula is as follows:
[formula is missing from the source text]
In the above formula, the feature word vector of the title sentence $P_0$ is $P_0 = [k_{01}, \dots, k_{0h'}]^T$, where $h'$ is the number of feature words after expansion to include both the title and the sentence. This yields the adjustment matrix $D_{n \times h}$.
Then, according to formula (1) and the matrix $D_{n \times h}$, the similarity between sentences is calculated to obtain the matrix $SD_{n \times n}$. The subsequent iteration process is identical to that of the TextRank algorithm.
In one embodiment of the invention, each iteration computes only a single pass of the loop, which reduces computation time. Through iteration, the nodes in the text network graph eventually converge to stable values, which are the final importance scores of the nodes. The information in the network graph resides only in the link weights of the edges; recalculating the degree in every loop serves only to accelerate the iteration and has little bearing on the final converged value. Experiments show that the extracted abstract depends little on the recalculated degree, so each iteration needs only one pass and the degree of each node is held fixed during iteration. With this optimization, the convergence process is reduced from the original O(n^2) complexity to O(n), and the iterative computation time is greatly reduced.
In one embodiment of the invention, the improved algorithm is applied to the input technical requirement and technical result text contents with set parameters (12 sentences extracted from the technical requirement text content and 10 sentences from the technical result text content), and the technical requirement and technical result text content abstracts are output.
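The following Python sketch illustrates the general shape of a title-aware, TextRank-style extractive abstract capped at 512 characters. It is an illustration under stated assumptions, not the patent's algorithm: the patent's similarity and weighting formulas are not available in this text, so the sketch substitutes character counts for feature words, a hypothetical `boost` factor for the title-based word frequency adjustment, and cosine similarity for the sentence similarity. The function names and default parameters are likewise hypothetical, and the input is assumed to be already split into sentences.

```python
import math
from collections import Counter


def _vector(sentence, title_chars, boost):
    # Assumption: character counts stand in for feature-word frequencies;
    # characters that also occur in the title get a boosted weight.
    counts = Counter(sentence)
    return {c: n * (boost if c in title_chars else 1.0) for c, n in counts.items()}


def _cosine(u, v):
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def summarize(title, sentences, top_k, max_chars=512,
              boost=2.0, damping=0.85, iters=50):
    """Return the top_k highest-scoring sentences, in original order,
    truncated to max_chars characters."""
    n = len(sentences)
    if n == 0:
        return ""
    title_chars = set(title)
    vecs = [_vector(s, title_chars, boost) for s in sentences]
    sim = [[_cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Each node's total outgoing weight is computed once and held fixed
    # across iterations rather than being recalculated in every loop.
    out = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out[j] * scores[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    chosen = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return "".join(sentences[i] for i in sorted(chosen))[:max_chars]
```

With the parameters given in the embodiment, a call for a technical requirement would use `top_k=12` and a call for a technical result would use `top_k=10`. The final slice is a blunt way to guarantee the 512-character cap and may cut the last sentence short.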
S3: Input the text titles of the technical requirement and the technical result into a first BERT model to obtain a first similarity category result; input the text content abstracts of the technical requirement and the technical result into a second BERT model to obtain a second similarity category result.
Specifically, the preprocessed text titles of the two types of text (technical requirements and technical results) and their text content abstracts of no more than 512 characters are input into two independent BERT pre-trained language models, which are trained separately to obtain the first similarity category result and the second similarity category result.
In one embodiment of the invention, the text titles of the technical requirements and technical results are input into one BERT pre-trained language model for training to obtain the first similarity category result, and the text content abstracts of the technical requirements and technical results are input into another BERT pre-trained language model for training to obtain the second similarity category result.
In an embodiment of the invention, to address the complex semantics of research and development supply and demand description texts, this patent application constructs a dual independent BERT model architecture: one model matches technical result and technical requirement titles, and the other matches technical result and technical requirement content abstracts. BERT Chinese pre-trained vectors are loaded (hidden size 768, 12 attention heads, 110M parameters in total, maximum sequence length 512, train_batch_size 64) and fine-tuned on the training set data. A schematic diagram of the text deep semantic matching algorithm is shown in Fig. 3.
S4: Linearly splice the first similarity category result and the second similarity category result to obtain a third similarity category result.
Specifically, the first similarity category result and the second similarity category result are linearly combined according to certain weights to obtain the third similarity category result.
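The text does not say exactly how the two results are spliced or what the weights are. One possible reading, sketched below in Python, scales each model's per-class output by a weight and concatenates the two vectors into a single feature vector for the next step. The function name, the single `title_weight` parameter and the use of concatenation are assumptions made for illustration only.

```python
def splice(title_probs, summary_probs, title_weight=0.5):
    """Scale the two per-class score vectors and concatenate them.

    Assumption: the title model gets weight `title_weight` and the
    content abstract model gets the remainder, 1 - title_weight.
    """
    if len(title_probs) != len(summary_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= title_weight <= 1.0:
        raise ValueError("title_weight must lie in [0, 1]")
    summary_weight = 1.0 - title_weight
    return ([title_weight * p for p in title_probs]
            + [summary_weight * p for p in summary_probs])
```

For two 4-class outputs this produces an 8-dimensional vector. A weighted element-wise sum would be another plausible reading of "linearly combined"; the source does not settle which is meant.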
S5: Process the linearly spliced third similarity category result through a logistic regression algorithm to obtain a result-demand semantic similarity coefficient.
Specifically, the linearly spliced third similarity category result is processed through a logistic regression algorithm, and a Softmax function weighted value that considers category weights is used as the result-demand semantic similarity coefficient.
In one embodiment of the present invention, a Softmax function weighted value that considers category weights is used as the result-demand semantic similarity coefficient; the corresponding formula is as follows:
[formula is missing from the source text]
In the above formula, $i$ is the similarity class (4 classes are defined: "1" for irrelevant, "2" for weakly relevant, "3" for strongly relevant, "4" for strongly relevant) and $S_i$ denotes the Softmax function value for class $i$.
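Since the formula itself is not available in this text, the Python sketch below shows only one plausible form of a class-weighted Softmax value: the Softmax probabilities $S_i$ are multiplied by per-class weights and summed. Using the class labels 1 to 4 as the weights is an assumption, as are the function names.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_coefficient(logits, class_weights=(1.0, 2.0, 3.0, 4.0)):
    """Weighted sum of Softmax values S_i.

    Assumption: the class weights default to the class labels 1..4,
    so the coefficient ranges from 1 (irrelevant) to 4.
    """
    if len(logits) != len(class_weights):
        raise ValueError("one weight is required per class")
    return sum(w * s for w, s in zip(class_weights, softmax(logits)))
```

Compared with taking the arg-max class, a weighted value of this kind gives a continuous score, so two candidates predicted to be in the same class can still be ordered against each other.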
S6: Output the technical result-technical requirement semantic matching results in descending order of the result-demand semantic similarity coefficient.
Specifically, taking the technical requirement document as the matching target, the technical results are output as matching results in descending order of their semantic similarity coefficient with the technical requirement.
In one embodiment of the invention, taking a technical result document as the matching target, the technical requirement documents are output as matching results in descending order of their semantic similarity coefficient with that technical result document; taking a technical requirement document as the matching target, the technical result documents are output as matching results in descending order of their semantic similarity coefficient with that technical requirement document.
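The descending-order output in S6 amounts to a sort on the similarity coefficient. A minimal Python sketch follows; the function name, the dictionary input shape and the optional `top_n` cut-off are hypothetical and not part of the patent.

```python
def rank_matches(scores, top_n=None):
    """Order candidates by similarity coefficient, highest first.

    `scores` maps a candidate document id to its similarity coefficient
    with the matching target; `top_n` optionally limits the output.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```

The same function serves both directions described above: pass the coefficients of all technical results for one technical requirement, or the coefficients of all technical requirements for one technical result.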
Fig. 2 is a structural block diagram of a deep semantic matching system for matching research and development supply and demand description texts according to an embodiment of the present invention. As shown in Fig. 2, the system includes: a preprocessing module 100, an extraction module 200, a training module 300, a splicing module 400, a classification module 500 and a control processing module 600.
The preprocessing module 100 is used to process the titles and contents of the technical requirement and technical result texts. The extraction module 200 is used to extract the core content abstracts of the technical requirement and technical result texts. The training module 300 is used to train separately on the text titles and on the text content abstracts of the technical requirements and technical results to obtain a first similarity category result and a second similarity category result. The splicing module 400 is used to linearly splice the first similarity category result and the second similarity category result to obtain a third similarity category result. The classification module 500 is used to process the linearly spliced third similarity category result with a logistic regression algorithm to obtain a result-demand semantic similarity coefficient. The control processing module 600 is used to control the text preprocessing, abstract extraction, model training, linear splicing, classification processing and descending-order output applied to the titles and contents of the long technical requirement and technical result texts.
In one embodiment of the present invention, the preprocessing module 100 removes punctuation marks and retains only Chinese characters when preprocessing the titles and contents of the technical requirement and technical result texts.
In one embodiment of the present invention, when the extraction module 200 extracts abstracts from the technical requirement text content and the technical result text content by using the abstract extraction algorithm, the improved TextRank algorithm is used to condense the long text content of the technical requirement and the technical result into text content abstracts of no more than 512 characters.
In an embodiment of the present invention, when the classification module 500 processes the linearly spliced third similarity category result with the logistic regression algorithm, it uses a Softmax function weighted value that considers category weights as the result-demand semantic similarity coefficient.
It should be noted that the specific implementation of the deep semantic matching system in the embodiment of the present invention is similar to that of the deep semantic matching method; refer to the description of the method for details, which are not repeated here to reduce redundancy.
In addition, other components and functions of the deep semantic matching system for matching research and development supply and demand description texts are known to those skilled in the art and are not described in detail, to reduce redundancy.
An embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is configured to store one or more program instructions; the processor is configured to run the one or more program instructions to execute the deep semantic matching method for matching research and development supply and demand description texts according to the first aspect.
An embodiment of the present invention further discloses a computer-readable storage medium in which computer program instructions are stored; when the computer program instructions are run on a computer, the computer is caused to execute the deep semantic matching method for matching research and development supply and demand description texts.
In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capability. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The processor may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory.
The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functionality described in the present invention may be implemented in a combination of hardware and software. When software is used, the corresponding functionality may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.
Claims (10)
1. A deep semantic matching method and a system for matching development supply and demand description texts are characterized by comprising the following steps:
respectively preprocessing the title and the content in the technical requirement and technical result texts to obtain a text title and a text content;
extracting the text content of the preprocessed technical requirement by using a summary extraction algorithm to obtain a summary of the text content of the technical requirement;
extracting the text content of the preprocessed technical result by using a summary extraction algorithm to obtain a summary of the text content of the technical result;
inputting the text titles of the technical requirements and the technical achievements into a first BERT model to obtain a first similar category result;
inputting the text content abstracts of the technical requirements and the technical achievements into a second BERT model to obtain a second similar category result;
linearly splicing the first similar category result and the second similar category result to obtain a third similar category result;
processing the linearly spliced third similar category result through a logistic regression algorithm to obtain a result-demand semantic similarity coefficient;
and outputting the result-demand semantic similarity coefficients in descending order as the technical result-technical demand semantic matching result.
2. The deep semantic matching method and system for research and development of supply and demand description text matching according to claim 1 are characterized by comprising the following steps:
when the title and the content in the technical requirement text and the technical result text are respectively preprocessed, punctuation marks are removed, and only Chinese characters are reserved.
3. The deep semantic matching method and system for research and development of supply and demand description text matching according to claim 1 are characterized by comprising the following steps:
and when the abstract extraction algorithm is used for abstracting the technical requirement text content and the technical result text content, the improved textrank algorithm is used for concentrating the long text content of the technical requirement and the technical result into a technical requirement text content abstract and a technical result text content abstract which do not exceed 512 characters respectively.
4. The deep semantic matching method and system for research and development of supply and demand description text matching according to claim 1 are characterized by comprising the following steps:
and when the logistic regression algorithm is used for processing the linearly spliced third similar category result, using a Softmax function weighted value considering category weight as a result-demand semantic similarity coefficient.
5. A deep semantic matching system for matching research and development supply and demand description texts is characterized by comprising:
the preprocessing module is used for processing titles and contents in the technical requirements and technical result texts;
the extraction module is used for extracting the core content abstract of the technical requirement and technical result text;
the training module is used for respectively training the text titles and the text content abstracts of the technical requirements and the technical achievements to obtain a first similar category result and a second similar category result;
the splicing module is used for linearly splicing the first similar category result and the second similar category result to obtain a third similar category result;
the classification module is used for processing the linearly spliced third similar category result with a logistic regression algorithm to obtain a result-demand semantic similarity coefficient;
and the control processing module is used for controlling and processing text preprocessing, abstract extraction, model training, linear splicing, classification processing and descending output of titles and contents in the long text of the technical requirements and the technical achievements.
6. The deep semantic matching system oriented to research and development supply and demand description text matching as claimed in claim 5, wherein the preprocessing module is used for eliminating punctuation marks and only keeping Chinese characters when preprocessing titles and contents in technical demand and technical result texts respectively.
7. The deep semantic matching system oriented to development supply and demand description text matching according to claim 5, wherein the extraction module utilizes a modified textrank algorithm to concentrate the long text contents of technical demand and technical result into text content abstract with no more than 512 characters when the abstract extraction algorithm is used for abstracting the technical demand text contents and the technical result text contents.
8. The deep semantic matching system oriented to research and development supply and demand description text matching, according to claim 5, wherein the classification module uses a Softmax function weighted value considering class weight as a result-demand semantic similarity coefficient when the logistic regression algorithm is used for processing the linearly combined third similarity classification result.
9. An electronic device, characterized in that the electronic device comprises: at least one processor and at least one memory;
the memory is to store one or more program instructions;
the processor, configured to execute one or more program instructions to perform the method of any of claims 1-4.
10. A computer-readable storage medium having one or more program instructions embodied therein for performing the method of any of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110156093.7A CN112861543A (en) | 2021-02-04 | 2021-02-04 | Deep semantic matching method and system for matching research and development supply and demand description texts |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112861543A true CN112861543A (en) | 2021-05-28 |
Family
ID=75987945
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110156093.7A Pending CN112861543A (en) | 2021-02-04 | 2021-02-04 | Deep semantic matching method and system for matching research and development supply and demand description texts |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112861543A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113743081A (en) * | 2021-09-03 | 2021-12-03 | 西安邮电大学 | Recommendation method of technical service information |
CN116010593A (en) * | 2021-10-20 | 2023-04-25 | 腾讯科技(深圳)有限公司 | Method, device, computer equipment and storage medium for determining disease emotion information |
-
2021
- 2021-02-04 CN CN202110156093.7A patent/CN112861543A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019105432A1 (en) * | 2017-11-29 | 2019-06-06 | 腾讯科技(深圳)有限公司 | Text recommendation method and apparatus, and electronic device |
CN111444340A (en) * | 2020-03-10 | 2020-07-24 | 腾讯科技(深圳)有限公司 | Text classification and recommendation method, device, equipment and storage medium |
CN111309871A (en) * | 2020-03-26 | 2020-06-19 | 普华讯光(北京)科技有限公司 | Method for matching degree between requirement and output result based on text semantic analysis |
CN111666402A (en) * | 2020-04-30 | 2020-09-15 | 平安科技(深圳)有限公司 | Text abstract generation method and device, computer equipment and readable storage medium |
CN111858912A (en) * | 2020-07-03 | 2020-10-30 | 黑龙江阳光惠远知识产权运营有限公司 | Abstract generation method based on single long text |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113743081A (en) * | 2021-09-03 | 2021-12-03 | 西安邮电大学 | Recommendation method of technical service information |
CN113743081B (en) * | 2021-09-03 | 2023-08-01 | 西安邮电大学 | Recommendation method of technical service information |
CN116010593A (en) * | 2021-10-20 | 2023-04-25 | 腾讯科技(深圳)有限公司 | Method, device, computer equipment and storage medium for determining disease emotion information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111382580B (en) | Encoder-decoder framework pre-training method for neural machine translation | |
US11544474B2 (en) | Generation of text from structured data | |
US8131536B2 (en) | Extraction-empowered machine translation | |
WO2022062404A1 (en) | Text classification model training method, apparatus, and device and storage medium | |
CN111460820B (en) | Network space security domain named entity recognition method and device based on pre-training model BERT | |
CN113987209A (en) | Natural language processing method and device based on knowledge-guided prefix fine tuning, computing equipment and storage medium | |
CN111832293B (en) | Entity and relation joint extraction method based on head entity prediction | |
CN112861543A (en) | Deep semantic matching method and system for matching research and development supply and demand description texts | |
CN114676255A (en) | Text processing method, device, equipment, storage medium and computer program product | |
CN112380837A (en) | Translation model-based similar sentence matching method, device, equipment and medium | |
CN110688834A (en) | Method and equipment for rewriting intelligent manuscript style based on deep learning model | |
CN112232070A (en) | Natural language processing model construction method, system, electronic device and storage medium | |
CN116304748A (en) | Text similarity calculation method, system, equipment and medium | |
Calvin et al. | Image captioning using convolutional neural networks and recurrent neural network | |
Andriyanov | Combining Text and Image Analysis Methods for Solving Multimodal Classification Problems | |
CN114386425B (en) | Big data system establishing method for processing natural language text content | |
CN112287641B (en) | Synonym sentence generating method, system, terminal and storage medium | |
CN112528653B (en) | Short text entity recognition method and system | |
CN114510569A (en) | Chemical emergency news classification method based on Chinesebert model and attention mechanism | |
CN114692635A (en) | Information analysis method and device based on vocabulary enhancement and electronic equipment | |
Buoy et al. | Joint Khmer word segmentation and part-of-speech tagging using deep learning | |
CN113408267A (en) | Word alignment performance improving method based on pre-training model | |
Sun et al. | Chinese named entity recognition using the improved transformer encoder and the lexicon adapter | |
CN115828930B (en) | Distributed word vector space correction method for dynamic fusion of semantic relations | |
CN112560441B (en) | Method for constructing composition syntax analysis tree by combining bottom-up rules with neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||