CN112861543A - Deep semantic matching method and system for matching research and development supply and demand description texts

Deep semantic matching method and system for matching research and development supply and demand description texts

Info

Publication number
CN112861543A
Authority
CN
China
Legal status
Pending
Application number
CN202110156093.7A
Other languages
Chinese (zh)
Inventor
吴俊�
Current Assignee
Individual
Original Assignee
Individual
Application filed by Individual
Priority to CN202110156093.7A
Publication of CN112861543A
Legal status: Pending (current)

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F40/00 Handling natural language data
                    • G06F40/30 Semantic analysis
                    • G06F40/20 Natural language analysis
                        • G06F40/258 Heading extraction; Automatic titling; Numbering
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
                        • G06F16/34 Browsing; Visualisation therefor
                            • G06F16/345 Summarisation for human users
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N20/00 Machine learning

Abstract

The embodiment of the invention discloses a deep semantic matching method and system for matching research and development (R&D) supply and demand description texts. The method comprises the following steps: condensing long technical-requirement and technical-result texts into content summaries of no more than 512 characters using an improved TextRank algorithm; constructing two independent BERT pre-trained language models and training them separately on the titles and on the content summaries of the two text types (technical requirements and technical results); and linearly splicing the outputs of the two models and taking a category-weighted Softmax value of the spliced result as the semantic similarity coefficient. The technical scheme avoids the 512-character input limit of the BERT pre-trained language model without losing the core semantics of the text content, extracts as much of the contextual semantic information of technical requirements and technical results as possible, classifies at a finer granularity than conventional coarse-grained semantic matching, and achieves automatic, high-precision matching of technical-requirement and technical-result texts.

Description

Deep semantic matching method and system for matching research and development supply and demand description texts
Technical Field
The embodiments of the invention relate to the field of natural language processing, and in particular to a deep semantic matching method and system for matching R&D supply and demand description texts.
Background
As important carriers for connecting R&D results with R&D demand, many science and technology resource-sharing platforms have built libraries of scientific R&D technical results and of technical R&D requirements, storing large amounts of text that describes technical results and R&D demands; this patent application refers to such texts as R&D supply and demand description texts. Traditional online science and technology consulting services mostly match these texts manually, which requires heavy staffing and yields low matching efficiency and a low matching success rate. Achieving deep, accurate semantic matching of R&D supply and demand description texts, and thereby efficient matching of the supply and demand parties, is an important technology for building intelligent science and technology consulting services and an important safeguard for accelerating the market transformation of scientific achievements.
Unlike general text semantic matching scenarios, semantic matching of R&D supply and demand description texts has the following particular characteristics: (1) the texts span a wide range of professional fields, contain many technical terms, and are highly technical; (2) each text to be matched has a title and content: the title is no more than 30 characters and its semantics are brief and refined, while the content varies in length, running to hundreds of characters, and its semantics are complex and divergent.
Text semantic similarity matching technology has evolved from shallow machine learning methods such as TF-IDF, support vector machines, and PageRank to deep neural network methods such as word2vec and the Transformer, which take contextual semantics into account. Prior work has applied shallow machine learning algorithms to semantic matching classification of the public messages in the citizen-message data set of a local government inquiry platform. Other work combined word2vec word embeddings with the AutoLMP model from image recognition to perform question-answer matching on the Quora data set. The algorithms adopted in these studies either cannot resolve word ambiguity or require domain prior knowledge as input, so end-to-end automatic matching cannot be achieved. Neihao Hao et al. designed an automatic semantic matching model for legal provisions and judicial interpretations based on the BERT pre-trained language model, but did not solve the problem of the 512-character input limit of the BERT model.
Text similarity matching schemes that combine word2vec word embeddings with a shallow neural network learn only shallow contextual semantics; they suit general-domain texts with clear semantics and a single connotation, but are hard to apply to R&D supply and demand description texts, whose semantics are complex and which contain many specialized terms. Existing deep semantic matching algorithms based on the BERT pre-trained language model are constrained by the BERT model accepting only 512 characters of input, and cannot perform deep semantic matching on the long content of R&D supply and demand description texts that exceeds this limit.
Disclosure of Invention
The embodiments of the invention aim to provide a deep semantic matching method and system for matching R&D supply and demand description texts, to solve three problems of the prior art: it cannot perform semantic similarity matching on long technical-result and R&D-requirement texts that exceed the 512-character limit, it cannot train separately on text titles and on text content summaries, and it cannot perform fine-grained semantic matching classification.
In order to achieve the above object, the embodiments of the present invention mainly provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a deep semantic matching method for matching R&D supply and demand description texts, including: preprocessing the titles and contents of the technical-requirement and technical-result texts separately to obtain text titles and text contents; applying a summary extraction algorithm to the preprocessed technical-requirement text content to obtain a technical-requirement content summary; applying the summary extraction algorithm to the preprocessed technical-result text content to obtain a technical-result content summary; inputting the text titles of the technical requirement and the technical result into a first BERT model to obtain a first similarity classification result; inputting the content summaries of the technical requirement and the technical result into a second BERT model to obtain a second similarity classification result; linearly splicing the first and second similarity classification results to obtain a third similarity classification result; processing the linearly spliced third similarity classification result with a logistic regression algorithm to obtain a result-requirement semantic similarity coefficient; and sorting the result-requirement semantic similarity coefficients in descending order and outputting them as the technical result-technical requirement semantic matching results.
Further, the method includes: when the titles and contents of the technical-requirement and technical-result texts are preprocessed, removing punctuation marks and retaining only Chinese characters.
Further, the method includes: when the summary extraction algorithm is applied to the technical-requirement and technical-result text contents, using an improved TextRank algorithm to condense each long text into a content summary of no more than 512 characters.
Further, the method includes: when the linearly spliced third similarity classification result is processed with the logistic regression algorithm, using a category-weighted Softmax value as the result-requirement semantic similarity coefficient.
In a second aspect, an embodiment of the present invention further provides a deep semantic matching system for matching R&D supply and demand description texts, including: a preprocessing module, configured to process the titles and contents of the technical-requirement and technical-result texts; an extraction module, configured to extract core content summaries from the technical-requirement and technical-result texts; a training module, configured to train separately on the text titles and on the content summaries of the technical requirements and technical results to obtain a first similarity classification result and a second similarity classification result; a splicing module, configured to linearly splice the first and second similarity classification results to obtain a third similarity classification result; a classification module, configured to process the linearly spliced third similarity classification result with a logistic regression algorithm to obtain a result-requirement semantic similarity coefficient; and a control processing module, configured to control text preprocessing, summary extraction, model training, linear splicing, classification, and descending-order output for the titles and contents of the long technical-requirement and technical-result texts.
Further, when the preprocessing module preprocesses the titles and contents of the technical-requirement and technical-result texts, it removes punctuation marks and retains only Chinese characters.
Further, when the extraction module applies the summary extraction algorithm to the technical-requirement and technical-result text contents, it uses the improved TextRank algorithm to condense each long text into a content summary of no more than 512 characters.
Further, when the classification module processes the linearly spliced third similarity classification result with the logistic regression algorithm, it uses a category-weighted Softmax value as the result-requirement semantic similarity coefficient.
In a third aspect, an embodiment of the present invention further provides an electronic device, including at least one processor and at least one memory; the memory is configured to store one or more program instructions, and the processor is configured to execute the one or more program instructions to perform the deep semantic matching method for matching R&D supply and demand description texts according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium containing one or more program instructions, the one or more program instructions being executed to implement the deep semantic matching method for matching R&D supply and demand description texts according to the first aspect.
The technical scheme provided by the embodiments of the invention has at least the following advantages:
The technical scheme condenses long technical-requirement and technical-result texts into content summaries of no more than 512 characters with an improved TextRank algorithm, which avoids the 512-character input limit of the BERT pre-trained language model without losing the core semantics of the text content. It builds two BERT pre-trained language models and trains them separately on the titles and on the content summaries of the two text types, which extracts as much of the contextual semantic information of technical requirements and technical results as possible and significantly improves the semantic matching accuracy for the two text types. It replaces the conventional Softmax classification output with a category-weighted Softmax value, which corrects class-prediction errors of the model and improves text semantic matching accuracy. Compared with three baseline models, namely semantic matching based on word2vec word embeddings, title-only semantic matching based on BERT embeddings, and joint title-and-content semantic matching based on BERT embeddings, the model implemented by this technical scheme achieves the highest F1 score, a significant improvement over all three baselines.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, steps, and the like shown in this specification are provided only to accompany the disclosed content so that those skilled in the art can understand and read the present invention; they are not intended to limit the conditions under which the present invention can be implemented and therefore carry no substantive technical significance. Any modification of structure, change in the relationship between steps, or adjustment that does not affect the effects and purposes achievable by the present invention shall still fall within the scope of the technical content disclosed herein.
Fig. 1 is a flowchart of a deep semantic matching method for matching R&D supply and demand description texts according to an embodiment of the present invention.
Fig. 2 is a structural block diagram of a deep semantic matching system for matching R&D supply and demand description texts according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a technical requirement-technical result text deep semantic matching algorithm provided by an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure. It is to be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system algorithms, models, techniques, etc. in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, algorithms, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In the description of the present invention, it is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Fig. 1 is a flowchart of a deep semantic matching method for matching R&D supply and demand description texts according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
S1: Preprocess the titles and contents of the technical-requirement and technical-result texts separately to obtain text titles and text contents.
Specifically, after the user inputs a set of technical-requirement and technical-result texts into the deep semantic matching system, the system preprocesses the text data set: it splits each text into title and content, removes punctuation marks, and retains only Chinese characters, yielding the preprocessed text data set.
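A minimal Python sketch of the S1 preprocessing follows. It assumes that "Chinese characters" means the CJK Unified Ideographs block U+4E00 to U+9FFF, and, because the summary step in S2 needs sentence units, it splits the content at end-of-sentence punctuation before stripping punctuation. Neither detail is stated in the source, and the function names are hypothetical.

```python
import re

# Anything that is not a CJK Unified Ideograph (U+4E00..U+9FFF).
# Assumption: "Chinese characters" is taken to mean this basic block only.
_NON_HAN = re.compile(r"[^\u4e00-\u9fff]+")
# Assumed sentence delimiters (full-width and ASCII end punctuation, newlines).
_SENT_END = re.compile(r"[。！？；!?;\n]+")


def keep_chinese(text: str) -> str:
    """Drop punctuation, digits, Latin letters and whitespace; keep Han characters."""
    return _NON_HAN.sub("", text)


def preprocess(title: str, content: str) -> tuple[str, list[str]]:
    """Return the cleaned title and the cleaned content sentences.

    The content is split into sentences *before* punctuation is removed,
    because the summary step needs sentence units (an assumption of this sketch).
    """
    sentences = [keep_chinese(s) for s in _SENT_END.split(content)]
    return keep_chinese(title), [s for s in sentences if s]
```

For example, preprocess("基于深度学习的图像识别技术！", "本技术采用BERT模型。精度达到95%；适用于工业场景。") returns the title 基于深度学习的图像识别技术 and the sentences 本技术采用模型, 精度达到, 适用于工业场景. Latin terms such as BERT and digits are discarded along with punctuation, which is what retaining only Chinese characters implies.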
In one embodiment of the invention, the pre-processed text data set comprises a pre-processed text title and text content.
S2: Apply a summary extraction algorithm to the preprocessed technical-requirement text content to obtain a technical-requirement content summary, and to the preprocessed technical-result text content to obtain a technical-result content summary.
Specifically, when extracting core content from the technical-requirement and technical-result text contents in the preprocessed text data set, an improved TextRank algorithm condenses each long text into a content summary of no more than 512 characters.
In one embodiment of the present invention, sentence-node weights are adjusted according to the similarity between each content sentence and the title. First, the similarity $\omega_{i0}$ between the title sentence $P_0$ and each content sentence $P_i$ is calculated as follows:
[Formula image not reproduced: similarity $\omega_{i0}$ between title sentence $P_0$ and content sentence $P_i$]
Second, the feature words in each sentence are traversed: if a feature word also appears in the title, its word-frequency weight is increased; otherwise, its weight is left unchanged. The calculation formula is as follows:
[Formula image not reproduced: title-based adjustment of feature-word frequency weights]
In the above formula, the feature-word vector of the title sentence $P_0$ is $P_0 = [k_{01}, \ldots, k_{0h'}]^T$, where $h'$ is the number of feature words in the title and the sentence after expansion; this yields the adjustment matrix $D_{n \times h}$.
Then the similarity between sentences is calculated from formula (1) and the matrix $D_{n \times h}$ to obtain the matrix $SD_{n \times n}$; the subsequent iteration is identical to that of the standard TextRank algorithm.
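The exact formulas for the title similarity and the word-frequency adjustment are not reproduced in this text, so the following Python sketch only illustrates the idea under stated assumptions: feature words are given as pre-tokenized lists, a feature word that also occurs in the title has its frequency multiplied by a hypothetical title_boost factor, and cosine similarity is used both for $\omega_{i0}$ and for the sentence-to-sentence matrix $SD_{n \times n}$. None of these choices is confirmed by the source.

```python
import math
from collections import Counter


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse weight vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def title_weighted_vectors(sentences, title, title_boost=1.5):
    """Word-frequency vectors in which feature words that also occur in the
    title get their frequency multiplied by title_boost (hypothetical value)."""
    title_words = set(title)
    vectors = []
    for words in sentences:
        tf = Counter(words)
        for w in tf:                 # only values change, so iterating is safe
            if w in title_words:
                tf[w] *= title_boost
        vectors.append(tf)
    return vectors


def similarity_matrix(sentences, title, title_boost=1.5):
    """Return (sd, omega): the n x n sentence-similarity matrix (zero diagonal)
    and omega[i], the similarity between the title and sentence i."""
    vecs = title_weighted_vectors(sentences, title, title_boost)
    title_vec = Counter(title)
    n = len(vecs)
    sd = [[0.0 if i == j else _cosine(vecs[i], vecs[j]) for j in range(n)]
          for i in range(n)]
    omega = [_cosine(title_vec, v) for v in vecs]
    return sd, omega
```

Sentences that share no feature words with the title get $\omega_{i0} = 0$, while sentences built from title words score close to 1; how $\omega_{i0}$ is then folded into the node weights is not specified here.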
In one embodiment of the invention, node degrees are computed only in the first round rather than in every iteration, which reduces computation time. Through iteration, the nodes of the text network graph eventually converge to stable values, which are the final importance scores of the nodes. The information of the network graph resides only in the edge weights; recomputing the degrees in every loop serves only to speed up the iteration and has little bearing on the final converged values. Experiments show that the extracted summary is largely unaffected by whether the degrees are recomputed, so each iteration needs only a single pass, and the degree of each node is held fixed throughout the iteration. With this optimization, the convergence process is reduced from O(n^2) to O(n) complexity, greatly shortening the iterative computation time.
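The fixed-degree iteration can be sketched as a weighted TextRank update in Python. The damping factor 0.85 is the conventional TextRank value, and the tolerance, iteration cap, and function name are hypothetical choices, not values from the source. The sketch computes each node's weighted degree once before the loop, as described above; the per-iteration update still visits every edge, so it illustrates the fixed-degree idea rather than reproducing the quoted complexity figure.

```python
def textrank_scores(sd, damping=0.85, tol=1e-6, max_iter=100):
    """Weighted TextRank over a similarity matrix sd (list of lists).

    Each node's weighted out-degree is computed once, before the loop,
    and held fixed for every iteration.
    """
    n = len(sd)
    if n == 0:
        return []
    out_degree = [sum(row) for row in sd]        # computed a single time
    scores = [1.0] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            # Sum contributions from every neighbour j that links to i.
            rank = sum(sd[j][i] / out_degree[j] * scores[j]
                       for j in range(n) if sd[j][i] and out_degree[j])
            new.append((1 - damping) + damping * rank)
        delta = max(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores
```

On a three-sentence graph where sentence 0 is similar to both others and they are dissimilar to each other, sentence 0 receives the highest score and the other two tie.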
In one embodiment of the invention, the improved algorithm is applied to the input technical-requirement and technical-result text contents with the following parameters: 12 extracted sentences for technical-requirement text content and 10 extracted sentences for technical-result text content. The algorithm outputs the technical-requirement and technical-result content summaries.
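The source fixes the number of extracted sentences (12 for technical-requirement content, 10 for technical-result content) but does not say how the 512-character cap is enforced. The hypothetical helper below assumes that the top-k sentences are restored to document order and that any sentence which would push the summary past the cap is skipped.

```python
REQUIREMENT_K = 12   # sentences kept for technical-requirement content (from the source)
RESULT_K = 10        # sentences kept for technical-result content (from the source)


def build_summary(sentences, scores, k, max_chars=512):
    """Pick the k highest-scoring sentences, keep them in document order,
    and skip any sentence that would push the summary past max_chars."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    summary = ""
    for i in sorted(top):                      # restore original order
        if len(summary) + len(sentences[i]) > max_chars:
            continue                           # assumption: skip, do not truncate
        summary += sentences[i]
    return summary
```

Because punctuation was removed during preprocessing, the selected sentences are simply concatenated.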
S3: Input the text titles of the technical requirement and the technical result into a first BERT model to obtain a first similarity classification result, and input their content summaries into a second BERT model to obtain a second similarity classification result.
Specifically, the preprocessed text titles of the two text types (technical requirements and technical results) and their content summaries of no more than 512 characters are input into two independent BERT pre-trained language models and trained separately, yielding the first and second similarity classification results.
In one embodiment of the invention, the text titles of the technical requirements and technical results are input into one BERT pre-trained language model for training to obtain the first similarity classification result, and their content summaries are input into another BERT pre-trained language model for training to obtain the second similarity classification result.
In one embodiment of the invention, to handle the semantic complexity of R&D supply and demand description texts, this patent application constructs a dual independent BERT model architecture: one model matches technical-result and technical-requirement titles, and the other matches technical-result and technical-requirement content summaries. Each model loads Chinese BERT pre-trained weights (hidden size 768, 12 attention heads, 110M parameters in total, maximum sequence length 512, train_batch_size 64) and is fine-tuned on the training set data. A schematic diagram of the text deep semantic matching algorithm is shown in Fig. 3.
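Each of the two BERT models receives a pair of texts (two titles, or two content summaries). The sketch below assumes the standard BERT sentence-pair layout [CLS] A [SEP] B [SEP] with segment ids 0 and 1, treats each Han character as one token (an approximation of how Chinese BERT vocabularies split Han text), and truncates the longer side first so that the pair fits the 512-token maximum sequence length. In practice the model's own tokenizer would do this; encode_pair is a hypothetical stand-in for illustration.

```python
def encode_pair(text_a: str, text_b: str, max_len: int = 512):
    """Build a BERT-style sentence-pair input at character level.

    Returns (tokens, token_type_ids). The pair is truncated longest-first
    so that, together with [CLS] and two [SEP] markers, it fits in max_len.
    """
    a, b = list(text_a), list(text_b)
    budget = max_len - 3                      # room left after special tokens
    while len(a) + len(b) > budget:
        if len(a) >= len(b):                  # trim the longer side first
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment 0 covers [CLS], A and the first [SEP]; segment 1 covers B and the last [SEP].
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, token_type_ids
```

Short titles pass through untouched, whereas two summaries that are each close to 512 characters must share one 512-token window, so under this assumption each side keeps roughly half.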
S4: Linearly splice the first and second similarity classification results to obtain a third similarity classification result.
Specifically, the first and second similarity classification results are linearly combined according to a set weight to obtain the third similarity classification result.
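Step S4 is described as a weighted linear combination of the two models' outputs. A minimal sketch, assuming both outputs are probability vectors over the same four similarity categories; alpha is a hypothetical weight, because the source only says "a certain weight".

```python
def combine(p_title, p_summary, alpha=0.5):
    """Weighted linear combination of two class-probability vectors.

    alpha weights the title model and (1 - alpha) the summary model;
    0.5 is a placeholder value, not one given in the source.
    """
    if len(p_title) != len(p_summary):
        raise ValueError("both models must predict the same classes")
    return [alpha * t + (1 - alpha) * s for t, s in zip(p_title, p_summary)]
```

If "linear splicing" is instead read as concatenating the two vectors and feeding them to a learned logistic-regression layer, the combination weights would be learned from data rather than fixed in advance.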
S5: Process the linearly spliced third similarity classification result with a logistic regression algorithm to obtain the result-requirement semantic similarity coefficient.
Specifically, the linearly spliced third similarity classification result is processed with a logistic regression algorithm, and a category-weighted Softmax value is used as the result-requirement semantic similarity coefficient.
In one embodiment of the present invention, the category-weighted Softmax value used as the result-requirement semantic similarity coefficient is calculated as follows:
[Formula image not reproduced: category-weighted Softmax value used as the result-requirement semantic similarity coefficient]
In the above formula, i is the similarity category (four categories are defined: "1" for irrelevant, "2" for weakly relevant, "3" for strongly relevant, and "4" for strongly relevant), and $S_i$ denotes the Softmax value for category i.
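The weighting formula itself is not reproduced here. The sketch below assumes one plausible reading: the coefficient is the sum over categories of a class weight times the Softmax value $S_i$, with the category index (1 to 4) used as the weight, so that probability mass on more relevant categories raises the coefficient. The class_weights default is a hypothetical choice.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of scores."""
    m = max(logits)                      # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_coefficient(logits, class_weights=(1.0, 2.0, 3.0, 4.0)):
    """Category-weighted Softmax value: sum_i w_i * S_i.

    Categories are ordered 1 (irrelevant) to 4 (most relevant). Using the
    category index as w_i is an assumption of this sketch.
    """
    probs = softmax(logits)
    return sum(w * s for w, s in zip(class_weights, probs))
```

With uniform logits the coefficient is 2.5, the midpoint of the 1 to 4 range; logits concentrated on category 4 push it toward 4.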
S6: Sort the result-requirement semantic similarity coefficients in descending order and output them as the technical result-technical requirement semantic matching results.
Specifically, with a technical-requirement document as the matching target, the semantic similarity coefficients between that requirement and the technical results are output as matching results in descending order.
In one embodiment of the invention, matching runs in both directions: with a technical-result document as the matching target, the semantic similarity coefficients between it and the technical-requirement documents are output as matching results in descending order; and with a technical-requirement document as the matching target, the semantic similarity coefficients between it and the technical-result documents are output as matching results in descending order.
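The bidirectional output in S6 amounts to sorting the same coefficient table twice. A sketch, assuming the coefficients are held in a dictionary keyed by hypothetical (requirement_id, result_id) pairs.

```python
def _descending(items):
    """Return the ids from (coefficient, id) pairs, highest coefficient first."""
    return [item_id for _, item_id in sorted(items, key=lambda t: t[0], reverse=True)]


def rank_both_ways(coeff):
    """coeff maps (requirement_id, result_id) -> similarity coefficient.

    Returns two dicts: requirement -> ranked result ids, and
    result -> ranked requirement ids.
    """
    by_req, by_res = {}, {}
    for (req, res), c in coeff.items():
        by_req.setdefault(req, []).append((c, res))
        by_res.setdefault(res, []).append((c, req))
    return ({k: _descending(v) for k, v in by_req.items()},
            {k: _descending(v) for k, v in by_res.items()})
```

For example, with coefficients 3.2 for (R1, A1), 1.1 for (R1, A2), 2.0 for (R2, A1) and 3.9 for (R2, A2), requirement R1 is matched to A1 first and result A2 is matched to R2 first.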
Fig. 2 is a structural block diagram of a deep semantic matching system for matching R&D supply and demand description texts according to an embodiment of the present invention. As shown in Fig. 2, the system includes a preprocessing module 100, an extraction module 200, a training module 300, a splicing module 400, a classification module 500, and a control processing module 600.
The preprocessing module 100 is configured to process the titles and contents of the technical-requirement and technical-result texts; the extraction module 200 is configured to extract core content summaries from the technical-requirement and technical-result texts; the training module 300 is configured to train separately on the text titles and on the content summaries of the technical requirements and technical results to obtain a first similarity classification result and a second similarity classification result; the splicing module 400 is configured to linearly splice the first and second similarity classification results to obtain a third similarity classification result; the classification module 500 is configured to process the linearly spliced third similarity classification result with a logistic regression algorithm to obtain a result-requirement semantic similarity coefficient; and the control processing module 600 is configured to control text preprocessing, summary extraction, model training, linear splicing, classification, and descending-order output for the titles and contents of the long technical-requirement and technical-result texts.
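One way to picture how modules 100 to 600 fit together is a small Python class whose fields hold the per-module functions and whose run method plays the role of the control processing module. This is a hypothetical wiring for illustration: the field names, the dictionary keys "id", "title" and "content", and any stub functions passed in are assumptions, not the patent's interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Probs = Sequence[float]


@dataclass
class MatchingSystem:
    """Hypothetical wiring of modules 100-500; run() acts as module 600."""
    preprocess: Callable[[str], str]               # preprocessing module 100
    extract: Callable[[str], str]                  # extraction module 200
    title_model: Callable[[str, str], Probs]       # training module 300, first BERT
    summary_model: Callable[[str, str], Probs]     # training module 300, second BERT
    splice: Callable[[Probs, Probs], Probs]        # splicing module 400
    classify: Callable[[Probs], float]             # classification module 500

    def score(self, req: dict, res: dict) -> float:
        """Result-requirement similarity coefficient for one pair of texts."""
        p_title = self.title_model(self.preprocess(req["title"]),
                                   self.preprocess(res["title"]))
        p_summary = self.summary_model(self.extract(self.preprocess(req["content"])),
                                       self.extract(self.preprocess(res["content"])))
        return self.classify(self.splice(p_title, p_summary))

    def run(self, req: dict, results: list) -> list:
        """Score every technical result against one requirement, highest first."""
        scored = [(res["id"], self.score(req, res)) for res in results]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping each module as a plain function makes it possible to swap, for example, the summary extractor or the weighting rule without touching the control flow.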
In one embodiment of the present invention, when preprocessing the titles and contents of the technical-requirement and technical-result texts, the preprocessing module 100 removes punctuation marks and retains only Chinese characters.
In one embodiment of the present invention, when applying the summary extraction algorithm to the technical-requirement and technical-result text contents, the extraction module 200 uses the improved TextRank algorithm to condense each long text into a content summary of no more than 512 characters.
In one embodiment of the present invention, when processing the linearly spliced third similarity classification result with the logistic regression algorithm, the classification module 500 uses a category-weighted Softmax value as the result-requirement semantic similarity coefficient.
It should be noted that the specific implementation of the deep semantic matching system for matching R&D supply and demand description texts in the embodiments of the present invention is similar to that of the corresponding deep semantic matching method; for details, refer to the description of the method, which is not repeated here to avoid redundancy.
In addition, the other components and functions of the deep semantic matching system for matching R&D supply and demand description texts are known to those skilled in the art and are not described in detail here to avoid redundancy.
An embodiment of the present invention further provides an electronic device, including at least one processor and at least one memory; the memory is configured to store one or more program instructions, and the processor is configured to run the one or more program instructions to perform the deep semantic matching method for matching R&D supply and demand description texts according to the first aspect.
An embodiment of the invention discloses a computer-readable storage medium storing computer program instructions which, when run on a computer, cause the computer to perform the deep semantic matching method for matching R&D supply and demand description texts.
In an embodiment of the invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be, for example, a memory, which may be volatile memory, nonvolatile memory, or both.
The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a random access memory (RAM) used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functionality described in the present invention may be implemented by a combination of hardware and software. When implemented in software, the corresponding functionality may be stored on, or transmitted as, one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
In the description herein, descriptions referring to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. A deep semantic matching method for research and development supply and demand description text matching, characterized by comprising the following steps:
preprocessing the title and the content of the technical requirement text and of the technical result text, respectively, to obtain text titles and text contents;
applying an abstract extraction algorithm to the preprocessed technical requirement text content to obtain a technical requirement text content abstract;
applying the abstract extraction algorithm to the preprocessed technical result text content to obtain a technical result text content abstract;
inputting the text titles of the technical requirement and the technical result into a first BERT model to obtain a first similarity classification result;
inputting the text content abstracts of the technical requirement and the technical result into a second BERT model to obtain a second similarity classification result;
linearly splicing the first similarity classification result and the second similarity classification result to obtain a third similarity classification result;
processing the linearly spliced third similarity classification result with a logistic regression algorithm to obtain a result-demand semantic similarity coefficient; and
sorting the result-demand semantic similarity coefficients in descending order and outputting them as the technical result-technical demand semantic matching result.
2. The deep semantic matching method for research and development supply and demand description text matching according to claim 1, characterized in that:
when the title and the content of the technical requirement text and of the technical result text are preprocessed, punctuation marks are removed and only Chinese characters are retained.
3. The deep semantic matching method for research and development supply and demand description text matching according to claim 1, characterized in that:
when the abstract extraction algorithm is applied to the technical requirement text content and the technical result text content, an improved TextRank algorithm is used to condense the long text contents of the technical requirement and the technical result into a technical requirement text content abstract and a technical result text content abstract, each of no more than 512 characters.
4. The deep semantic matching method for research and development supply and demand description text matching according to claim 1, characterized in that:
when the logistic regression algorithm processes the linearly spliced third similarity classification result, a Softmax function value weighted by class weights is used as the result-demand semantic similarity coefficient.
5. A deep semantic matching system for research and development supply and demand description text matching, characterized by comprising:
a preprocessing module, configured to preprocess the titles and contents of the technical requirement and technical result texts;
an extraction module, configured to extract core content abstracts of the technical requirement and technical result texts;
a training module, configured to train on the text titles and on the text content abstracts of the technical requirement and the technical result, respectively, to obtain a first similarity classification result and a second similarity classification result;
a splicing module, configured to linearly splice the first similarity classification result and the second similarity classification result to obtain a third similarity classification result;
a classification module, configured to process the linearly spliced third similarity classification result with a logistic regression algorithm to obtain a result-demand semantic similarity coefficient; and
a control processing module, configured to control the text preprocessing, abstract extraction, model training, linear splicing, classification, and descending-order output for the titles and contents of the long technical requirement and technical result texts.
6. The deep semantic matching system for research and development supply and demand description text matching according to claim 5, wherein the preprocessing module removes punctuation marks and retains only Chinese characters when preprocessing the titles and contents of the technical requirement and technical result texts.
7. The deep semantic matching system for research and development supply and demand description text matching according to claim 5, wherein, when the abstract extraction algorithm is applied to the technical requirement text content and the technical result text content, the extraction module uses an improved TextRank algorithm to condense the long text contents of the technical requirement and the technical result into text content abstracts of no more than 512 characters each.
8. The deep semantic matching system for research and development supply and demand description text matching according to claim 5, wherein, when the logistic regression algorithm processes the linearly spliced third similarity classification result, the classification module uses a Softmax function value weighted by class weights as the result-demand semantic similarity coefficient.
9. An electronic device, characterized in that the electronic device comprises: at least one processor and at least one memory;
the memory is configured to store one or more program instructions; and
the processor is configured to execute the one or more program instructions to perform the method of any one of claims 1 to 4.
10. A computer-readable storage medium having one or more program instructions embodied therein for performing the method of any of claims 1-4.
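The pipeline recited in claims 1 to 4 can be illustrated with a minimal Python sketch, written under several stated assumptions. The two BERT models are not reproduced; their outputs are stood in for by hypothetical two-class probability vectors ordered [dissimilar, similar]. This section does not specify what makes the TextRank algorithm "improved", so plain TextRank is used instead, with a character-overlap sentence similarity because no word segmenter is assumed. Sentences are split on Chinese end punctuation before punctuation is stripped, since stripping first would erase sentence boundaries; the claims do not state this order. The logistic-regression weights, biases, and class weights are hypothetical placeholders (inference only; training is not shown), and the "Softmax function value weighted by class weights" is read here as the class-weighted sum of the softmax probabilities. All function names are illustrative.

```python
from __future__ import annotations

import math
import re

# Claim 2 as read here: keep only CJK Unified Ideographs (U+4E00..U+9FFF),
# dropping punctuation, Latin letters, and digits.
_NON_CJK = re.compile(r"[^\u4e00-\u9fff]")
# Assumed sentence delimiters, applied before punctuation is stripped.
_SENT_END = re.compile(r"[。！？；!?;\n]+")


def preprocess(text: str) -> str:
    return _NON_CJK.sub("", text)


def split_sentences(text: str) -> list[str]:
    cleaned = (preprocess(part) for part in _SENT_END.split(text))
    return [s for s in cleaned if s]


def _similarity(a: str, b: str) -> float:
    # TextRank-style overlap normalised by log lengths; distinct characters
    # stand in for words (assumption: no word segmentation).
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a) + 1) + math.log(len(b) + 1))


def textrank_abstract(text: str, max_chars: int = 512,
                      damping: float = 0.85, iters: int = 50) -> str:
    """Plain TextRank extractive abstract capped at max_chars (claim 3 budget)."""
    sents = split_sentences(text)
    n = len(sents)
    if n == 0:
        return ""
    w = [[_similarity(sents[i], sents[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):  # weighted PageRank iterations over the sentence graph
        scores = [(1 - damping) + damping * sum(
                      w[j][i] / out_sum[j] * scores[j]
                      for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen, total = [], 0
    for i in ranked:  # greedily keep top-ranked sentences within the budget
        if total + len(sents[i]) <= max_chars:
            chosen.append(i)
            total += len(sents[i])
    if not chosen:  # even the best sentence is too long: truncate it
        return sents[ranked[0]][:max_chars]
    return "".join(sents[i] for i in sorted(chosen))  # restore original order


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def similarity_coefficient(title_probs, abstract_probs, weights, biases,
                           class_weights) -> float:
    x = list(title_probs) + list(abstract_probs)  # linear splicing (concatenation)
    logits = [sum(wi * xi for wi, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    # Assumed reading of claim 4: class-weighted sum of softmax probabilities.
    return sum(cw * p for cw, p in zip(class_weights, softmax(logits)))


def rank_results(candidates, weights, biases, class_weights):
    """candidates: (name, title_probs, abstract_probs); sorted descending by coefficient."""
    scored = [(name, similarity_coefficient(t, a, weights, biases, class_weights))
              for name, t, a in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical logistic-regression parameters for classes [dissimilar, similar].
    W = [[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]
    b = [0.0, 0.0]
    class_w = [0.0, 1.0]
    # Stand-ins for the two BERT models' outputs for each demand/result pair.
    cands = [("result B", [0.9, 0.1], [0.6, 0.4]),
             ("result A", [0.2, 0.8], [0.3, 0.7])]
    print(rank_results(cands, W, b, class_w))
    print(textrank_abstract("深度语义匹配。技术需求文本！技术成果文本。"))
```

With class weights [0.0, 1.0] the coefficient reduces to the softmax probability of the "similar" class, so in the example "result A" (about 0.88) is ranked above "result B" (about 0.12). Other class weights would shift how strongly each class contributes to the final score.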
CN202110156093.7A 2021-02-04 2021-02-04 Deep semantic matching method and system for matching research and development supply and demand description texts Pending CN112861543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110156093.7A CN112861543A (en) 2021-02-04 2021-02-04 Deep semantic matching method and system for matching research and development supply and demand description texts

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110156093.7A CN112861543A (en) 2021-02-04 2021-02-04 Deep semantic matching method and system for matching research and development supply and demand description texts

Publications (1)

Publication Number Publication Date
CN112861543A (en) 2021-05-28

Family

ID=75987945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110156093.7A Pending CN112861543A (en) 2021-02-04 2021-02-04 Deep semantic matching method and system for matching research and development supply and demand description texts

Country Status (1)

Country Link
CN (1) CN112861543A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019105432A1 (en) * 2017-11-29 2019-06-06 腾讯科技(深圳)有限公司 Text recommendation method and apparatus, and electronic device
CN111309871A (en) * 2020-03-26 2020-06-19 普华讯光(北京)科技有限公司 Method for matching degree between requirement and output result based on text semantic analysis
CN111444340A (en) * 2020-03-10 2020-07-24 腾讯科技(深圳)有限公司 Text classification and recommendation method, device, equipment and storage medium
CN111666402A (en) * 2020-04-30 2020-09-15 平安科技(深圳)有限公司 Text abstract generation method and device, computer equipment and readable storage medium
CN111858912A (en) * 2020-07-03 2020-10-30 黑龙江阳光惠远知识产权运营有限公司 Abstract generation method based on single long text

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743081A (en) * 2021-09-03 2021-12-03 西安邮电大学 Recommendation method of technical service information
CN113743081B (en) * 2021-09-03 2023-08-01 西安邮电大学 Recommendation method of technical service information
CN116010593A (en) * 2021-10-20 2023-04-25 腾讯科技(深圳)有限公司 Method, device, computer equipment and storage medium for determining disease emotion information

Similar Documents

Publication Publication Date Title
CN111382580B (en) Encoder-decoder framework pre-training method for neural machine translation
US11544474B2 (en) Generation of text from structured data
US8131536B2 (en) Extraction-empowered machine translation
WO2022062404A1 (en) Text classification model training method, apparatus, and device and storage medium
CN111460820B (en) Network space security domain named entity recognition method and device based on pre-training model BERT
CN113987209A (en) Natural language processing method and device based on knowledge-guided prefix fine tuning, computing equipment and storage medium
CN111832293B (en) Entity and relation joint extraction method based on head entity prediction
CN112861543A (en) Deep semantic matching method and system for matching research and development supply and demand description texts
CN114676255A (en) Text processing method, device, equipment, storage medium and computer program product
CN112380837A (en) Translation model-based similar sentence matching method, device, equipment and medium
CN110688834A (en) Method and equipment for rewriting intelligent manuscript style based on deep learning model
CN112232070A (en) Natural language processing model construction method, system, electronic device and storage medium
CN116304748A (en) Text similarity calculation method, system, equipment and medium
Calvin et al. Image captioning using convolutional neural networks and recurrent neural network
Andriyanov Combining Text and Image Analysis Methods for Solving Multimodal Classification Problems
CN114386425B (en) Big data system establishing method for processing natural language text content
CN112287641B (en) Synonym sentence generating method, system, terminal and storage medium
CN112528653B (en) Short text entity recognition method and system
CN114510569A (en) Chemical emergency news classification method based on Chinesebert model and attention mechanism
CN114692635A (en) Information analysis method and device based on vocabulary enhancement and electronic equipment
Buoy et al. Joint Khmer word segmentation and part-of-speech tagging using deep learning
CN113408267A (en) Word alignment performance improving method based on pre-training model
Sun et al. Chinese named entity recognition using the improved transformer encoder and the lexicon adapter
CN115828930B (en) Distributed word vector space correction method for dynamic fusion of semantic relations
CN112560441B (en) Method for constructing composition syntax analysis tree by combining bottom-up rules with neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination