CN110489751A - Text similarity computing method and device, storage medium, electronic equipment - Google Patents


Info

Publication number
CN110489751A
CN110489751A
Authority
CN
China
Prior art keywords
text
vector
knowledge graph
graph data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910746144.4A
Other languages
Chinese (zh)
Inventor
刘文强
程序
谢思发
张涵宇
江小琴
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910746144.4A priority Critical patent/CN110489751A/en
Publication of CN110489751A publication Critical patent/CN110489751A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Abstract

The disclosure provides a text similarity calculation method and device, an electronic device, and a storage medium, relating to the field of computer technology. The text similarity calculation method includes: obtaining a target text and knowledge graph data corresponding to the target text, and converting the knowledge graph data to determine relation feature vectors corresponding to the knowledge graph data; performing word segmentation on the target text to determine an original sentence sequence corresponding to the target text; determining a first text vector corresponding to the original sentence sequence through the relation feature vectors and a pre-established attention model; and obtaining a second text vector of a preset text, and computing the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text. The disclosure can improve the representation of text content and thereby improve the accuracy of text similarity calculation.

Description

Text similarity computing method and device, storage medium, electronic equipment
Technical field
This disclosure relates to the field of computer technology, and in particular to a text similarity calculation method, a text similarity calculation device, an electronic device, and a computer-readable storage medium.
Background technique
In the field of game content, computing the similarity of articles is a basic requirement: each content-processing module needs to calculate article similarity. For example, in content recommendation, articles must be coarsely ranked by similarity.
However, existing schemes for calculating article similarity, such as methods based on Word2vec or Doc2vec, suffer when articles contain synonyms or polysemous words. For example, in articles about a certain game, different users often use terms such as "Luban No. 7", "Little Short Legs", and "Little Luban" to refer to the same game role "Luban No. 7". This lowers the accuracy of article similarity calculation and degrades the user experience.
It should be noted that the information disclosed in the background section above is provided only to improve understanding of the background of the disclosure, and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The purpose of the disclosure is to provide a text similarity calculation method, a text similarity calculation device, an electronic device, and a computer-readable storage medium, thereby overcoming, at least to some extent, problems caused by the limitations and defects of the related art.
According to a first aspect of the disclosure, a text similarity calculation method is provided, comprising:
obtaining a target text and knowledge graph data corresponding to the target text, and converting the knowledge graph data to determine relation feature vectors corresponding to the knowledge graph data;
performing word segmentation on the target text to determine an original sentence sequence corresponding to the target text;
determining a first text vector corresponding to the original sentence sequence through the relation feature vectors and a pre-established attention model;
obtaining a second text vector of a preset text, and computing the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
In an exemplary embodiment of the disclosure, the attention model includes an encoder and a decoder;
determining the first text vector corresponding to the original sentence sequence through the relation feature vectors and the pre-established attention model includes:
based on the relation feature vectors, encoding the original sentence sequence through a gated recurrent unit (GRU) in the encoder to generate an intermediate vector;
determining the first text vector through the intermediate vector and a preset attention mechanism.
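The encoding step above can be sketched with a toy one-dimensional GRU; the weights here are hand-picked placeholders, whereas a real encoder would apply learned weight matrices to embedding vectors:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h_prev, x, w):
    # z: update gate, r: reset gate, h_tilde: candidate hidden state
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)
    h_tilde = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))
    return (1.0 - z) * h_prev + z * h_tilde

def encode(sequence, weights):
    # Run the GRU over scalar token features; the final hidden state
    # plays the role of the intermediate vector described above.
    h, states = 0.0, []
    for x in sequence:
        h = gru_step(h, x, weights)
        states.append(h)
    return states
```

Because the candidate state passes through tanh, the hidden state stays bounded regardless of sequence length, which is the gating behavior that motivates choosing a GRU for the encoder.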
In an exemplary embodiment of the disclosure, determining the first text vector through the intermediate vector and the preset attention mechanism includes:
obtaining historical information generated in the encoder, and determining a target sentence sequence through the historical information, the intermediate vector, and the attention mechanism;
taking the difference between the target sentence sequence and the original sentence sequence as a loss function, and computing the loss function by gradient descent to determine the first text vector.
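The attention mechanism referred to above can be sketched as dot-product attention over encoder hidden states; the query and state values below are illustrative stand-ins rather than the patent's trained parameters:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(hidden_states, query):
    # Score each encoder hidden state against the query, normalize
    # with softmax, and pool the states into one context vector.
    dim = len(query)
    scores = [sum(h[i] * query[i] for i in range(dim)) for h in hidden_states]
    weights = softmax(scores)
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states))
               for i in range(dim)]
    return context, weights
```

The softmax weights are what let the model emphasize key words: states aligned with the query contribute more to the pooled context.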
In an exemplary embodiment of the disclosure, converting the knowledge graph data to determine the relation feature vectors corresponding to the knowledge graph data includes:
obtaining relation triples in the knowledge graph data, where each relation triple includes a subject vector, a predicate vector, and an object vector;
determining, based on a pre-trained translation model, the relation feature vectors corresponding to the knowledge graph data from the subject vector, the predicate vector, and the object vector.
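The "translation model" over (subject, predicate, object) vectors is consistent with TransE-style knowledge graph embeddings, where subject + predicate ≈ object; a minimal sketch under that assumption (the vectors are illustrative, not trained):

```python
def relation_vector(subject_vec, object_vec):
    # Under the translation assumption s + p ≈ o, the relation
    # (predicate) feature vector can be recovered as o - s.
    return [o - s for s, o in zip(subject_vec, object_vec)]

def transe_score(subject_vec, predicate_vec, object_vec):
    # L2 distance of (s + p) from o; smaller means a more plausible triple.
    return sum((s + p - o) ** 2
               for s, p, o in zip(subject_vec, predicate_vec, object_vec)) ** 0.5
```

A true triple scores near zero while a corrupted one scores higher, which is how such a model separates valid graph relations from invalid ones.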
In an exemplary embodiment of the disclosure, before obtaining the target text and the knowledge graph data corresponding to the target text, the method further includes:
constructing an ontology model corresponding to the target text, and obtaining key data corresponding to the target text from a target location through a crawler tool;
inserting the key data into the ontology model to generate the knowledge graph data corresponding to the target text, and saving the knowledge graph data to a target database.
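Filling the ontology model with crawled key data to produce storable triples can be sketched as follows; the slot names and record fields are hypothetical examples, not the patent's schema:

```python
def build_knowledge_graph(ontology_slots, crawled_records):
    # Fill the ontology template with crawled key data and emit
    # (subject, predicate, object) triples ready to save to a database.
    triples = []
    for record in crawled_records:
        subject = record["id"]
        for slot in ontology_slots:
            if slot in record:
                triples.append((subject, slot, record[slot]))
    return triples
```

Slots absent from a crawled record are simply skipped, so the same ontology template can be reused as the crawler periodically supplies new key data.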
In an exemplary embodiment of the disclosure, after inserting the key data into the ontology model to generate the knowledge graph data corresponding to the target text and saving the knowledge graph data to the target database, the method further includes:
periodically acquiring new key data corresponding to the target text from the target location through the crawler tool;
updating the knowledge graph data in the target database according to the new key data.
In an exemplary embodiment of the disclosure, computing the first text vector and the second text vector according to the preset algorithm to determine the similarity between the target text and the preset text includes:
computing the cosine similarity of the first text vector and the second text vector to determine the similarity between the target text and the preset text.
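The cosine similarity step can be sketched in pure Python (a production system would typically use a vectorized library instead):

```python
import math

def cosine_similarity(v1, v2):
    # cos(theta) = (v1 . v2) / (|v1| * |v2|)
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)
```

Identical directions score 1.0 and orthogonal vectors score 0.0, so the higher the value, the more similar the target text and the preset text.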
According to a second aspect of the disclosure, a text similarity calculation device is provided, comprising:
a relation feature vector determination unit, configured to obtain a target text and knowledge graph data corresponding to the target text, and to convert the knowledge graph data to determine relation feature vectors corresponding to the knowledge graph data;
a sentence sequence determination unit, configured to perform word segmentation on the target text and determine an original sentence sequence corresponding to the target text;
a text vector determination unit, configured to determine a first text vector corresponding to the original sentence sequence through the relation feature vectors and a pre-established attention model;
a similarity calculation unit, configured to obtain a second text vector of a preset text, and to compute the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
In an exemplary embodiment of the disclosure, the attention model includes an encoder and a decoder; the similarity calculation unit further includes:
an intermediate vector generation unit, configured to encode the original sentence sequence through a gated recurrent unit in the encoder, based on the relation feature vectors, to generate an intermediate vector;
a first text vector determination unit, configured to determine the first text vector through the intermediate vector and a preset attention mechanism.
In an exemplary embodiment of the disclosure, the first text vector determination unit is configured to:
obtain historical information generated in the encoder, and determine a target sentence sequence through the historical information, the intermediate vector, and the attention mechanism;
take the difference between the target sentence sequence and the original sentence sequence as a loss function, and compute the loss function by gradient descent to determine the first text vector.
In an exemplary embodiment of the disclosure, the relation feature vector determination unit is configured to:
obtain relation triples in the knowledge graph data, where each relation triple includes a subject vector, a predicate vector, and an object vector;
determine, based on a pre-trained translation model, the relation feature vectors corresponding to the knowledge graph data from the subject vector, the predicate vector, and the object vector.
In an exemplary embodiment of the disclosure, the text similarity calculation device further includes a knowledge graph data generation unit, configured to:
construct an ontology model corresponding to the target text, and obtain key data corresponding to the target text from a target location through a crawler tool;
insert the key data into the ontology model to generate the knowledge graph data corresponding to the target text, and save the knowledge graph data to a target database.
In an exemplary embodiment of the disclosure, the text similarity calculation device further includes a knowledge graph data updating unit, configured to:
periodically acquire new key data corresponding to the target text from the target location through the crawler tool;
update the knowledge graph data in the target database according to the new key data.
In an exemplary embodiment of the disclosure, the similarity calculation unit is configured to:
compute the cosine similarity of the first text vector and the second text vector to determine the similarity between the target text and the preset text.
According to a third aspect of the disclosure, an electronic device is provided, comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute, via the executable instructions, the method described in any one of the above.
According to a fourth aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the method described in any one of the above.
Exemplary embodiments of the disclosure may have some or all of the following beneficial effects:
In the text similarity calculation method provided by an example embodiment of the disclosure, a target text and its corresponding knowledge graph data are obtained, and the relation feature vectors corresponding to the knowledge graph data are determined; then the original sentence sequence corresponding to the target text is determined, and a first text vector corresponding to the original sentence sequence is determined through the relation feature vectors and a pre-established attention model; finally, a second text vector of a preset text is obtained, and the first and second text vectors are computed according to a preset algorithm to determine the similarity between the target text and the preset text. On the one hand, representing entities or concepts and their associations through knowledge graph data, and assisting the similarity calculation with that data, effectively enhances the ability to distinguish different terms and improves the accuracy of text similarity calculation. On the other hand, with knowledge graph data as auxiliary data, the attention model can effectively extract the key words in the text, thereby improving the representation of text content, further improving the accuracy of text similarity calculation, and improving the user experience.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.
Detailed description of the invention
The accompanying drawings, which are incorporated into and form a part of this specification, show embodiments consistent with the disclosure and serve, together with the specification, to explain the principles of the disclosure. Evidently, the drawings described below are only some embodiments of the disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the text similarity calculation method and device of embodiments of the disclosure may be applied;
Fig. 2 shows a structural schematic diagram of a computer system of an electronic device suitable for implementing embodiments of the disclosure;
Fig. 3 schematically shows computing text similarity by term frequency-inverse document frequency according to an embodiment of the disclosure;
Fig. 4 schematically shows a flow diagram of the text similarity calculation method according to an embodiment of the disclosure;
Fig. 5 schematically shows the structure of knowledge graph data according to an embodiment of the disclosure;
Fig. 6 schematically shows the algorithm framework corresponding to the attention mechanism according to an embodiment of the disclosure;
Fig. 7 schematically shows a block diagram of the text similarity calculation device according to an embodiment of the disclosure.
Specific embodiment
Example embodiments are described more fully below with reference to the drawings. However, the example embodiments can be implemented in a variety of forms and should not be understood as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be more thorough and complete and will fully convey the ideas of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided for a full understanding of the embodiments of the disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the disclosure may be practiced without one or more of these specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known solutions are not shown or described in detail, to avoid obscuring aspects of the disclosure.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the figures denote the same or similar parts, and their repeated description is omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 shows a schematic diagram of the system architecture of an exemplary application environment to which the text similarity calculation method and device of embodiments of the disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links or fiber-optic cables. The terminal devices 101, 102, 103 may be various electronic devices with display screens, including but not limited to desktop computers, portable computers, smartphones, and tablet computers. It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; any number of terminal devices, networks, and servers may be provided according to implementation needs. For example, the server 105 may be a server cluster composed of multiple servers.
The text similarity calculation method provided by embodiments of the disclosure is generally executed by the server 105, and accordingly the text similarity calculation device is generally deployed in the server 105. However, those skilled in the art will readily appreciate that the method may also be executed by the terminal devices 101, 102, 103, in which case the device may also be deployed in the terminal devices 101, 102, 103; this example embodiment places no particular restriction on this. For example, in an exemplary embodiment, a user may upload multiple text contents to the server 105 through the terminal devices 101, 102, 103; the server calculates the similarity of the text contents through the text similarity calculation method provided by embodiments of the disclosure and transmits the result back to the terminal devices 101, 102, 103.
Fig. 2 shows a structural schematic diagram of a computer system of an electronic device suitable for implementing embodiments of the disclosure.
It should be noted that the computer system 200 of the electronic device shown in Fig. 2 is only an example and should not impose any restriction on the functions and scope of use of embodiments of the disclosure.
As shown in Fig. 2, the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage section 208 into a random access memory (RAM) 203. The RAM 203 also stores various programs and data needed for system operation. The CPU 201, ROM 202, and RAM 203 are connected to one another through a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.
The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a cathode-ray tube (CRT), a liquid crystal display (LCD), a loudspeaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card or a modem. The communication section 209 performs communication processing via a network such as the Internet. A driver 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the driver 210 as needed, so that a computer program read from it can be installed into the storage section 208 as needed.
In particular, according to embodiments of the disclosure, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209, and/or installed from the removable medium 211. When the computer program is executed by the central processing unit (CPU) 201, the various functions defined in the method and device of the present application are performed. In some embodiments, the computer system 200 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
It should be noted that the computer-readable medium shown in the disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus, or device. In the disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the disclosure. In this regard, each box in a flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the boxes may occur in a different order from that indicated in the drawings. For example, two boxes shown in succession may actually be executed substantially in parallel, or sometimes in the opposite order, depending on the functions involved. It should further be noted that each box in a block diagram or flowchart, and combinations of boxes in a block diagram or flowchart, can be implemented by a dedicated hardware-based system that executes the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
The units described in embodiments of the disclosure may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist independently without being assembled into the electronic device. The computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device implements the methods described in the following embodiments. For example, the electronic device can implement the steps shown in Fig. 4.
The technical solutions of embodiments of the disclosure are described in detail below:
In one technical solution, text similarity is calculated based on the term frequency-inverse document frequency (TF-IDF) algorithm. First, the keywords of two articles are found according to TF-IDF, which is computed as the term frequency (TF) multiplied by the inverse document frequency (IDF). From the formula, TF-IDF is proportional to the number of times a word appears in a document and inversely proportional to the number of documents containing the word. Fig. 3 schematically shows computing text similarity by term frequency-inverse document frequency according to an embodiment of the disclosure. As shown in Fig. 3, several keywords (for example, 20) are taken from each article and merged into one set; the term frequency of each word in the set is computed for each article (to avoid differences in article length, relative term frequency can be used); then the term-frequency vector of each article is generated; finally, the cosine similarity of the two vectors is calculated — the larger the value, the more similar the two articles.
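The keyword-scoring step above can be sketched as follows, using relative term frequency and a smoothed IDF (the add-one smoothing is an assumption for the sketch, not specified by the text):

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top_k=3):
    # Relative term frequency times smoothed inverse document frequency;
    # returns the top_k highest-scoring tokens of the document.
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]
```

A term that is frequent in one article but rare across the corpus outranks a term that appears everywhere, which is exactly the property the keyword-extraction step relies on.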
In another technical solution, article similarity is calculated based on the Doc2vec algorithm. This scheme follows the core idea of the Word2vec algorithm but adds a paragraph vector or sentence vector during training; that vector serves as a memory unit for the context, or as the topic of the paragraph or sentence. After training on a large corpus, a vector representation of each article is obtained, and the similarity of two articles is then obtained by cosine similarity.
In actual application scenarios, the above two schemes face two major problems when calculating article similarity. Synonym problems: articles often use many different words for the same meaning — for example, "Luban No. 7", "Little Short Legs", and "Little Luban" all denote the game role "Luban No. 7". Polysemy problems: the same word expresses different meanings in different contexts — for example, "apple" in an article may carry two meanings: the fruit and Apple Inc. These problems cause the traditional word-order-based Doc2vec and TF-IDF article similarity methods to perform relatively poorly in practice; that is, the accuracy of article similarity calculation is poor.
The inventors found that, when addressing the synonym and polysemy problems, additionally introducing auxiliary information as input can effectively improve the accuracy of article similarity calculation. A knowledge graph, serving as an auxiliary dictionary, can accurately characterize the descriptions of articles or concepts and effectively enhance the ability to distinguish different words. In addition, an attention mechanism based on an Encoder and a Decoder can effectively extract the key words of an article, thereby improving the representation of the article and in turn the accuracy of similarity calculation.
Based on one or more of the above problems, this example embodiment provides a text similarity calculation method. The method can be applied to the above server 105 or to one or more of the above terminals 101, 102, 103; this example embodiment places no particular restriction on this, and the following description takes a terminal executing the method as an example. Fig. 4 schematically shows a flow diagram of the text similarity calculation method according to an embodiment of the disclosure. As shown in Fig. 4, the text similarity calculation method may include the following steps S410 to S440:
Step S410: obtain a target text and knowledge graph data corresponding to the target text, and perform conversion processing on the knowledge graph data to determine a relation feature vector corresponding to the knowledge graph data;
Step S420: perform word segmentation processing on the target text to determine an original sentence sequence corresponding to the target text;
Step S430: determine a first text vector corresponding to the original sentence sequence through the relation feature vector and a pre-established attention model;
Step S440: obtain a second text vector of a preset text, and compute the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
In the text similarity computing method provided by this example embodiment, on the one hand, the knowledge graph data represents entities or concepts and the association relations between them, and using the knowledge graph data to assist the similarity computation effectively enhances the ability to distinguish different terms, improving the accuracy of text similarity computation; on the other hand, with the knowledge graph data as auxiliary data, the attention model can effectively extract the key words in the text, thereby improving the representation of the text content, further improving the accuracy of the computed text similarity, and improving the user experience.
The above steps of this example embodiment are described in more detail below.
In step S410, a target text and knowledge graph data corresponding to the target text are obtained, and conversion processing is performed on the knowledge graph data to determine a relation feature vector corresponding to the knowledge graph data.
In an example embodiment of the present disclosure, the target text may refer to a text that the user selects from a plurality of preset texts and for which text similarity is to be computed. The knowledge graph data may refer to a dictionary or model describing the various entities or concepts existing in the real world and the association relations between them. For example, as shown in Fig. 5, the knowledge graph data may be a game knowledge graph; of course, the figure is merely illustrative, and this example embodiment places no particular limitation on this. Each entity (which may refer to a named entity in a game, including game-specific entities such as character names and skin names) or concept is identified by a globally unique ID (such as "Luban No. 7"); each attribute-value pair (such as "Luban No. 7" and "Skill 1") characterizes an inherent attribute of an entity; and a relation (such as "Has_Skill") connects two entities and characterizes the association between attribute-value pairs. Knowledge graph data can fuse rich semantic information from multiple data sources and, combined with the implicit information obtained through reasoning, provide services for the user. Knowledge graph data can characterize the rich semantic associations between entities; it can not only extend information about specific related games but also strengthen the association of knowledge such as game characters and races, and can therefore provide more accurate auxiliary information for text similarity computation. The conversion processing may refer to converting the high-dimensional, heterogeneous, visualized knowledge graph data into a low-dimensional feature vector representation; for example, the conversion processing may be performed by a pre-trained encoder or a pre-trained translation model. Of course, this is merely illustrative, and this example embodiment is not limited thereto. The relation feature vector may refer to the low-dimensional feature vector obtained from the knowledge graph data through the conversion processing; characterizing the knowledge graph data by the relation feature vector enables the terminal or server to use and process the knowledge graph data.
Specifically, the terminal obtains a relation array in the knowledge graph data and, based on a pre-trained translation model, determines the relation feature vector corresponding to the knowledge graph data according to a subject vector, a predicate vector, and an object vector. The relation array may refer to an array characterizing the attribute-association relations of entities or concepts in the knowledge graph data. For example, with continued reference to Fig. 5, a relation array may be an attribute-association relation in the knowledge graph data, such as (character identifier 501 "Character 1", association relation 502 "Recom_Equip", associated content 503 "Equipment 5"); of course, this is merely illustrative and should not impose any particular limitation on this example embodiment. The relation data may include a subject vector, a predicate vector, and an object vector. With continued reference to Fig. 5, the subject vector may correspond to the character identifier 501 "Character 1" in the relation array (character identifier 501 "Character 1", association relation 502 "Recom_Equip", associated content 503 "Equipment 5"), the predicate vector to the association relation 502 "Recom_Equip", and the object vector to the associated content 503 "Equipment 5"; again, this is merely illustrative, and this example embodiment is not limited thereto. The relation array in the knowledge graph data is extracted first, and the subject vector, predicate vector, and object vector in the relation array are then converted into a vector representation to obtain the relation feature vector. This not only reduces the amount of computation when converting the knowledge graph data, improving system performance, but also ensures that the generated relation feature vector preserves the subject-predicate-object sentence structure, which further guarantees accuracy when computing text similarity.
Optionally, before the conversion processing is performed on the knowledge graph data, the knowledge graph data corresponding to the target text needs to be constructed. For example, if the target text is related to the game domain, game knowledge graph data is constructed; if the target text is related to the plant and animal domain, plant and animal knowledge graph data is constructed. This example embodiment places no particular limitation on this. The terminal constructs an ontology model corresponding to the target text, obtains key data corresponding to the target text from a target location through a crawler tool, fills the ontology model with the key data to generate the knowledge graph data corresponding to the target text, and saves the knowledge graph data to a target database. The ontology model (ontology) may refer to a model that systematically describes objective things in the world. For example, with continued reference to Fig. 5, the ontology model may be the game-domain model or framework shown in Fig. 5; filling the ontology model yields knowledge graph data for different character names. For example, the ontology model may be filled with content related to "Character 1" to obtain the knowledge graph data corresponding to "Character 1", or with content related to "Character 2" to obtain the knowledge graph data corresponding to "Character 2"; this example embodiment places no particular limitation on this. The crawler tool (web crawler) may refer to a program or script that automatically captures information according to certain rules; for example, the crawler tool may be a Python crawler, JSON-handle, or User-Agent Switcher, and this example embodiment places no particular limitation on this. The target location may be a web page address corresponding to the content in the target text. The key data may refer to the keywords corresponding to the attribute information to be filled into the ontology model. For example, with continued reference to Fig. 5, for "Character 1" in the game knowledge graph data, key data such as the skill names, skin names, and hero bond relationships corresponding to "Character 1" are obtained by the crawler tool from the character-introduction page of the game's official website, and the key data is filled into the ontology model to obtain the game knowledge graph data shown in Fig. 5. Of course, this is merely illustrative, and this example embodiment is not limited thereto. After the knowledge graph data is obtained, it is saved to the corresponding database so that it can subsequently be updated and used. By constructing the ontology model and filling it through the crawler tool, the resulting knowledge graph data is more complete and accurate; the accuracy of the knowledge graph data is improved while its generation efficiency and the working efficiency of the system are increased.
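The "fill the ontology model with crawled key data" step can be sketched as template filling. The field names below (`skill_names`, `skin_names`, `recommended_equipment`) are hypothetical, not the patent's actual schema, and the crawled data is stubbed out rather than fetched from a live page.

```python
def fill_ontology(template_fields, crawled):
    """Fill an ontology template with crawled key data. Fields missing
    from the crawl stay None, so gaps in the generated knowledge graph
    entry are visible and can be re-crawled later."""
    return {field: crawled.get(field) for field in template_fields}

# Hypothetical game-character ontology template and a stubbed crawl result.
GAME_ROLE_TEMPLATE = ["skill_names", "skin_names", "recommended_equipment"]
crawled = {"skill_names": ["Skill 1"], "skin_names": ["Skin A"]}

entry = fill_ontology(GAME_ROLE_TEMPLATE, crawled)
print(entry["skill_names"])           # ['Skill 1']
print(entry["recommended_equipment"])  # None — not found by the crawl
```

In a real pipeline the `crawled` dict would come from the crawler tool, and the filled entries would be persisted to the target database as the knowledge graph data.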
Optionally, the terminal periodically acquires new key data corresponding to the target text from the target location through the crawler tool, and updates the knowledge graph data in the target database according to the new key data. The new key data may refer to updated data for the key data corresponding to the knowledge graph data. By periodically obtaining new key data from the target location through the crawler tool and updating the knowledge graph data, the corresponding knowledge graph data can be updated in time whenever the key data changes, which guarantees the accuracy of the knowledge graph data, further improves the accuracy of the computed text similarity, and improves the user experience.
In step S420, word segmentation processing is performed on the target text to determine an original sentence sequence corresponding to the target text.
In an example embodiment of the present disclosure, the word segmentation processing may refer to the process of dividing the target text into a representation of words or sentences; for example, it may refer to processing the target text through the sentence model Doc2vec or the word segmentation model Word2Vec. Of course, this example embodiment places no particular limitation on this. The original sentence sequence may refer to the sentence sequence obtained after the target text undergoes word segmentation processing; it is called "original" relative to the target sentence sequence introduced later.
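As a concrete stand-in for the segmentation step, the sketch below uses greedy forward maximum matching against a dictionary, a classic baseline for Chinese word segmentation. This is not the patent's segmenter (the patent only names Doc2vec/Word2Vec-style processing), and the vocabulary is illustrative.

```python
def max_match_segment(text, vocab, max_len=4):
    """Greedy forward maximum-match segmentation: at each position, take
    the longest dictionary word that matches; fall back to a single
    character when nothing in the vocabulary matches."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

vocab = {"文本", "相似", "相似度", "计算"}
print(max_match_segment("文本相似度计算", vocab))  # ['文本', '相似度', '计算']
```

The token list produced here plays the role of the original sentence sequence X fed into the encoder in step S430.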
In step S430, a first text vector corresponding to the original sentence sequence is determined through the relation feature vector and a pre-established attention model.
In an example embodiment of the present disclosure, the attention model may refer to a model constructed based on an attention mechanism. The attention model allocates more attention resources to the input regions that need attention, so as to obtain more detailed information about the targets of interest while suppressing other useless information. The first text vector may refer to the vector generated by the attention model for the sentence sequence corresponding to the target text; the first text vector can characterize the core content of the target text.
Specifically, the attention model may include an encoder and a decoder. Based on the relation feature vector, the original sentence sequence is encoded by the gated recurrent units in the encoder to generate an intermediate vector, and the first text vector is determined through the intermediate vector and a preset attention mechanism. The encoder may refer to the module in the attention model that encodes the original sentence sequence; it may include a plurality of gated recurrent units (GRUs). A gated recurrent unit is one of the gating mechanisms in recurrent neural networks (RNNs); like other gating mechanisms, it aims to solve the gradient vanishing/exploding problem of standard RNNs while preserving the long-term information of the sequence. The intermediate vector may refer to the intermediate state vector produced when the attention model processes the original sentence sequence. This allows the first text vector generated by the attention model for the target text to include contextual information and to capture the core content of the target text, which guarantees accuracy when computing text similarity.
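The GRU encoding step can be illustrated with a scalar toy version of the cell. The weights below are illustrative placeholders (real encoders use weight matrices over embedding vectors); the point is the update/reset gating that lets the unit keep long-range information, as noted above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One scalar GRU step. p holds six weights (illustrative values):
    the update gate z decides how much of the old state to keep, the
    reset gate r decides how much of it feeds the candidate state."""
    z = sigmoid(p["wz"] * x + p["uz"] * h)            # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h)            # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * r * h)  # candidate state
    return (1.0 - z) * h + z * h_cand

def encode(sequence, p):
    """Run the GRU over a token sequence; the final hidden state plays
    the role of the intermediate vector C."""
    h = 0.0
    for x in sequence:
        h = gru_step(x, h, p)
    return h

p0 = {"wz": 0.0, "uz": 0.0, "wr": 0.0, "ur": 0.0, "wh": 0.0, "uh": 0.0}
print(encode([1.0, 2.0], p0))  # 0.0 — zero weights produce a zero state
```

With trained (non-zero) weights, the final state summarizes the whole sequence with contextual information, which is exactly what the intermediate vector is used for.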
Further, the historical information generated in the encoder is obtained, and a target sentence sequence is determined through the historical information, the intermediate vector, and the attention mechanism. The difference between the target sentence sequence and the original sentence sequence is taken as a loss function, and the loss function is computed by gradient descent to determine the first text vector. The historical information may be the information generated when the encoder processed the original sentence sequence at the previous moment; adding the historical information allows the generated first text vector to include contextual information. The target sentence sequence may refer to the sentence sequence generated after the attention model processes the original sentence sequence. Gradient descent may refer to a first-order optimization algorithm, also commonly called the steepest descent method, which is often used in machine learning and artificial intelligence to recursively approximate a minimum: to find a local minimum of a function with gradient descent, an iterative search is performed by stepping a prescribed distance from the current point in the direction opposite to the gradient (or approximate gradient) of the function at that point.
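The gradient descent procedure named above can be shown in its simplest form on a one-dimensional function. This is a generic illustration of the optimizer, not the patent's actual loss over sentence sequences; the step size and step count are arbitrary example values.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly step against the gradient
    direction by a fixed learning rate, as described above."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # ≈ 3.0, the local (here also global) minimum
```

In the patent's setting, the same loop would instead update the model parameters so that the loss between the target sentence sequence Y and the original sentence sequence X shrinks.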
Fig. 6 schematically illustrates the algorithm framework corresponding to the attention mechanism according to an embodiment of the present disclosure.
As shown in Fig. 6, the target text is represented using an Encoder framework 601 and a Decoder framework 602. The idea is that, for an input original sentence sequence X = &lt;x1, x2, …, xn&gt;, using the word vectors learned in the previous step, the sequence is first encoded by a gated recurrent unit 603 into an intermediate vector 604, C = F(x1, x2, …, xn), carrying contextual information; then, using another gated recurrent unit 603 and the attention mechanism 605, the word yi to be generated at moment i is produced according to the intermediate vector 604 of the original sentence sequence X and the previously generated historical information, finally yielding Y = &lt;y1, y2, …, yn&gt;. The difference between X and Y is then taken as a loss function, the loss function is solved by gradient descent, and the resulting final vector C is used as the first text vector representing the target text. Finally, the similarity between the target text and the preset text is determined by computing the cosine similarity of the first text vector and the second text vector.
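The attention mechanism 605 of Fig. 6 can be sketched as dot-product attention: each encoder hidden state is weighted by its relevance to the current decoder state, and the weighted sum forms the context used to generate yi. This is a minimal sketch of the mechanism, not the patent's exact formulation, and the vectors are toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_context(decoder_state, encoder_states):
    """Dot-product attention: score each encoder hidden state against
    the decoder state, normalize the scores with softmax, and return
    the weighted sum (the context vector fed to the decoder)."""
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(w * h[k] for w, h in zip(weights, encoder_states))
            for k in range(dim)]

# Identical encoder states get equal weight, so the context equals them.
print(attention_context([0.5, 0.5], [[1.0, 2.0], [1.0, 2.0]]))  # [1.0, 2.0]
```

When one encoder state aligns much better with the decoder state, its weight dominates, which is how the model focuses on the key words of the article.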
In step S440, a second text vector of a preset text is obtained, and the first text vector and the second text vector are computed according to a preset algorithm to determine the similarity between the target text and the preset text.
In an example embodiment of the present disclosure, the preset text may refer to the text against which the similarity of the target text is computed, and the second text vector may refer to the text vector corresponding to the preset text; for example, the second text vector may be obtained for the preset text through steps S410 to S430, or it may be a text vector obtained for the preset text by other methods. This example embodiment places no particular limitation on this. The preset algorithm may refer to a preconfigured algorithm for computing the similarity between the target text and the preset text; for example, the preset algorithm may be a cosine similarity algorithm or a Euclidean distance algorithm. Of course, the preset algorithm may also be another algorithm for computing text similarity, and this example embodiment places no particular limitation on this.
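The cosine similarity option named above is straightforward to state in code. The sketch below is a plain-Python version over list vectors; the input vectors are toy values standing in for the first and second text vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two text vectors: the dot product
    divided by the product of the norms. 1.0 means the vectors point
    in the same direction; 0.0 means they are orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))  # 1.0 — identical vectors
```

Because cosine similarity ignores vector magnitude, it compares only the direction of the two text vectors, which is usually what is wanted when the vectors summarize texts of different lengths.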
It should be noted that although the steps of the method in the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be executed in that particular order, or that all of the illustrated steps must be executed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, a plurality of steps may be combined into one step for execution, and/or one step may be decomposed into a plurality of steps for execution.
Further, this example embodiment also provides a text similarity computing device. The text similarity computing device may be applied to a server or a terminal device. As shown in Fig. 7, the text similarity computing device 700 may include a relation feature vector determination unit 710, a sentence sequence determination unit 720, a text vector determination unit 730, and a similarity computing unit 740. Specifically:
the relation feature vector determination unit 710 is configured to obtain a target text and knowledge graph data corresponding to the target text, and perform conversion processing on the knowledge graph data to determine a relation feature vector corresponding to the knowledge graph data;
the sentence sequence determination unit 720 is configured to perform word segmentation processing on the target text to determine an original sentence sequence corresponding to the target text;
the text vector determination unit 730 is configured to determine a first text vector corresponding to the original sentence sequence through the relation feature vector and a pre-established attention model;
the similarity computing unit 740 is configured to obtain a second text vector of a preset text, and compute the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
In an exemplary embodiment of the present disclosure, the attention model includes an encoder and a decoder, and the similarity computing unit 740 further includes:
an intermediate vector generation unit, configured to, based on the relation feature vector, encode the original sentence sequence through the gated recurrent units in the encoder to generate an intermediate vector;
a first text vector determination unit, configured to determine the first text vector through the intermediate vector and a preset attention mechanism.
In an exemplary embodiment of the present disclosure, the first text vector determination unit is configured to:
obtain historical information generated in the encoder, and determine a target sentence sequence through the historical information, the intermediate vector, and the attention mechanism;
take the difference between the target sentence sequence and the original sentence sequence as a loss function, and compute the loss function by gradient descent to determine the first text vector.
In an exemplary embodiment of the present disclosure, the relation feature vector determination unit 710 is configured to:
obtain a relation array in the knowledge graph data, wherein the relation array includes a subject vector, a predicate vector, and an object vector;
determine, based on a pre-trained translation model, the relation feature vector corresponding to the knowledge graph data according to the subject vector, the predicate vector, and the object vector.
In an exemplary embodiment of the present disclosure, the text similarity computing device 700 further includes a knowledge graph data generation unit, configured to:
construct an ontology model corresponding to the target text, and obtain key data corresponding to the target text from a target location through a crawler tool;
fill the key data into the ontology model to generate the knowledge graph data corresponding to the target text, and save the knowledge graph data to a target database.
In an exemplary embodiment of the present disclosure, the text similarity computing device 700 further includes a knowledge graph data update unit, configured to:
periodically acquire new key data corresponding to the target text from the target location through the crawler tool;
update the knowledge graph data in the target database according to the new key data.
In an exemplary embodiment of the present disclosure, the similarity computing unit 740 is configured to:
compute the first text vector and the second text vector by cosine similarity to determine the similarity between the target text and the preset text.
The details of each module or unit in the above text similarity computing device have been described in detail in the corresponding text similarity computing method, and are therefore not repeated here.
It should be noted that although several modules or units of the device for executing actions are mentioned in the above detailed description, such a division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or conventional technical means in the art not disclosed in the present disclosure. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A text similarity computing method, comprising:
obtaining a target text and knowledge graph data corresponding to the target text, and performing conversion processing on the knowledge graph data to determine a relation feature vector corresponding to the knowledge graph data;
performing word segmentation processing on the target text to determine an original sentence sequence corresponding to the target text;
determining a first text vector corresponding to the original sentence sequence through the relation feature vector and a pre-established attention model;
obtaining a second text vector of a preset text, and computing the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
2. The text similarity computing method according to claim 1, wherein the attention model comprises an encoder and a decoder;
and determining the first text vector corresponding to the original sentence sequence through the relation feature vector and the pre-established attention model comprises:
based on the relation feature vector, encoding the original sentence sequence through gated recurrent units in the encoder to generate an intermediate vector;
determining the first text vector through the intermediate vector and a preset attention mechanism.
3. The text similarity computing method according to claim 2, wherein determining the first text vector through the intermediate vector and the preset attention mechanism comprises:
obtaining historical information generated in the encoder, and determining a target sentence sequence through the historical information, the intermediate vector, and the attention mechanism;
taking the difference between the target sentence sequence and the original sentence sequence as a loss function, and computing the loss function by gradient descent to determine the first text vector.
4. The text similarity computing method according to claim 1, wherein performing conversion processing on the knowledge graph data to determine the relation feature vector corresponding to the knowledge graph data comprises:
obtaining a relation array in the knowledge graph data, wherein the relation array comprises a subject vector, a predicate vector, and an object vector;
determining, based on a pre-trained translation model, the relation feature vector corresponding to the knowledge graph data according to the subject vector, the predicate vector, and the object vector.
5. The text similarity computing method according to claim 1, wherein before obtaining the target text and the knowledge graph data corresponding to the target text, the method further comprises:
constructing an ontology model corresponding to the target text, and obtaining key data corresponding to the target text from a target location through a crawler tool;
filling the key data into the ontology model to generate the knowledge graph data corresponding to the target text, and saving the knowledge graph data to a target database.
6. The text similarity computing method according to claim 5, wherein after filling the key data into the ontology model to generate the knowledge graph data corresponding to the target text and saving the knowledge graph data to the target database, the method further comprises:
periodically acquiring new key data corresponding to the target text from the target location through the crawler tool;
updating the knowledge graph data in the target database according to the new key data.
7. The text similarity computing method according to claim 1, wherein computing the first text vector and the second text vector according to the preset algorithm to determine the similarity between the target text and the preset text comprises:
computing the first text vector and the second text vector by cosine similarity to determine the similarity between the target text and the preset text.
8. A text similarity computing device, comprising:
a relation feature vector determination unit, configured to obtain a target text and knowledge graph data corresponding to the target text, and perform conversion processing on the knowledge graph data to determine a relation feature vector corresponding to the knowledge graph data;
a sentence sequence determination unit, configured to perform word segmentation processing on the target text to determine an original sentence sequence corresponding to the target text;
a text vector determination unit, configured to determine a first text vector corresponding to the original sentence sequence through the relation feature vector and a pre-established attention model;
a similarity computing unit, configured to obtain a second text vector of a preset text, and compute the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method according to any one of claims 1 to 7 by executing the executable instructions.
CN201910746144.4A 2019-08-13 2019-08-13 Text similarity computing method and device, storage medium, electronic equipment Pending CN110489751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910746144.4A CN110489751A (en) 2019-08-13 2019-08-13 Text similarity computing method and device, storage medium, electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910746144.4A CN110489751A (en) 2019-08-13 2019-08-13 Text similarity computing method and device, storage medium, electronic equipment

Publications (1)

Publication Number Publication Date
CN110489751A true CN110489751A (en) 2019-11-22

Family

ID=68550847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910746144.4A Pending CN110489751A (en) 2019-08-13 2019-08-13 Text similarity computing method and device, storage medium, electronic equipment

Country Status (1)

Country Link
CN (1) CN110489751A (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955831A (en) * 2019-11-25 2020-04-03 北京三快在线科技有限公司 Article recommendation method and device, computer equipment and storage medium
CN110955831B (en) * 2019-11-25 2023-04-14 北京三快在线科技有限公司 Article recommendation method and device, computer equipment and storage medium
CN111159423B (en) * 2019-12-27 2023-04-07 北京明略软件系统有限公司 Entity association method, device and computer readable storage medium
CN111159423A (en) * 2019-12-27 2020-05-15 北京明略软件系统有限公司 Entity association method, device and computer readable storage medium
CN111209395A (en) * 2019-12-27 2020-05-29 铜陵中科汇联科技有限公司 Short text similarity calculation system and training method thereof
CN110795926B (en) * 2020-01-03 2020-04-07 四川大学 Judgment document similarity judgment method and system based on legal knowledge graph
CN110795926A (en) * 2020-01-03 2020-02-14 四川大学 Judgment document similarity judgment method and system based on legal knowledge graph
CN111275091A (en) * 2020-01-16 2020-06-12 平安科技(深圳)有限公司 Intelligent text conclusion recommendation method and device and computer readable storage medium
CN111275091B (en) * 2020-01-16 2024-05-10 平安科技(深圳)有限公司 Text conclusion intelligent recommendation method and device and computer readable storage medium
CN111325033A (en) * 2020-03-20 2020-06-23 中国建设银行股份有限公司 Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN113495954A (en) * 2020-03-20 2021-10-12 北京沃东天骏信息技术有限公司 Text data determination method and device
CN111325033B (en) * 2020-03-20 2023-07-11 中国建设银行股份有限公司 Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN111539197A (en) * 2020-04-15 2020-08-14 北京百度网讯科技有限公司 Text matching method and device, computer system and readable storage medium
CN111539197B (en) * 2020-04-15 2023-08-15 北京百度网讯科技有限公司 Text matching method and device, computer system and readable storage medium
CN111767706A (en) * 2020-06-19 2020-10-13 北京工业大学 Text similarity calculation method and device, electronic equipment and medium
CN111861268A (en) * 2020-07-31 2020-10-30 平安金融管理学院(中国·深圳) Candidate recommending method and device, electronic equipment and storage medium
CN111914568A (en) * 2020-07-31 2020-11-10 平安科技(深圳)有限公司 Method, device and equipment for generating text modifying sentence and readable storage medium
CN111914568B (en) * 2020-07-31 2024-02-06 平安科技(深圳)有限公司 Method, device and equipment for generating text sentence and readable storage medium
CN112287217B (en) * 2020-10-23 2023-08-04 平安科技(深圳)有限公司 Medical document retrieval method, medical document retrieval device, electronic equipment and storage medium
CN112287217A (en) * 2020-10-23 2021-01-29 平安科技(深圳)有限公司 Medical literature retrieval method, device, electronic equipment and storage medium
CN112528600A (en) * 2020-12-15 2021-03-19 北京百度网讯科技有限公司 Text data processing method, related device and computer program product
CN112528600B (en) * 2020-12-15 2024-05-07 北京百度网讯科技有限公司 Text data processing method, related device and computer program product
CN112434167B (en) * 2021-01-26 2021-04-20 支付宝(杭州)信息技术有限公司 Information identification method and device
CN112434167A (en) * 2021-01-26 2021-03-02 支付宝(杭州)信息技术有限公司 Information identification method and device
CN114372150A (en) * 2021-12-10 2022-04-19 天翼物联科技有限公司 Knowledge graph construction method, system, device and storage medium
CN114372150B (en) * 2021-12-10 2024-05-07 天翼物联科技有限公司 Knowledge graph construction method, system, device and storage medium
CN115827877A (en) * 2023-02-07 2023-03-21 湖南正宇软件技术开发有限公司 Proposal auxiliary combination method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110489751A (en) Text similarity computing method and device, storage medium, electronic equipment
CN107735804B (en) System and method for transfer learning techniques for different sets of labels
CN109726396A (en) Semantic matching method, device, medium and the electronic equipment of question and answer text
CN110140133A (en) The implicit bridge joint of machine learning task
US11195098B2 (en) Method for generating neural network and electronic device
CN112560479A (en) Abstract extraction model training method, abstract extraction device and electronic equipment
CN109614111A (en) Method and apparatus for generating code
CN109376234A (en) A kind of method and apparatus of trained summarization generation model
CN112507101B (en) Method and device for establishing pre-training language model
JP7417679B2 (en) Information extraction methods, devices, electronic devices and storage media
US11321370B2 (en) Method for generating question answering robot and computer device
CN109271403A (en) A kind of operating method of data query, device, medium and electronic equipment
JP2022003537A (en) Method and device for recognizing intent of dialog, electronic apparatus, and storage medium
CN109710760A (en) Clustering method, device, medium and the electronic equipment of short text
CN110162766A (en) Term vector update method and device
US20230094730A1 (en) Model training method and method for human-machine interaction
CN116245097A (en) Method for training entity recognition model, entity recognition method and corresponding device
WO2022228127A1 (en) Element text processing method and apparatus, electronic device, and storage medium
US20210312308A1 (en) Method for determining answer of question, computing device and storage medium
CN113961679A (en) Intelligent question and answer processing method and system, electronic equipment and storage medium
CN111079376B (en) Data labeling method, device, medium and electronic equipment
EP3855341A1 (en) Language generation method and apparatus, electronic device and storage medium
US11842290B2 (en) Using functions to annotate a syntax tree with real data used to generate an answer to a question
CN113553411B (en) Query statement generation method and device, electronic equipment and storage medium
CN112966513B (en) Method and apparatus for entity linking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination