CN110489751A - Text similarity computing method and device, storage medium, electronic equipment - Google Patents


Info

Publication number
CN110489751A
CN110489751A
Authority
CN
China
Prior art keywords
text
vector
knowledge graph
graph data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910746144.4A
Other languages
Chinese (zh)
Inventor
刘文强
程序
谢思发
张涵宇
江小琴
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910746144.4A priority Critical patent/CN110489751A/en
Publication of CN110489751A publication Critical patent/CN110489751A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Abstract

The disclosure provides a text similarity calculation method and device, an electronic device, and a storage medium, relating to the field of computer technology. The text similarity calculation method includes: obtaining a target text and knowledge graph data corresponding to the target text, and converting the knowledge graph data to determine relation feature vectors corresponding to the knowledge graph data; performing word segmentation on the target text to determine an original sentence sequence corresponding to the target text; determining a first text vector corresponding to the original sentence sequence through the relation feature vectors and a pre-established attention model; and obtaining a second text vector of a preset text, and computing the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text. The disclosure can improve the representation of text content and thereby improve the accuracy of text similarity calculation.

Description

Text similarity computing method and device, storage medium, electronic equipment
Technical field
This disclosure relates to the field of computer technology, and in particular to a text similarity calculation method, a text similarity calculation device, an electronic device, and a computer-readable storage medium.
Background technique
In the field of game content, computing the similarity of articles is a basic requirement: each content-processing module needs to calculate article similarity. For example, in content recommendation, articles must be coarsely ranked by similarity.
However, existing schemes for calculating article similarity, such as methods based on Word2vec or Doc2vec, suffer when articles contain synonyms or polysemous words. For example, in articles about a certain game, different users often use terms such as "Luban No. 7", "Little Short Legs", and "Little Luban" to refer to the same game role "Luban No. 7". This lowers the accuracy of article similarity calculation and degrades the user experience.
It should be noted that the information disclosed in the background section above is provided only to improve understanding of the background of the disclosure, and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The purpose of the disclosure is to provide a text similarity calculation method, a text similarity calculation device, an electronic device, and a computer-readable storage medium, thereby overcoming, at least to some extent, problems caused by the limitations and defects of the related art.
According to a first aspect of the disclosure, a text similarity calculation method is provided, comprising:
obtaining a target text and knowledge graph data corresponding to the target text, and converting the knowledge graph data to determine relation feature vectors corresponding to the knowledge graph data;
performing word segmentation on the target text to determine an original sentence sequence corresponding to the target text;
determining a first text vector corresponding to the original sentence sequence through the relation feature vectors and a pre-established attention model;
obtaining a second text vector of a preset text, and computing the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
In an exemplary embodiment of the disclosure, the attention model includes an encoder and a decoder;
determining the first text vector corresponding to the original sentence sequence through the relation feature vectors and the pre-established attention model includes:
based on the relation feature vectors, encoding the original sentence sequence through a gated recurrent unit (GRU) in the encoder to generate an intermediate vector;
determining the first text vector through the intermediate vector and a preset attention mechanism.
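The encoding step above can be sketched with a toy one-dimensional GRU; the weights here are hand-picked placeholders, whereas a real encoder would apply learned weight matrices to embedding vectors:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h_prev, x, w):
    # z: update gate, r: reset gate, h_tilde: candidate hidden state
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)
    h_tilde = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))
    return (1.0 - z) * h_prev + z * h_tilde

def encode(sequence, weights):
    # Run the GRU over scalar token features; the final hidden state
    # plays the role of the intermediate vector described above.
    h, states = 0.0, []
    for x in sequence:
        h = gru_step(h, x, weights)
        states.append(h)
    return states
```

Because the candidate state passes through tanh, the hidden state stays bounded regardless of sequence length, which is the gating behavior that motivates choosing a GRU for the encoder.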
In an exemplary embodiment of the disclosure, determining the first text vector through the intermediate vector and the preset attention mechanism includes:
obtaining historical information generated in the encoder, and determining a target sentence sequence through the historical information, the intermediate vector, and the attention mechanism;
taking the difference between the target sentence sequence and the original sentence sequence as a loss function, and computing the loss function by gradient descent to determine the first text vector.
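The attention mechanism referred to above can be sketched as dot-product attention over encoder hidden states; the query and state values below are illustrative stand-ins rather than the patent's trained parameters:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(hidden_states, query):
    # Score each encoder hidden state against the query, normalize
    # with softmax, and pool the states into one context vector.
    dim = len(query)
    scores = [sum(h[i] * query[i] for i in range(dim)) for h in hidden_states]
    weights = softmax(scores)
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states))
               for i in range(dim)]
    return context, weights
```

The softmax weights are what let the model emphasize key words: states aligned with the query contribute more to the pooled context.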
In an exemplary embodiment of the disclosure, converting the knowledge graph data to determine the relation feature vectors corresponding to the knowledge graph data includes:
obtaining relation triples in the knowledge graph data, where each relation triple includes a subject vector, a predicate vector, and an object vector;
determining, based on a pre-trained translation model, the relation feature vectors corresponding to the knowledge graph data from the subject vector, the predicate vector, and the object vector.
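The "translation model" over (subject, predicate, object) vectors is consistent with TransE-style knowledge graph embeddings, where subject + predicate ≈ object; a minimal sketch under that assumption (the vectors are illustrative, not trained):

```python
def relation_vector(subject_vec, object_vec):
    # Under the translation assumption s + p ≈ o, the relation
    # (predicate) feature vector can be recovered as o - s.
    return [o - s for s, o in zip(subject_vec, object_vec)]

def transe_score(subject_vec, predicate_vec, object_vec):
    # L2 distance of (s + p) from o; smaller means a more plausible triple.
    return sum((s + p - o) ** 2
               for s, p, o in zip(subject_vec, predicate_vec, object_vec)) ** 0.5
```

A true triple scores near zero while a corrupted one scores higher, which is how such a model separates valid graph relations from invalid ones.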
In an exemplary embodiment of the disclosure, before obtaining the target text and the knowledge graph data corresponding to the target text, the method further includes:
constructing an ontology model corresponding to the target text, and obtaining key data corresponding to the target text from a target location through a crawler tool;
inserting the key data into the ontology model to generate the knowledge graph data corresponding to the target text, and saving the knowledge graph data to a target database.
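Filling the ontology model with crawled key data to produce storable triples can be sketched as follows; the slot names and record fields are hypothetical examples, not the patent's schema:

```python
def build_knowledge_graph(ontology_slots, crawled_records):
    # Fill the ontology template with crawled key data and emit
    # (subject, predicate, object) triples ready to save to a database.
    triples = []
    for record in crawled_records:
        subject = record["id"]
        for slot in ontology_slots:
            if slot in record:
                triples.append((subject, slot, record[slot]))
    return triples
```

Slots absent from a crawled record are simply skipped, so the same ontology template can be reused as the crawler periodically supplies new key data.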
In an exemplary embodiment of the disclosure, after inserting the key data into the ontology model to generate the knowledge graph data corresponding to the target text and saving the knowledge graph data to the target database, the method further includes:
periodically acquiring new key data corresponding to the target text from the target location through the crawler tool;
updating the knowledge graph data in the target database according to the new key data.
In an exemplary embodiment of the disclosure, computing the first text vector and the second text vector according to the preset algorithm to determine the similarity between the target text and the preset text includes:
computing the cosine similarity of the first text vector and the second text vector to determine the similarity between the target text and the preset text.
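The cosine similarity step can be sketched in pure Python (a production system would typically use a vectorized library instead):

```python
import math

def cosine_similarity(v1, v2):
    # cos(theta) = (v1 . v2) / (|v1| * |v2|)
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)
```

Identical directions score 1.0 and orthogonal vectors score 0.0, so the higher the value, the more similar the target text and the preset text.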
According to a second aspect of the disclosure, a text similarity calculation device is provided, comprising:
a relation feature vector determination unit, configured to obtain a target text and knowledge graph data corresponding to the target text, and to convert the knowledge graph data to determine relation feature vectors corresponding to the knowledge graph data;
a sentence sequence determination unit, configured to perform word segmentation on the target text and determine an original sentence sequence corresponding to the target text;
a text vector determination unit, configured to determine a first text vector corresponding to the original sentence sequence through the relation feature vectors and a pre-established attention model;
a similarity calculation unit, configured to obtain a second text vector of a preset text, and to compute the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
In an exemplary embodiment of the disclosure, the attention model includes an encoder and a decoder; the similarity calculation unit further includes:
an intermediate vector generation unit, configured to encode the original sentence sequence through a gated recurrent unit in the encoder, based on the relation feature vectors, to generate an intermediate vector;
a first text vector determination unit, configured to determine the first text vector through the intermediate vector and a preset attention mechanism.
In an exemplary embodiment of the disclosure, the first text vector determination unit is configured to:
obtain historical information generated in the encoder, and determine a target sentence sequence through the historical information, the intermediate vector, and the attention mechanism;
take the difference between the target sentence sequence and the original sentence sequence as a loss function, and compute the loss function by gradient descent to determine the first text vector.
In an exemplary embodiment of the disclosure, the relation feature vector determination unit is configured to:
obtain relation triples in the knowledge graph data, where each relation triple includes a subject vector, a predicate vector, and an object vector;
determine, based on a pre-trained translation model, the relation feature vectors corresponding to the knowledge graph data from the subject vector, the predicate vector, and the object vector.
In an exemplary embodiment of the disclosure, the text similarity calculation device further includes a knowledge graph data generation unit, configured to:
construct an ontology model corresponding to the target text, and obtain key data corresponding to the target text from a target location through a crawler tool;
insert the key data into the ontology model to generate the knowledge graph data corresponding to the target text, and save the knowledge graph data to a target database.
In an exemplary embodiment of the disclosure, the text similarity calculation device further includes a knowledge graph data updating unit, configured to:
periodically acquire new key data corresponding to the target text from the target location through the crawler tool;
update the knowledge graph data in the target database according to the new key data.
In an exemplary embodiment of the disclosure, the similarity calculation unit is configured to:
compute the cosine similarity of the first text vector and the second text vector to determine the similarity between the target text and the preset text.
According to a third aspect of the disclosure, an electronic device is provided, comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute, via the executable instructions, the method described in any one of the above.
According to a fourth aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the method described in any one of the above.
Exemplary embodiments of the disclosure may have some or all of the following beneficial effects:
In the text similarity calculation method provided by an example embodiment of the disclosure, a target text and its corresponding knowledge graph data are obtained, and the relation feature vectors corresponding to the knowledge graph data are determined; then the original sentence sequence corresponding to the target text is determined, and a first text vector corresponding to the original sentence sequence is determined through the relation feature vectors and a pre-established attention model; finally, a second text vector of a preset text is obtained, and the first and second text vectors are computed according to a preset algorithm to determine the similarity between the target text and the preset text. On the one hand, representing entities or concepts and their associations through knowledge graph data, and assisting the similarity calculation with that data, effectively enhances the ability to distinguish different terms and improves the accuracy of text similarity calculation. On the other hand, with knowledge graph data as auxiliary data, the attention model can effectively extract the key words in the text, thereby improving the representation of text content, further improving the accuracy of text similarity calculation, and improving the user experience.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.
Detailed description of the invention
The accompanying drawings, which are incorporated into and form a part of this specification, show embodiments consistent with the disclosure and serve, together with the specification, to explain the principles of the disclosure. Evidently, the drawings described below are only some embodiments of the disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the text similarity calculation method and device of embodiments of the disclosure may be applied;
Fig. 2 shows a structural schematic diagram of a computer system of an electronic device suitable for implementing embodiments of the disclosure;
Fig. 3 schematically shows computing text similarity by term frequency-inverse document frequency according to an embodiment of the disclosure;
Fig. 4 schematically shows a flow diagram of the text similarity calculation method according to an embodiment of the disclosure;
Fig. 5 schematically shows the structure of knowledge graph data according to an embodiment of the disclosure;
Fig. 6 schematically shows the algorithm framework corresponding to the attention mechanism according to an embodiment of the disclosure;
Fig. 7 schematically shows a block diagram of the text similarity calculation device according to an embodiment of the disclosure.
Specific embodiment
Example embodiments are described more fully below with reference to the drawings. However, the example embodiments can be implemented in a variety of forms and should not be understood as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be more thorough and complete and will fully convey the ideas of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided for a full understanding of the embodiments of the disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the disclosure may be practiced without one or more of these specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known solutions are not shown or described in detail, to avoid obscuring aspects of the disclosure.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the figures denote the same or similar parts, and their repeated description is omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 shows a schematic diagram of the system architecture of an exemplary application environment to which the text similarity calculation method and device of embodiments of the disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links or fiber-optic cables. The terminal devices 101, 102, 103 may be various electronic devices with display screens, including but not limited to desktop computers, portable computers, smartphones, and tablet computers. It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; any number of terminal devices, networks, and servers may be provided according to implementation needs. For example, the server 105 may be a server cluster composed of multiple servers.
The text similarity calculation method provided by embodiments of the disclosure is generally executed by the server 105, and accordingly the text similarity calculation device is generally deployed in the server 105. However, those skilled in the art will readily appreciate that the method may also be executed by the terminal devices 101, 102, 103, in which case the device may also be deployed in the terminal devices 101, 102, 103; this example embodiment places no particular restriction on this. For example, in an exemplary embodiment, a user may upload multiple text contents to the server 105 through the terminal devices 101, 102, 103; the server calculates the similarity of the text contents through the text similarity calculation method provided by embodiments of the disclosure and transmits the result back to the terminal devices 101, 102, 103.
Fig. 2 shows a structural schematic diagram of a computer system of an electronic device suitable for implementing embodiments of the disclosure.
It should be noted that the computer system 200 of the electronic device shown in Fig. 2 is only an example and should not impose any restriction on the functions and scope of use of embodiments of the disclosure.
As shown in Fig. 2, the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage section 208 into a random access memory (RAM) 203. The RAM 203 also stores various programs and data needed for system operation. The CPU 201, ROM 202, and RAM 203 are connected to one another through a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.
The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a cathode-ray tube (CRT), a liquid crystal display (LCD), a loudspeaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card or a modem. The communication section 209 performs communication processing via a network such as the Internet. A driver 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the driver 210 as needed, so that a computer program read from it can be installed into the storage section 208 as needed.
In particular, according to embodiments of the disclosure, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209, and/or installed from the removable medium 211. When the computer program is executed by the central processing unit (CPU) 201, the various functions defined in the method and device of the present application are performed. In some embodiments, the computer system 200 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
It should be noted that the computer-readable medium shown in the disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus, or device. In the disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the disclosure. In this regard, each box in a flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the boxes may occur in a different order from that indicated in the drawings. For example, two boxes shown in succession may actually be executed substantially in parallel, or sometimes in the opposite order, depending on the functions involved. It should further be noted that each box in a block diagram or flowchart, and combinations of boxes in a block diagram or flowchart, can be implemented by a dedicated hardware-based system that executes the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
The units described in embodiments of the disclosure may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist independently without being assembled into the electronic device. The computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device implements the methods described in the following embodiments. For example, the electronic device can implement the steps shown in Fig. 4.
The technical solutions of embodiments of the disclosure are described in detail below:
In one technical solution, text similarity is calculated based on the term frequency-inverse document frequency (TF-IDF) algorithm. First, the keywords of two articles are found according to TF-IDF, which is computed as the term frequency (TF) multiplied by the inverse document frequency (IDF). From the formula, TF-IDF is proportional to the number of times a word appears in a document and inversely proportional to the number of documents containing the word. Fig. 3 schematically shows computing text similarity by term frequency-inverse document frequency according to an embodiment of the disclosure. As shown in Fig. 3, several keywords (for example, 20) are taken from each article and merged into one set; the term frequency of each word in the set is computed for each article (to avoid differences in article length, relative term frequency can be used); then the term-frequency vector of each article is generated; finally, the cosine similarity of the two vectors is calculated — the larger the value, the more similar the two articles.
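The keyword-scoring step above can be sketched as follows, using relative term frequency and a smoothed IDF (the add-one smoothing is an assumption for the sketch, not specified by the text):

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top_k=3):
    # Relative term frequency times smoothed inverse document frequency;
    # returns the top_k highest-scoring tokens of the document.
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]
```

A term that is frequent in one article but rare across the corpus outranks a term that appears everywhere, which is exactly the property the keyword-extraction step relies on.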
In another technical solution, article similarity is calculated based on the Doc2vec algorithm. This scheme follows the core idea of the Word2vec algorithm but adds a paragraph vector or sentence vector during training; that vector serves as a memory unit for the context, or as the topic of the paragraph or sentence. After training on a large corpus, a vector representation of each article is obtained, and the similarity of two articles is then obtained by cosine similarity.
In actual application scenarios, the above two schemes face two major problems when calculating article similarity. Synonym problems: articles often use many different words for the same meaning — for example, "Luban No. 7", "Little Short Legs", and "Little Luban" all denote the game role "Luban No. 7". Polysemy problems: the same word expresses different meanings in different contexts — for example, "apple" in an article may carry two meanings: the fruit and Apple Inc. These problems cause the traditional word-order-based Doc2vec and TF-IDF article similarity methods to perform relatively poorly in practice; that is, the accuracy of article similarity calculation is poor.
The inventors found that, when addressing the synonym and polysemy problems, additionally introducing auxiliary information as input can effectively improve the accuracy of article similarity calculation. A knowledge graph, serving as an auxiliary dictionary, can accurately characterize the descriptions of articles or concepts and effectively enhance the ability to distinguish different words. In addition, an attention mechanism based on an Encoder and a Decoder can effectively extract the key words of an article, thereby improving the representation of the article and in turn the accuracy of similarity calculation.
Based on one or more of the above problems, this example embodiment provides a text similarity calculation method. The method can be applied to the above server 105 or to one or more of the above terminals 101, 102, 103; this example embodiment places no particular restriction on this, and the following description takes a terminal executing the method as an example. Fig. 4 schematically shows a flow diagram of the text similarity calculation method according to an embodiment of the disclosure. As shown in Fig. 4, the text similarity calculation method may include the following steps S410 to S440:
Step S410: obtain a target text and knowledge graph data corresponding to the target text, and perform conversion processing on the knowledge graph data to determine a relation feature vector corresponding to the knowledge graph data;
Step S420: perform word segmentation processing on the target text to determine an original sentence sequence corresponding to the target text;
Step S430: determine a first text vector corresponding to the original sentence sequence through the relation feature vector and a pre-established attention model;
Step S440: obtain a second text vector of a preset text, and compute the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
In the text similarity computing method provided by this example embodiment, on the one hand, the knowledge graph data represents entities or concepts and the association relations between them, and using the knowledge graph data to assist the similarity computation effectively enhances the ability to distinguish different terms, improving the accuracy of text similarity computation; on the other hand, with the knowledge graph data as auxiliary data, the attention model can effectively extract the key words in the text, thereby improving the representation of the text content, further improving the accuracy of the computed text similarity, and improving the user experience.
The above steps of this example embodiment are described in more detail below.
In step S410, a target text and knowledge graph data corresponding to the target text are obtained, and conversion processing is performed on the knowledge graph data to determine a relation feature vector corresponding to the knowledge graph data.
In an example embodiment of the present disclosure, the target text may refer to a text that the user selects from a plurality of preset texts and for which text similarity is to be computed. The knowledge graph data may refer to a dictionary or model describing the various entities or concepts existing in the real world and the association relations between them. For example, as shown in Fig. 5, the knowledge graph data may be a game knowledge graph; of course, the figure is merely illustrative, and this example embodiment places no particular limitation on this. Each entity (which may refer to a named entity in a game, including game-specific entities such as character names and skin names) or concept is identified by a globally unique ID (such as "Luban No. 7"); each attribute-value pair (such as "Luban No. 7" and "Skill 1") characterizes an inherent attribute of an entity; and a relation (such as "Has_Skill") connects two entities and characterizes the association between attribute-value pairs. Knowledge graph data can fuse rich semantic information from multiple data sources and, combined with the implicit information obtained through reasoning, provide services for the user. Knowledge graph data can characterize the rich semantic associations between entities; it can not only extend information about specific related games but also strengthen the association of knowledge such as game characters and races, and can therefore provide more accurate auxiliary information for text similarity computation. The conversion processing may refer to converting the high-dimensional, heterogeneous, visualized knowledge graph data into a low-dimensional feature vector representation; for example, the conversion processing may be performed by a pre-trained encoder or a pre-trained translation model. Of course, this is merely illustrative, and this example embodiment is not limited thereto. The relation feature vector may refer to the low-dimensional feature vector obtained from the knowledge graph data through the conversion processing; characterizing the knowledge graph data by the relation feature vector enables the terminal or server to use and process the knowledge graph data.
Specifically, the terminal obtains a relation array in the knowledge graph data and, based on a pre-trained translation model, determines the relation feature vector corresponding to the knowledge graph data according to a subject vector, a predicate vector, and an object vector. The relation array may refer to an array characterizing the attribute-association relations of entities or concepts in the knowledge graph data. For example, with continued reference to Fig. 5, a relation array may be an attribute-association relation in the knowledge graph data, such as (character identifier 501 "Character 1", association relation 502 "Recom_Equip", associated content 503 "Equipment 5"); of course, this is merely illustrative and should not impose any particular limitation on this example embodiment. The relation data may include a subject vector, a predicate vector, and an object vector. With continued reference to Fig. 5, the subject vector may correspond to the character identifier 501 "Character 1" in the relation array (character identifier 501 "Character 1", association relation 502 "Recom_Equip", associated content 503 "Equipment 5"), the predicate vector to the association relation 502 "Recom_Equip", and the object vector to the associated content 503 "Equipment 5"; again, this is merely illustrative, and this example embodiment is not limited thereto. The relation array in the knowledge graph data is extracted first, and the subject vector, predicate vector, and object vector in the relation array are then converted into a vector representation to obtain the relation feature vector. This not only reduces the amount of computation when converting the knowledge graph data, improving system performance, but also ensures that the generated relation feature vector preserves the subject-predicate-object sentence structure, which further guarantees accuracy when computing text similarity.
Optionally, before the conversion processing is performed on the knowledge graph data, the knowledge graph data corresponding to the target text needs to be constructed. For example, if the target text is related to the game domain, game knowledge graph data is constructed; if the target text is related to the plant and animal domain, plant and animal knowledge graph data is constructed. This example embodiment places no particular limitation on this. The terminal constructs an ontology model corresponding to the target text, obtains key data corresponding to the target text from a target location through a crawler tool, fills the ontology model with the key data to generate the knowledge graph data corresponding to the target text, and saves the knowledge graph data to a target database. The ontology model (ontology) may refer to a model that systematically describes objective things in the world. For example, with continued reference to Fig. 5, the ontology model may be the game-domain model or framework shown in Fig. 5; filling the ontology model yields knowledge graph data for different character names. For example, the ontology model may be filled with content related to "Character 1" to obtain the knowledge graph data corresponding to "Character 1", or with content related to "Character 2" to obtain the knowledge graph data corresponding to "Character 2"; this example embodiment places no particular limitation on this. The crawler tool (web crawler) may refer to a program or script that automatically captures information according to certain rules; for example, the crawler tool may be a Python crawler, JSON-handle, or User-Agent Switcher, and this example embodiment places no particular limitation on this. The target location may be a web page address corresponding to the content in the target text. The key data may refer to the keywords corresponding to the attribute information to be filled into the ontology model. For example, with continued reference to Fig. 5, for "Character 1" in the game knowledge graph data, key data such as the skill names, skin names, and hero bond relationships corresponding to "Character 1" are obtained by the crawler tool from the character-introduction page of the game's official website, and the key data is filled into the ontology model to obtain the game knowledge graph data shown in Fig. 5. Of course, this is merely illustrative, and this example embodiment is not limited thereto. After the knowledge graph data is obtained, it is saved to the corresponding database so that it can subsequently be updated and used. By constructing the ontology model and filling it through the crawler tool, the resulting knowledge graph data is more complete and accurate; the accuracy of the knowledge graph data is improved while its generation efficiency and the working efficiency of the system are increased.
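The "fill the ontology model with crawled key data" step can be sketched as template filling. The field names below (`skill_names`, `skin_names`, `recommended_equipment`) are hypothetical, not the patent's actual schema, and the crawled data is stubbed out rather than fetched from a live page.

```python
def fill_ontology(template_fields, crawled):
    """Fill an ontology template with crawled key data. Fields missing
    from the crawl stay None, so gaps in the generated knowledge graph
    entry are visible and can be re-crawled later."""
    return {field: crawled.get(field) for field in template_fields}

# Hypothetical game-character ontology template and a stubbed crawl result.
GAME_ROLE_TEMPLATE = ["skill_names", "skin_names", "recommended_equipment"]
crawled = {"skill_names": ["Skill 1"], "skin_names": ["Skin A"]}

entry = fill_ontology(GAME_ROLE_TEMPLATE, crawled)
print(entry["skill_names"])           # ['Skill 1']
print(entry["recommended_equipment"])  # None — not found by the crawl
```

In a real pipeline the `crawled` dict would come from the crawler tool, and the filled entries would be persisted to the target database as the knowledge graph data.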
Optionally, the terminal periodically acquires new key data corresponding to the target text from the target location through the crawler tool, and updates the knowledge graph data in the target database according to the new key data. The new key data may refer to updated data for the key data corresponding to the knowledge graph data. By periodically obtaining new key data from the target location through the crawler tool and updating the knowledge graph data, the corresponding knowledge graph data can be updated in time whenever the key data changes, which guarantees the accuracy of the knowledge graph data, further improves the accuracy of the computed text similarity, and improves the user experience.
In step S420, word segmentation processing is performed on the target text to determine an original sentence sequence corresponding to the target text.
In an example embodiment of the present disclosure, the word segmentation processing may refer to the process of dividing the target text into a representation of words or sentences; for example, it may refer to processing the target text through the sentence model Doc2vec or the word segmentation model Word2Vec. Of course, this example embodiment places no particular limitation on this. The original sentence sequence may refer to the sentence sequence obtained after the target text undergoes word segmentation processing; it is called "original" relative to the target sentence sequence introduced later.
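As a concrete stand-in for the segmentation step, the sketch below uses greedy forward maximum matching against a dictionary, a classic baseline for Chinese word segmentation. This is not the patent's segmenter (the patent only names Doc2vec/Word2Vec-style processing), and the vocabulary is illustrative.

```python
def max_match_segment(text, vocab, max_len=4):
    """Greedy forward maximum-match segmentation: at each position, take
    the longest dictionary word that matches; fall back to a single
    character when nothing in the vocabulary matches."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

vocab = {"文本", "相似", "相似度", "计算"}
print(max_match_segment("文本相似度计算", vocab))  # ['文本', '相似度', '计算']
```

The token list produced here plays the role of the original sentence sequence X fed into the encoder in step S430.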
In step S430, a first text vector corresponding to the original sentence sequence is determined through the relation feature vector and a pre-established attention model.
In an example embodiment of the present disclosure, the attention model may refer to a model constructed based on an attention mechanism. The attention model allocates more attention resources to the input regions that need attention, so as to obtain more detailed information about the targets of interest while suppressing other useless information. The first text vector may refer to the vector generated by the attention model for the sentence sequence corresponding to the target text; the first text vector can characterize the core content of the target text.
Specifically, the attention model may include an encoder and a decoder. Based on the relation feature vector, the original sentence sequence is encoded by the gated recurrent units in the encoder to generate an intermediate vector, and the first text vector is determined through the intermediate vector and a preset attention mechanism. The encoder may refer to the module in the attention model that encodes the original sentence sequence; it may include a plurality of gated recurrent units (GRUs). A gated recurrent unit is one of the gating mechanisms in recurrent neural networks (RNNs); like other gating mechanisms, it aims to solve the gradient vanishing/exploding problem of standard RNNs while preserving the long-term information of the sequence. The intermediate vector may refer to the intermediate state vector produced when the attention model processes the original sentence sequence. This allows the first text vector generated by the attention model for the target text to include contextual information and to capture the core content of the target text, which guarantees accuracy when computing text similarity.
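The GRU encoding step can be illustrated with a scalar toy version of the cell. The weights below are illustrative placeholders (real encoders use weight matrices over embedding vectors); the point is the update/reset gating that lets the unit keep long-range information, as noted above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One scalar GRU step. p holds six weights (illustrative values):
    the update gate z decides how much of the old state to keep, the
    reset gate r decides how much of it feeds the candidate state."""
    z = sigmoid(p["wz"] * x + p["uz"] * h)            # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h)            # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * r * h)  # candidate state
    return (1.0 - z) * h + z * h_cand

def encode(sequence, p):
    """Run the GRU over a token sequence; the final hidden state plays
    the role of the intermediate vector C."""
    h = 0.0
    for x in sequence:
        h = gru_step(x, h, p)
    return h

p0 = {"wz": 0.0, "uz": 0.0, "wr": 0.0, "ur": 0.0, "wh": 0.0, "uh": 0.0}
print(encode([1.0, 2.0], p0))  # 0.0 — zero weights produce a zero state
```

With trained (non-zero) weights, the final state summarizes the whole sequence with contextual information, which is exactly what the intermediate vector is used for.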
Further, the historical information generated in the encoder is obtained, and a target sentence sequence is determined through the historical information, the intermediate vector, and the attention mechanism. The difference between the target sentence sequence and the original sentence sequence is taken as a loss function, and the loss function is computed by gradient descent to determine the first text vector. The historical information may be the information generated when the encoder processed the original sentence sequence at the previous moment; adding the historical information allows the generated first text vector to include contextual information. The target sentence sequence may refer to the sentence sequence generated after the attention model processes the original sentence sequence. Gradient descent may refer to a first-order optimization algorithm, also commonly called the steepest descent method, which is often used in machine learning and artificial intelligence to recursively approximate a minimum: to find a local minimum of a function with gradient descent, an iterative search is performed by stepping a prescribed distance from the current point in the direction opposite to the gradient (or approximate gradient) of the function at that point.
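The gradient descent procedure named above can be shown in its simplest form on a one-dimensional function. This is a generic illustration of the optimizer, not the patent's actual loss over sentence sequences; the step size and step count are arbitrary example values.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly step against the gradient
    direction by a fixed learning rate, as described above."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # ≈ 3.0, the local (here also global) minimum
```

In the patent's setting, the same loop would instead update the model parameters so that the loss between the target sentence sequence Y and the original sentence sequence X shrinks.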
Fig. 6 schematically illustrates the algorithm framework corresponding to the attention mechanism according to an embodiment of the present disclosure.
As shown in Fig. 6, the target text is represented using an Encoder framework 601 and a Decoder framework 602. The idea is that, for an input original sentence sequence X = &lt;x1, x2, …, xn&gt;, using the word vectors learned in the previous step, the sequence is first encoded by a gated recurrent unit 603 into an intermediate vector 604, C = F(x1, x2, …, xn), carrying contextual information; then, using another gated recurrent unit 603 and the attention mechanism 605, the word yi to be generated at moment i is produced according to the intermediate vector 604 of the original sentence sequence X and the previously generated historical information, finally yielding Y = &lt;y1, y2, …, yn&gt;. The difference between X and Y is then taken as a loss function, the loss function is solved by gradient descent, and the resulting final vector C is used as the first text vector representing the target text. Finally, the similarity between the target text and the preset text is determined by computing the cosine similarity of the first text vector and the second text vector.
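The attention mechanism 605 of Fig. 6 can be sketched as dot-product attention: each encoder hidden state is weighted by its relevance to the current decoder state, and the weighted sum forms the context used to generate yi. This is a minimal sketch of the mechanism, not the patent's exact formulation, and the vectors are toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_context(decoder_state, encoder_states):
    """Dot-product attention: score each encoder hidden state against
    the decoder state, normalize the scores with softmax, and return
    the weighted sum (the context vector fed to the decoder)."""
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(w * h[k] for w, h in zip(weights, encoder_states))
            for k in range(dim)]

# Identical encoder states get equal weight, so the context equals them.
print(attention_context([0.5, 0.5], [[1.0, 2.0], [1.0, 2.0]]))  # [1.0, 2.0]
```

When one encoder state aligns much better with the decoder state, its weight dominates, which is how the model focuses on the key words of the article.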
In step S440, a second text vector of a preset text is obtained, and the first text vector and the second text vector are computed according to a preset algorithm to determine the similarity between the target text and the preset text.
In an example embodiment of the present disclosure, the preset text may refer to the text against which the similarity of the target text is computed, and the second text vector may refer to the text vector corresponding to the preset text; for example, the second text vector may be obtained for the preset text through steps S410 to S430, or it may be a text vector obtained for the preset text by other methods. This example embodiment places no particular limitation on this. The preset algorithm may refer to a preconfigured algorithm for computing the similarity between the target text and the preset text; for example, the preset algorithm may be a cosine similarity algorithm or a Euclidean distance algorithm. Of course, the preset algorithm may also be another algorithm for computing text similarity, and this example embodiment places no particular limitation on this.
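The cosine similarity option named above is straightforward to state in code. The sketch below is a plain-Python version over list vectors; the input vectors are toy values standing in for the first and second text vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two text vectors: the dot product
    divided by the product of the norms. 1.0 means the vectors point
    in the same direction; 0.0 means they are orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))  # 1.0 — identical vectors
```

Because cosine similarity ignores vector magnitude, it compares only the direction of the two text vectors, which is usually what is wanted when the vectors summarize texts of different lengths.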
It should be noted that although the steps of the method in the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be executed in that particular order, or that all of the illustrated steps must be executed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, a plurality of steps may be combined into one step for execution, and/or one step may be decomposed into a plurality of steps for execution.
Further, this example embodiment also provides a text similarity computing device. The text similarity computing device may be applied to a server or a terminal device. As shown in Fig. 7, the text similarity computing device 700 may include a relation feature vector determination unit 710, a sentence sequence determination unit 720, a text vector determination unit 730, and a similarity computing unit 740. Specifically:
the relation feature vector determination unit 710 is configured to obtain a target text and knowledge graph data corresponding to the target text, and perform conversion processing on the knowledge graph data to determine a relation feature vector corresponding to the knowledge graph data;
the sentence sequence determination unit 720 is configured to perform word segmentation processing on the target text to determine an original sentence sequence corresponding to the target text;
the text vector determination unit 730 is configured to determine a first text vector corresponding to the original sentence sequence through the relation feature vector and a pre-established attention model;
the similarity computing unit 740 is configured to obtain a second text vector of a preset text, and compute the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
In an exemplary embodiment of the present disclosure, the attention model includes an encoder and a decoder, and the similarity computing unit 740 further includes:
an intermediate vector generation unit, configured to, based on the relation feature vector, encode the original sentence sequence through the gated recurrent units in the encoder to generate an intermediate vector;
a first text vector determination unit, configured to determine the first text vector through the intermediate vector and a preset attention mechanism.
In an exemplary embodiment of the present disclosure, the first text vector determination unit is configured to:
obtain historical information generated in the encoder, and determine a target sentence sequence through the historical information, the intermediate vector, and the attention mechanism;
take the difference between the target sentence sequence and the original sentence sequence as a loss function, and compute the loss function by gradient descent to determine the first text vector.
In an exemplary embodiment of the present disclosure, the relation feature vector determination unit 710 is configured to:
obtain a relation array in the knowledge graph data, wherein the relation array includes a subject vector, a predicate vector, and an object vector;
determine, based on a pre-trained translation model, the relation feature vector corresponding to the knowledge graph data according to the subject vector, the predicate vector, and the object vector.
In an exemplary embodiment of the present disclosure, the text similarity computing device 700 further includes a knowledge graph data generation unit, configured to:
construct an ontology model corresponding to the target text, and obtain key data corresponding to the target text from a target location through a crawler tool;
fill the key data into the ontology model to generate the knowledge graph data corresponding to the target text, and save the knowledge graph data to a target database.
In an exemplary embodiment of the present disclosure, the text similarity computing device 700 further includes a knowledge graph data update unit, configured to:
periodically acquire new key data corresponding to the target text from the target location through the crawler tool;
update the knowledge graph data in the target database according to the new key data.
In an exemplary embodiment of the present disclosure, the similarity computing unit 740 is configured to:
compute the first text vector and the second text vector by cosine similarity to determine the similarity between the target text and the preset text.
The details of each module or unit in the above text similarity computing device have been described in detail in the corresponding text similarity computing method, and are therefore not repeated here.
It should be noted that although several modules or units of the device for executing actions are mentioned in the above detailed description, such a division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or conventional technical means in the art not disclosed in the present disclosure. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A text similarity computing method, comprising:
obtaining a target text and knowledge graph data corresponding to the target text, and performing conversion processing on the knowledge graph data to determine a relation feature vector corresponding to the knowledge graph data;
performing word segmentation processing on the target text to determine an original sentence sequence corresponding to the target text;
determining a first text vector corresponding to the original sentence sequence through the relation feature vector and a pre-established attention model;
obtaining a second text vector of a preset text, and computing the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
2. The text similarity computing method according to claim 1, wherein the attention model comprises an encoder and a decoder;
and determining the first text vector corresponding to the original sentence sequence through the relation feature vector and the pre-established attention model comprises:
based on the relation feature vector, encoding the original sentence sequence through gated recurrent units in the encoder to generate an intermediate vector;
determining the first text vector through the intermediate vector and a preset attention mechanism.
3. The text similarity computing method according to claim 2, wherein determining the first text vector through the intermediate vector and the preset attention mechanism comprises:
obtaining historical information generated in the encoder, and determining a target sentence sequence through the historical information, the intermediate vector, and the attention mechanism;
taking the difference between the target sentence sequence and the original sentence sequence as a loss function, and computing the loss function by gradient descent to determine the first text vector.
4. The text similarity computing method according to claim 1, wherein performing conversion processing on the knowledge graph data to determine the relation feature vector corresponding to the knowledge graph data comprises:
obtaining a relation array in the knowledge graph data, wherein the relation array comprises a subject vector, a predicate vector, and an object vector;
determining, based on a pre-trained translation model, the relation feature vector corresponding to the knowledge graph data according to the subject vector, the predicate vector, and the object vector.
5. The text similarity computing method according to claim 1, wherein before obtaining the target text and the knowledge graph data corresponding to the target text, the method further comprises:
constructing an ontology model corresponding to the target text, and obtaining key data corresponding to the target text from a target location through a crawler tool;
filling the key data into the ontology model to generate the knowledge graph data corresponding to the target text, and saving the knowledge graph data to a target database.
6. The text similarity computing method according to claim 5, wherein after filling the key data into the ontology model to generate the knowledge graph data corresponding to the target text and saving the knowledge graph data to the target database, the method further comprises:
periodically acquiring new key data corresponding to the target text from the target location through the crawler tool;
updating the knowledge graph data in the target database according to the new key data.
7. The text similarity computing method according to claim 1, wherein computing the first text vector and the second text vector according to the preset algorithm to determine the similarity between the target text and the preset text comprises:
computing the first text vector and the second text vector by cosine similarity to determine the similarity between the target text and the preset text.
8. A text similarity computing device, comprising:
a relation feature vector determination unit, configured to obtain a target text and knowledge graph data corresponding to the target text, and perform conversion processing on the knowledge graph data to determine a relation feature vector corresponding to the knowledge graph data;
a sentence sequence determination unit, configured to perform word segmentation processing on the target text to determine an original sentence sequence corresponding to the target text;
a text vector determination unit, configured to determine a first text vector corresponding to the original sentence sequence through the relation feature vector and a pre-established attention model;
a similarity computing unit, configured to obtain a second text vector of a preset text, and compute the first text vector and the second text vector according to a preset algorithm to determine the similarity between the target text and the preset text.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method according to any one of claims 1 to 7 by executing the executable instructions.
CN201910746144.4A 2019-08-13 2019-08-13 Text similarity computing method and device, storage medium, electronic equipment Pending CN110489751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910746144.4A CN110489751A (en) 2019-08-13 2019-08-13 Text similarity computing method and device, storage medium, electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910746144.4A CN110489751A (en) 2019-08-13 2019-08-13 Text similarity computing method and device, storage medium, electronic equipment

Publications (1)

Publication Number Publication Date
CN110489751A true CN110489751A (en) 2019-11-22

Family

ID=68550847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910746144.4A Pending CN110489751A (en) 2019-08-13 2019-08-13 Text similarity computing method and device, storage medium, electronic equipment

Country Status (1)

Country Link
CN (1) CN110489751A (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955831A (en) * 2019-11-25 2020-04-03 北京三快在线科技有限公司 Article recommendation method and device, computer equipment and storage medium
CN110955831B (en) * 2019-11-25 2023-04-14 北京三快在线科技有限公司 Article recommendation method and device, computer equipment and storage medium
CN111159423B (en) * 2019-12-27 2023-04-07 北京明略软件系统有限公司 Entity association method, device and computer readable storage medium
CN111159423A (en) * 2019-12-27 2020-05-15 北京明略软件系统有限公司 Entity association method, device and computer readable storage medium
CN111209395A (en) * 2019-12-27 2020-05-29 铜陵中科汇联科技有限公司 Short text similarity calculation system and training method thereof
CN110795926B (en) * 2020-01-03 2020-04-07 四川大学 Judgment document similarity judgment method and system based on legal knowledge graph
CN110795926A (en) * 2020-01-03 2020-02-14 四川大学 Judgment document similarity judgment method and system based on legal knowledge graph
CN111275091A (en) * 2020-01-16 2020-06-12 平安科技(深圳)有限公司 Intelligent text conclusion recommendation method and device and computer readable storage medium
CN111275091B (en) * 2020-01-16 2024-05-10 平安科技(深圳)有限公司 Text conclusion intelligent recommendation method and device and computer readable storage medium
CN111325033A (en) * 2020-03-20 2020-06-23 中国建设银行股份有限公司 Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN113495954A (en) * 2020-03-20 2021-10-12 北京沃东天骏信息技术有限公司 Text data determination method and device
CN111325033B (en) * 2020-03-20 2023-07-11 中国建设银行股份有限公司 Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN111539197A (en) * 2020-04-15 2020-08-14 北京百度网讯科技有限公司 Text matching method and device, computer system and readable storage medium
CN111539197B (en) * 2020-04-15 2023-08-15 北京百度网讯科技有限公司 Text matching method and device, computer system and readable storage medium
CN111767706A (en) * 2020-06-19 2020-10-13 北京工业大学 Text similarity calculation method and device, electronic equipment and medium
CN111861268A (en) * 2020-07-31 2020-10-30 平安金融管理学院(中国·深圳) Candidate recommending method and device, electronic equipment and storage medium
CN111914568A (en) * 2020-07-31 2020-11-10 平安科技(深圳)有限公司 Method, device and equipment for generating text modifying sentence and readable storage medium
CN111914568B (en) * 2020-07-31 2024-02-06 平安科技(深圳)有限公司 Method, device and equipment for generating text sentence and readable storage medium
CN112287217B (en) * 2020-10-23 2023-08-04 平安科技(深圳)有限公司 Medical document retrieval method, medical document retrieval device, electronic equipment and storage medium
CN112287217A (en) * 2020-10-23 2021-01-29 平安科技(深圳)有限公司 Medical literature retrieval method, device, electronic equipment and storage medium
CN112528600A (en) * 2020-12-15 2021-03-19 北京百度网讯科技有限公司 Text data processing method, related device and computer program product
CN112528600B (en) * 2020-12-15 2024-05-07 北京百度网讯科技有限公司 Text data processing method, related device and computer program product
CN112434167B (en) * 2021-01-26 2021-04-20 支付宝(杭州)信息技术有限公司 Information identification method and device
CN112434167A (en) * 2021-01-26 2021-03-02 支付宝(杭州)信息技术有限公司 Information identification method and device
CN114372150A (en) * 2021-12-10 2022-04-19 天翼物联科技有限公司 Knowledge graph construction method, system, device and storage medium
CN114372150B (en) * 2021-12-10 2024-05-07 天翼物联科技有限公司 Knowledge graph construction method, system, device and storage medium
CN115827877A (en) * 2023-02-07 2023-03-21 湖南正宇软件技术开发有限公司 Proposal auxiliary combination method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110489751A (en) Text similarity computing method and device, storage medium, electronic equipment
CN107735804B (en) System and method for transfer learning techniques for different sets of labels
CN109726396A (en) Semantic matching method, device, medium and the electronic equipment of question and answer text
CN110140133A (en) The implicit bridge joint of machine learning task
US11195098B2 (en) Method for generating neural network and electronic device
CN112560479A (en) Abstract extraction model training method, abstract extraction device and electronic equipment
CN109614111A (en) Method and apparatus for generating code
CN109376234A (en) A kind of method and apparatus of trained summarization generation model
CN112507101B (en) Method and device for establishing pre-training language model
JP7417679B2 (en) Information extraction methods, devices, electronic devices and storage media
US11321370B2 (en) Method for generating question answering robot and computer device
CN109271403A (en) A kind of operating method of data query, device, medium and electronic equipment
JP2022003537A (en) Method and device for recognizing intent of dialog, electronic apparatus, and storage medium
CN109710760A (en) Clustering method, device, medium and the electronic equipment of short text
CN110162766A (en) Term vector update method and device
US20230094730A1 (en) Model training method and method for human-machine interaction
CN116245097A (en) Method for training entity recognition model, entity recognition method and corresponding device
WO2022228127A1 (en) Element text processing method and apparatus, electronic device, and storage medium
US20210312308A1 (en) Method for determining answer of question, computing device and storage medium
CN113961679A (en) Intelligent question and answer processing method and system, electronic equipment and storage medium
CN111079376B (en) Data labeling method, device, medium and electronic equipment
EP3855341A1 (en) Language generation method and apparatus, electronic device and storage medium
US11842290B2 (en) Using functions to annotate a syntax tree with real data used to generate an answer to a question
CN113553411B (en) Query statement generation method and device, electronic equipment and storage medium
CN112966513B (en) Method and apparatus for entity linking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination