CN109460461A

CN109460461A - Text matching method and system based on a text similarity model

Info

Publication number: CN109460461A
Application number: CN201811344782.5A
Authority: CN
Inventors: 朱钦佩
Original assignee: AI Speech Ltd
Current assignee: AI Speech Ltd
Priority date: 2018-11-13
Filing date: 2018-11-13
Publication date: 2019-03-12

Abstract

An embodiment of the present invention provides a text matching method based on a text similarity model. The method comprises: receiving text information and determining a feature vector of the text information, where the feature vector includes at least a text string, a text pinyin, and a word vector; inputting the feature vector into the text similarity model; obtaining the feature similarity output by the text similarity model; and, according to the feature similarity, determining at least one preset sentence that reaches a preset feature threshold as the matched text of the text information. Embodiments of the present invention also provide a text matching system based on the text similarity model, and a training method and system for the text similarity model. By using a text similarity model that considers feature vectors of several dimensions, the embodiments determine the feature similarity between the user's input sentence and each preset sentence in the text similarity model, and thereby determine a more accurate matched text.

Description

Text matching method and system based on a text similarity model

Technical field

The present invention relates to the field of natural language processing, and in particular to a text matching method and system based on a text similarity model.

Background technique

Text similarity computation is a basic problem in natural language processing, and many fields rely on text similarity algorithms. In everyday use, because users phrase things colloquially, rely on input methods, or make typing slips, what a user writes is not as standard as a document, yet it still carries the information the user wants. Capturing these weak signals accurately requires a text similarity algorithm. For example, a user enters "where is the Yangtze bridging" when the user actually wants to ask "where is the Yangtze Bridge". Finding "Yangtze Bridge" in a preset corpus from the input "where is the Yangtze bridging" is an important application scenario for text similarity algorithms. As another example, a user says "navigate to North Medical Sixth Hospital", a colloquial abbreviation, and the task is to find "Peking University Sixth Hospital" in the preset corpus from that abbreviation. To address these problems, existing approaches typically measure text similarity by counting the characters two strings have in common, use a statistical model that derives similarity statistics from the phrases a user enters within a single dialogue, or rely on manual collection.

In the course of implementing the present invention, the inventor found at least the following problems in the related art:

Counting the characters two strings have in common solves part of the problem, but it cannot effectively identify similar text caused by misspelling. For example, it rates "Chiba Ramen" (qian ye la mian) as more similar to "Weiqian Ramen" (wei qian la mian) than "Dangerous Ramen" (wei xian la mian) is, even though "wei xian la mian" differs from "wei qian la mian" in only one syllable. Statistical models, in turn, often depend on session collection tools (such as various input methods and search engines) and have narrow coverage, while manual collection is costly.
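The ranking problem described above can be reproduced in a few lines. The following is a minimal sketch, not the method of the related art itself: the Chinese characters are reconstructed from the pinyin quoted in the text (an assumption, since the translated text shows only the pinyin), and `difflib.SequenceMatcher` is used as a stand-in for a pinyin-level comparison.

```python
from difflib import SequenceMatcher

# Chinese characters reconstructed from the pinyin quoted in the text
# (an assumption; the translated source shows only the pinyin).
TARGET = ("味千拉面", "wei qian la mian")
CHIBA = ("千叶拉面", "qian ye la mian")
DANGEROUS = ("危险拉面", "wei xian la mian")


def shared_char_count(a: str, b: str) -> int:
    """Count the distinct characters that two strings have in common."""
    return len(set(a) & set(b))


def pinyin_ratio(a: str, b: str) -> float:
    """Similarity of two pinyin strings in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


for name, (chars, pinyin) in (("Chiba", CHIBA), ("Dangerous", DANGEROUS)):
    print(
        name,
        "shared characters:", shared_char_count(chars, TARGET[0]),
        "pinyin similarity:", round(pinyin_ratio(pinyin, TARGET[1]), 3),
    )
```

Under these assumptions the shared-character count prefers "Chiba Ramen", while the pinyin comparison prefers "Dangerous Ramen", which is the mismatch the inventor points out.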

Summary of the invention

The present invention aims to solve at least the following problems in the prior art: similarity computed only from the number of characters shared between strings is inaccurate, statistical methods have narrow coverage, and manual collection is costly.

In a first aspect, an embodiment of the present invention provides a training method for a text similarity model, comprising:

receiving a dictionary training set, performing word segmentation on each preset sentence in the dictionary training set, and determining the text string of each preset sentence;

determining, according to the text string of each preset sentence, a word vector corresponding to the text string and a text pinyin corresponding to the text string;

determining, according to the text string, text pinyin, and word vector corresponding to each preset sentence, the feature vector corresponding to each preset sentence, and training the text similarity model.

In a second aspect, an embodiment of the present invention provides a text matching method based on a text similarity model, comprising:

receiving text information and determining a feature vector of the text information, where the feature vector includes at least a text string, a text pinyin, and a word vector;

inputting the feature vector into the text similarity model;

obtaining the feature similarity output by the text similarity model;

determining, according to the feature similarity, at least one preset sentence that reaches a preset feature threshold as the matched text of the text information.

In a third aspect, an embodiment of the present invention provides a training system for a text similarity model, comprising:

a text string determining program module, configured to receive a dictionary training set, perform word segmentation on each preset sentence in the dictionary training set, and determine the text string of each preset sentence;

a word vector and text pinyin determining program module, configured to determine, according to the text string of each preset sentence, a word vector corresponding to the text string and a text pinyin corresponding to the text string;

a text similarity model training program module, configured to determine, according to the text string, text pinyin, and word vector corresponding to each preset sentence, the feature vector corresponding to each preset sentence, and to train the text similarity model.

In a fourth aspect, an embodiment of the present invention provides a text matching system based on a text similarity model, comprising:

a feature vector determining program module, configured to receive text information and determine a feature vector of the text information, where the feature vector includes at least a text string, a text pinyin, and a word vector;

a feature vector input program module, configured to input the feature vector into the text similarity model;

a feature similarity obtaining program module, configured to obtain the feature similarity output by the text similarity model;

a text matching program module, configured to determine, according to the feature similarity, at least one preset sentence that reaches a preset feature threshold as the matched text of the text information.

In a fifth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the training method for the text similarity model and of the text matching method based on the text similarity model of any embodiment of the present invention.

In a sixth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, characterized in that when the program is executed by a processor, it implements the steps of the training method for the text similarity model and of the text matching method based on the text similarity model of any embodiment of the present invention.

The beneficial effects of the embodiments of the present invention are as follows. Training the text similarity model on multiple feature vectors of each entry gives the model richer parameters and more features, so the text similarity it determines is more accurate. Using a text similarity model that considers feature vectors of several dimensions, the feature similarity between the user's input sentence and each preset sentence in the text similarity model is determined, and a more accurate matched text is then determined. The preset dictionary is relatively easy to collect, and the cost is relatively low.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of a training method for a text similarity model provided by an embodiment of the present invention;

Fig. 2 is a flow chart of a text matching method based on a text similarity model provided by an embodiment of the present invention;

Fig. 3 is a structural schematic diagram of a training system for a text similarity model provided by an embodiment of the present invention;

Fig. 4 is a structural schematic diagram of a text matching system based on a text similarity model provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Fig. 1 is a flow chart of a training method for a text similarity model provided by an embodiment of the present invention, which includes the following steps:

S11: receiving a dictionary training set, performing word segmentation on each preset sentence in the dictionary training set, and determining the text string of each preset sentence;

S12: determining, according to the text string of each preset sentence, a word vector corresponding to the text string and a text pinyin corresponding to the text string;

S13: determining, according to the text string, text pinyin, and word vector corresponding to each preset sentence, the feature vector corresponding to each preset sentence, and training the text similarity model.

In this embodiment, the method no longer only counts the characters that two text strings have directly in common; it introduces new parameters and weighs several aspects together, so the text similarity model used must be trained accordingly.

For step S11, a dictionary training set is received. The dictionary training set contains entries that a large number of users may use in daily life, for example "First Affiliated Hospital of Peking University", "Second Affiliated Hospital of Peking University", "Third Affiliated Hospital of Peking University", "Fourth Affiliated Hospital of Peking University", "KFC", "McDonald's", "Weiqian Ramen", "Pepper Workshop", "Friendship Bridge", "Shahe Bridge", "Yongding River Bridge", "Zhenyang Bridge", "Yangtze Bridge", "Caobai River Bridge", and so on. After the dictionary training set is received, word segmentation is performed on each preset sentence in it and the text string of each preset sentence is determined; for example, the text string of "Yangtze Bridge" is s1 = "Yangtze_Bridge". The word segmentation system may split off a single character or a whole word.

For step S12, a word vector and a text pinyin corresponding to the text string are determined according to the text string of each preset sentence. After step S11, the text string s1 = "Yangtze_Bridge" has been determined. From this text string, its text pinyin p1 and word vector w1 are determined, giving p1 = chang jiang | da qiao and w1 = (0.323, 0.123, ...)(0.564, 0.348, ...). When the text string contains Chinese characters, they are mapped to the corresponding text pinyin; when the text string contains English characters, the text pinyin of an English character is the English character itself.
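The pinyin mapping in step S12 can be sketched as follows. This is a minimal illustration under stated assumptions: `PINYIN_TABLE` is a hypothetical five-character lookup table, whereas a real system would use a full pronunciation dictionary, and the Chinese characters are reconstructed from the pinyin p1 given above. Words are joined with " | " and syllables with spaces, mirroring the p1 notation.

```python
# Hypothetical lookup table (assumption); a real system would use a complete
# pronunciation dictionary. Characters reconstructed from the pinyin in the text.
PINYIN_TABLE = {"长": "chang", "江": "jiang", "大": "da", "桥": "qiao", "搭": "da"}


def word_pinyin(word: str) -> str:
    """Map one segmented word to pinyin; non-Chinese runs are kept unchanged."""
    tokens: list[str] = []
    prev_passthrough = False
    for ch in word:
        if ch in PINYIN_TABLE:
            tokens.append(PINYIN_TABLE[ch])
            prev_passthrough = False
        elif prev_passthrough:
            # Keep consecutive English characters together, e.g. "KFC".
            tokens[-1] += ch
        else:
            tokens.append(ch)
            prev_passthrough = True
    return " ".join(tokens)


def text_pinyin(words: list[str]) -> str:
    """Join the pinyin of each segmented word with ' | '."""
    return " | ".join(word_pinyin(w) for w in words)


print(text_pinyin(["长江", "大桥"]))
print(text_pinyin(["KFC"]))
```

In this sketch a character missing from the table is passed through unchanged, which is only appropriate for English characters; handling unknown Chinese characters is left open.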

For step S13, the feature vector corresponding to each preset sentence is determined according to its text string, text pinyin, and word vector. The feature vector covers the text string feature, text pinyin feature, and word vector feature of the preset sentence, and the text similarity model is then trained with these feature vectors.

As this embodiment shows, training the text similarity model on multiple feature vectors of each entry gives the model richer parameters and more features, so the text similarity it determines is more accurate.

Fig. 2 is a flow chart of a text matching method based on a text similarity model provided by an embodiment of the present invention, which includes the following steps:

S21: receiving text information and determining a feature vector of the text information, where the feature vector includes at least a text string, a text pinyin, and a word vector;

S22: inputting the feature vector into the text similarity model;

S23: obtaining the feature similarity output by the text similarity model;

S24: determining, according to the feature similarity, at least one preset sentence that reaches a preset feature threshold as the matched text of the text information.

In this embodiment, the text similarity model trained by the method of claim 1 is put to practical use.

For step S21, text information is received. The text information may be obtained by speech recognition on the corresponding device from the user's voice input, or entered by the user through the device's input method. For example, the user enters text through an input method and, because of a slip of the hand, carelessness, or some other reason, types "Yangtze bridging". The feature vector of the input "Yangtze bridging" is then determined, including the text string, text pinyin, and word vector: text string s2 = "Yangtze_bridging", text pinyin p2 = chang jiang | da qiao, and word vector w2 = (0.1234, 0.2133, ...)(0.823, 0.234, ...).

For step S22, the feature vector determined in step S21 is input into the text similarity model and compared, feature by feature, with the preset sentences in the text similarity model.

For step S23, after step S22, the feature similarity output by the text similarity model is obtained, where the feature similarity includes the feature similarity between the user's input and each preset sentence in the text similarity model.

For step S24, according to the feature similarity determined in step S23, at least one preset sentence that reaches the preset threshold is determined as the matched text of the text information.

As this embodiment shows, using a text similarity model that considers feature vectors of several dimensions, the feature similarity between the user's input sentence and each preset sentence in the text similarity model is determined, and a more accurate matched text is then determined. The preset dictionary is relatively easy to collect, and the cost is relatively low.
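Steps S21 to S24 can be sketched end to end as follows. This is a minimal illustration, not the trained model of the embodiment: the text does not specify the model's internal form, so the sketch assumes an equally weighted mean of three per-feature similarities. The names `Features`, `feature_similarity`, and `match`, the two-dimensional vectors, and the reconstructed Chinese characters are all hypothetical.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Features:
    text: str            # segmented text string, words joined by "_"
    pinyin: str          # text pinyin
    vector: list[float]  # toy sentence-level word vector (assumption)


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def feature_similarity(q: Features, p: Features) -> float:
    """Assumed stand-in for the trained model: mean of three similarities."""
    sims = (
        SequenceMatcher(None, q.text, p.text).ratio(),
        SequenceMatcher(None, q.pinyin, p.pinyin).ratio(),
        cosine(q.vector, p.vector),
    )
    return sum(sims) / len(sims)


def match(query: Features, presets: list[Features], threshold: float):
    """S22-S24: score every preset sentence, keep those reaching the threshold."""
    scored = [(feature_similarity(query, p), p) for p in presets]
    hits = [(s, p) for s, p in scored if s >= threshold]
    return sorted(hits, key=lambda sp: sp[0], reverse=True)


# S21: features of the mistyped input and of two preset sentences (toy values).
query = Features("长江_搭桥", "chang jiang | da qiao", [0.1, 0.9])
presets = [
    Features("长江_大桥", "chang jiang | da qiao", [0.2, 0.8]),
    Features("KFC", "KFC", [0.9, 0.1]),
]
hits = match(query, presets, threshold=0.8)
print([p.text for _, p in hits])
```

With these toy values only the "Yangtze Bridge" entry reaches the threshold, because its pinyin is identical to the input and its text string differs in a single character.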

As an implementation, in this embodiment the preset feature threshold includes a preset text threshold, and obtaining the feature similarity output by the text similarity model comprises:

when the feature vector includes at least the text string, determining the text similarity between the text information and each preset sentence according to the text string of the text information and the text string of each preset sentence in the text similarity model;

determining the preset sentences whose text similarity exceeds the preset text threshold as a matching string set;

determining, according to at least the text string, text pinyin, and word vector in the feature vector, the feature similarity between the text information and the preset sentences in the matching string set.

In this embodiment, the preset feature threshold includes a preset text threshold, and when the feature vector includes at least the text string, the text similarity between the text information and each preset sentence is determined according to the text string of the text information entered by the user and the text string of each preset sentence in the text similarity model. That is, one of the several features is first used for a similarity comparison, yielding a smaller matching string set of preset sentences that exceed the preset text threshold.

After the matching string set is determined, the feature similarity between the text information entered by the user and the preset sentences in the matching string set is determined using the several features together.

As this embodiment shows, a single feature is first used to screen the preset sentences of the text similarity model. A relatively small matching string set is filtered out, and the matched text is then determined from it using the several features, which speeds up determination of the matched text.
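The two-stage screening described above can be sketched as follows. This is a minimal illustration under assumptions: the text does not say how text similarity is computed, so the sketch uses Jaccard overlap of character sets for the first stage and a mean of text and pinyin similarity as a stand-in for the multi-feature model in the second stage. All function names and thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the character sets of two text strings (assumed measure)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def full_similarity(q: dict, p: dict) -> float:
    """Stand-in for the multi-feature model: mean of text and pinyin similarity."""
    pinyin_sim = SequenceMatcher(None, q["pinyin"], p["pinyin"]).ratio()
    return (text_similarity(q["text"], p["text"]) + pinyin_sim) / 2


def two_stage_match(query, presets, text_threshold, feature_threshold):
    # Stage 1: cheap single-feature screen, giving the matching string set.
    matching_set = [
        p for p in presets
        if text_similarity(query["text"], p["text"]) > text_threshold
    ]
    # Stage 2: multi-feature similarity, computed only on the screened set.
    scored = [(full_similarity(query, p), p["text"]) for p in matching_set]
    return sorted((sp for sp in scored if sp[0] >= feature_threshold), reverse=True)


# Chinese characters reconstructed from the pinyin in the text (assumption).
query = {"text": "长江搭桥", "pinyin": "chang jiang da qiao"}
presets = [
    {"text": "长江大桥", "pinyin": "chang jiang da qiao"},
    {"text": "KFC", "pinyin": "KFC"},
]
result = two_stage_match(query, presets, text_threshold=0.5, feature_threshold=0.7)
print(result)
```

The second stage never scores "KFC" here, which is the efficiency gain the embodiment describes: the more expensive comparison runs only on the screened candidates.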

As an implementation, in this embodiment the preset feature threshold includes a preset pinyin threshold, and obtaining the feature similarity output by the text similarity model comprises:

when the feature vector includes at least the text pinyin, determining the pinyin similarity between the text information and each preset sentence according to the text pinyin of the text information and the text pinyin of each preset sentence in the text similarity model;

determining the preset sentences whose pinyin similarity exceeds the preset pinyin threshold as a matching pinyin set;

determining, according to at least the text string, text pinyin, and word vector in the feature vector, the feature similarity between the text information and the preset sentences in the matching pinyin set.

In this embodiment, the preset feature threshold includes a preset pinyin threshold, and when the feature vector includes at least the text pinyin, the pinyin similarity between the text information and each preset sentence is determined according to the text pinyin of the text information entered by the user and the text pinyin of each preset sentence in the text similarity model. Likewise, one of the several features is first used for a similarity comparison, yielding a smaller matching pinyin set of preset sentences that exceed the preset pinyin threshold.

After the matching pinyin set is determined, the feature similarity between the text information entered by the user and the preset sentences in the matching pinyin set is determined using the several features together.

As this embodiment shows, a single feature is first used to screen the preset sentences of the text similarity model. A relatively small matching pinyin set is filtered out, and the matched text is then determined from it using the several features, which speeds up determination of the matched text.
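For the pinyin screen, the text does not name a specific similarity measure. The following sketch assumes a normalized Levenshtein edit distance over the pinyin letters; `edit_distance`, `pinyin_similarity`, and `matching_pinyin_set` are hypothetical names, and the threshold is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def pinyin_similarity(p: str, q: str) -> float:
    """1 minus normalized edit distance, ignoring spaces and word separators."""
    a = p.replace(" ", "").replace("|", "")
    b = q.replace(" ", "").replace("|", "")
    longest = max(len(a), len(b))
    return 1 - edit_distance(a, b) / longest if longest else 1.0


def matching_pinyin_set(query_pinyin, preset_pinyins, threshold):
    """Keep the preset entries whose pinyin similarity exceeds the threshold."""
    return [p for p in preset_pinyins if pinyin_similarity(query_pinyin, p) > threshold]


presets = ["wei qian la mian", "chang jiang | da qiao", "KFC"]
screened = matching_pinyin_set("wei xian la mian", presets, threshold=0.8)
print(screened)
```

With the misspelled input "wei xian la mian", only "wei qian la mian" survives the screen, since the two differ by a single letter.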

As an implementation, in this embodiment the preset feature threshold includes a preset vector threshold, and obtaining the feature similarity output by the text similarity model comprises:

when the feature vector includes at least the word vector, determining the vector similarity between the text information and each preset sentence according to the word vector of the text information and the word vector of each preset sentence in the text similarity model;

determining the preset sentences whose vector similarity exceeds the preset vector threshold as a matching vector set;

determining, according to at least the text string, text pinyin, and word vector in the feature vector, the feature similarity between the text information and the preset sentences in the matching vector set.

In this embodiment, the preset feature threshold includes a preset vector threshold, and when the feature vector includes at least the word vector, the vector similarity between the text information and each preset sentence is determined according to the word vector of the text information entered by the user and the word vector of each preset sentence in the text similarity model. Likewise, one of the several features is first used for a similarity comparison, yielding a smaller matching vector set of preset sentences that exceed the preset vector threshold.

After the matching vector set is determined, the feature similarity between the text information entered by the user and the preset sentences in the matching vector set is determined using the several features together.

As this embodiment shows, a single feature is first used to screen the preset sentences of the text similarity model. A relatively small matching vector set is filtered out, and the matched text is then determined from it using the several features, which speeds up determination of the matched text.
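The vector screen can be sketched in the same way. The text does not state how per-word vectors are combined or compared, so this sketch assumes that a sentence is represented by the mean of its word vectors and that vector similarity is cosine similarity. The two-dimensional vectors are toy values, not the truncated vectors quoted earlier.

```python
import math


def mean_vector(word_vectors: list[list[float]]) -> list[float]:
    """Average the per-word vectors into one sentence vector (assumption)."""
    n = len(word_vectors)
    return [sum(col) / n for col in zip(*word_vectors)]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matching_vector_set(query_vectors, presets, threshold):
    """Keep the preset entries whose vector similarity exceeds the threshold."""
    q = mean_vector(query_vectors)
    return [
        name for name, vectors in presets.items()
        if cosine(q, mean_vector(vectors)) > threshold
    ]


# Toy two-dimensional word vectors (hypothetical values).
presets = {
    "Yangtze Bridge": [[0.9, 0.1], [0.1, 0.9]],
    "KFC": [[1.0, -1.0]],
}
screened = matching_vector_set([[1.0, 0.0], [0.0, 1.0]], presets, threshold=0.9)
print(screened)
```

Averaging is only one way to compare sentences whose word counts differ; other pooling or alignment schemes would fit the same screening step.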

As an implementation, in this embodiment, determining according to the feature similarity at least one preset sentence that reaches the preset feature threshold as the matched text of the text information comprises:

when, in descending order of similarity, only one preset sentence exceeding the preset feature threshold is determined as the matched text of the text information, using that preset sentence as the matched text of the text information; or

when, in descending order of similarity, at least two preset sentences exceeding the preset feature threshold are determined as matched texts of the text information, sending the at least two preset sentences to the user;

receiving the preset sentence selected by the user;

using the selected preset sentence as the matched text of the text information.

In this embodiment, the preset sentences that reach the preset feature threshold can be determined in descending order of similarity as the matched text of the text information. When only one matched text is determined, for example when the text information entered by the user is "Yangtze bridging" and the single matched text determined in descending order of similarity is "Yangtze Bridge", "Yangtze Bridge" is used as the matched text of the "Yangtze bridging" entered by the user.

When at least two matched texts are determined, for example when the text information entered by the user is "Peking University Hospital" and the matched texts determined by similarity are "Peking University First Hospital", "Peking University Second Hospital", "Peking University Third Hospital", and so on, they are fed back to the user and the preset sentence selected by the user is received. If the user selects "Peking University Third Hospital", for example, the selected preset sentence is used as the matched text of the text information.

As this embodiment shows, determining a specified number of matched texts gives the user more ways to match, widens the matching range, and improves the user experience.
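The selection logic above can be sketched as a small function. This is a minimal illustration: `choose_match` and the `ask_user` callback are hypothetical names, the scores are made up, and returning `None` when nothing reaches the threshold is an assumption, since the text does not describe that case.

```python
from typing import Callable, Optional


def choose_match(
    scored: list[tuple[float, str]],
    threshold: float,
    ask_user: Callable[[list[str]], str],
) -> Optional[str]:
    """Pick the matched text from (similarity, preset sentence) pairs."""
    # Preset sentences reaching the threshold, in descending order of similarity.
    hits = [text for score, text in sorted(scored, reverse=True) if score >= threshold]
    if not hits:
        return None          # not covered by the text; assumed behaviour
    if len(hits) == 1:
        return hits[0]       # a single candidate is used directly
    return ask_user(hits)    # several candidates: let the user choose


single = choose_match(
    [(0.93, "Yangtze Bridge"), (0.07, "KFC")],
    threshold=0.8,
    ask_user=lambda hits: hits[0],
)
several = choose_match(
    [
        (0.90, "Peking University First Hospital"),
        (0.88, "Peking University Second Hospital"),
        (0.87, "Peking University Third Hospital"),
    ],
    threshold=0.8,
    ask_user=lambda hits: hits[2],  # simulates the user picking the third entry
)
print(single, "|", several)
```

Passing the user interaction in as a callback keeps the selection rule independent of how the candidates are actually presented, whether on screen or by voice.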

Fig. 3 is a structural schematic diagram of a training system for a text similarity model provided by an embodiment of the present invention. The system can execute the training method for the text similarity model described in any of the above embodiments and is configured in a terminal.

The training system for a text similarity model provided in this embodiment includes: a text string determining program module 11, a word vector and text pinyin determining program module 12, and a text similarity model training program module 13.

The text string determining program module 11 is configured to receive a dictionary training set, perform word segmentation on each preset sentence in the dictionary training set, and determine the text string of each preset sentence. The word vector and text pinyin determining program module 12 is configured to determine, according to the text string of each preset sentence, a word vector corresponding to the text string and a text pinyin corresponding to the text string. The text similarity model training program module 13 is configured to determine, according to the text string, text pinyin, and word vector corresponding to each preset sentence, the feature vector corresponding to each preset sentence, and to train the text similarity model.

Fig. 4 is a structural schematic diagram of a text matching system based on a text similarity model provided by an embodiment of the present invention. The system can execute the text matching method based on the text similarity model described in any of the above embodiments and is configured in a terminal.

The text matching system based on a text similarity model provided in this embodiment includes: a feature vector determining program module 21, a feature vector input program module 22, a feature similarity obtaining program module 23, and a text matching program module 24.

The feature vector determining program module 21 is configured to receive text information and determine a feature vector of the text information, where the feature vector includes at least a text string, a text pinyin, and a word vector. The feature vector input program module 22 is configured to input the feature vector into the text similarity model. The feature similarity obtaining program module 23 is configured to obtain the feature similarity output by the text similarity model. The text matching program module 24 is configured to determine, according to the feature similarity, at least one preset sentence that reaches a preset feature threshold as the matched text of the text information.

Further, the preset feature threshold includes a preset text threshold, and the feature similarity obtaining program module is configured to:

Further, the preset feature threshold includes a preset pinyin threshold, and the feature similarity obtaining program module is configured to:

Further, the preset feature threshold includes a preset vector threshold, and the feature similarity obtaining program module is configured to:

Further, the text matching program module is configured to:

receive the preset sentence selected by the user;

An embodiment of the present invention also provides a non-volatile computer storage medium. The computer storage medium stores computer-executable instructions that can execute the training method for the text similarity model in any of the above method embodiments;

As an implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions, and the computer-executable instructions are set to:

An embodiment of the present invention also provides a non-volatile computer storage medium. The computer storage medium stores computer-executable instructions that can execute the text matching method based on the text similarity model in any of the above method embodiments;

input the feature vector into the text similarity model;

obtain the feature similarity output by the text similarity model;

As a non-volatile computer-readable storage medium, it can be used to store non-volatile software programs and non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method for testing software in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the training method for the text similarity model and the text matching method based on the text similarity model in any of the above method embodiments.

The non-volatile computer-readable storage medium may include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function; the data storage area can store data created through use of the software testing device, and the like. In addition, the non-volatile computer-readable storage medium may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, and such remote memory can be connected to the software testing device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

An embodiment of the present invention also provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the training method for the text similarity model and of the text matching method based on the text similarity model of any embodiment of the present invention.

The client of the embodiment of the present application exists in a variety of forms, including but not limited to:

(1) Mobile communication devices: these devices are characterized by mobile communication capability, with voice and data communication as their main purpose. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Other electronic devices with data processing functions.

Herein, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in each embodiment or in certain parts of an embodiment.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A training method for a text similarity model, comprising:

receiving a dictionary training set, performing word segmentation on each default sentence in the dictionary training set, and determining the text string of the default sentence;

determining, according to the text string of each default sentence, a term vector corresponding to the text string and a text phonetic corresponding to the text string;

determining, according to the text string, text phonetic, and term vector corresponding to each default sentence, the feature vector corresponding to each default sentence, and training the text similarity model.

2. A text matching method based on the text similarity model according to claim 1, comprising:

receiving text information and determining the feature vector of the text information, wherein the feature vector includes at least: a text string, a text phonetic, and a term vector;

inputting the feature vector into the text similarity model;

obtaining the characteristic similarity output by the text similarity model;

determining, according to the characteristic similarity, at least one default sentence that reaches a default characteristic threshold as the matched text of the text information.

3. The method according to claim 2, wherein the default characteristic threshold includes a default text threshold, and obtaining the characteristic similarity output by the text similarity model comprises:

when the feature vector includes at least a text string, determining the text similarity between the text information and each default sentence according to the text string of the text information and the text string of each default sentence in the text similarity model;

determining, according to at least the text string, text phonetic, and term vector in the feature vector, the characteristic similarity between the text information and the default sentences in the matched character string set.

4. The method according to claim 2, wherein the default characteristic threshold includes a default phonetic threshold, and obtaining the characteristic similarity output by the text similarity model comprises:

when the feature vector includes at least a text phonetic, determining the pinyin similarity between the text information and each default sentence according to the text phonetic of the text information and the text phonetic of each default sentence in the text similarity model;

determining, according to at least the text string, text phonetic, and term vector in the feature vector, the characteristic similarity between the text information and the default sentences in the matched phonetic set.

5. The method according to claim 2, wherein the default characteristic threshold includes a default vector threshold, and obtaining the characteristic similarity output by the text similarity model comprises:

when the feature vector includes at least a term vector, determining the vector similarity between the text information and each default sentence according to the term vector of the text information and the term vector of each default sentence in the text similarity model;

determining, according to at least the text string, text phonetic, and term vector in the feature vector, the characteristic similarity between the text information and the default sentences in the matched vector set.

6. The method according to claim 2, wherein determining, according to the characteristic similarity, at least one default sentence that reaches the default characteristic threshold as the matched text of the text information comprises:

when it is determined, in descending order of similarity, that only one default sentence exceeds the default characteristic threshold as the matched text of the text information, taking that default sentence as the matched text of the text information; or

when it is determined, in descending order of similarity, that at least two default sentences exceed the default characteristic threshold as the matched text of the text information, sending the at least two default sentences to the user;

receiving the default sentence selected by the user;

7. A training system for a text similarity model, comprising:

a text string determination program module, configured to receive a dictionary training set, perform word segmentation on each default sentence in the dictionary training set, and determine the text string of the default sentence;

a term vector and text phonetic determination program module, configured to determine, according to the text string of each default sentence, a term vector corresponding to the text string and a text phonetic corresponding to the text string;

a text similarity model training program module, configured to determine, according to the text string, text phonetic, and term vector corresponding to each default sentence, the feature vector corresponding to each default sentence, and to train the text similarity model.

8. A text matching system based on the text similarity model according to claim 7, comprising:

a feature vector determination program module, configured to receive text information and determine the feature vector of the text information, wherein the feature vector includes at least: a text string, a text phonetic, and a term vector;

a text matching program module, configured to determine, according to the characteristic similarity, at least one default sentence that reaches a default characteristic threshold as the matched text of the text information.

9. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the method according to any one of claims 1-6.

10. A storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-6.