CN109460461A - Text matching technique and system based on text similarity model - Google Patents
Text matching technique and system based on text similarity model
- Publication number
- CN109460461A, CN201811344782.5A
- Authority
- CN
- China
- Prior art keywords
- text
- default
- similarity
- string
- sentence
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
An embodiment of the present invention provides a text matching method based on a text similarity model. The method comprises: receiving text information and determining a feature vector of the text information, wherein the feature vector includes at least a text string, a text pinyin, and a word vector; inputting the feature vector into the text similarity model; obtaining the feature similarity output by the text similarity model; and determining, according to the feature similarity, at least one preset sentence that reaches a preset feature threshold as the matched text of the text information. Embodiments of the present invention also provide a text matching system based on the text similarity model, as well as a training method and a training system for the text similarity model. By using a text similarity model that considers feature vectors of multiple dimensions, the embodiments determine the feature similarity between the sentence input by the user and each preset sentence in the text similarity model, and thereby determine a more accurate matched text.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a text matching method and system based on a text similarity model.
Background art
Text similarity computation is a basic problem in natural language processing, and many fields rely on text similarity algorithms. In everyday use, because of colloquial phrasing, the behavior of input methods, or slips of the hand, a user's description is not as standard as a document, yet the text still implies the information the user wants. Accurately capturing this weak information requires a text similarity algorithm. For example, a user inputs "putting up a bridge somewhere in the Changjiang river" when the user actually wants to ask where the "Yangtze Bridge" is. Finding "Yangtze Bridge" in a preset corpus from such mistyped input is an important application scenario for text similarity algorithms. As another example, a user says "navigate to north doctor six institutes" (a literal rendering of a colloquial abbreviation), and the system must find "Peking University Sixth Hospital" in the preset corpus. To handle these problems, existing approaches generally measure text similarity by counting the similar characters between strings, use statistical models that gather similarity statistics from the multiple utterances a user makes within one dialogue, or rely on manual collection.
In the course of implementing the present invention, the inventors found at least the following problems in the related art:
Counting the similar characters between strings solves part of the problem, but it has difficulty recognizing similar text caused by misspelling. For example, it rates "Chiba hand-pulled noodles" (qian ye la mian) as more similar to "thousand hand-pulled noodles of taste" (wei qian la mian) than "dangerous hand-pulled noodles" (wei xian la mian) is, even though the pinyin of the latter differs from the target by a single initial. Statistical models often depend on session sampling tools (such as input methods and search engines) and have narrow coverage, and manual collection is costly.
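The misspelling problem described above can be illustrated with a small sketch. It compares the pinyin strings quoted in the text using plain Levenshtein edit distance. The choice of edit distance is an assumption made for illustration only; the patent does not state which distance measure is used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


# Pinyin strings taken from the example in the text.
target = "wei qian la mian"
typo = "wei xian la mian"
other = "qian ye la mian"

d_typo = edit_distance(typo, target)
d_other = edit_distance(other, target)
print(d_typo, d_other)
```

On pinyin, the misspelled input is closer to the target than the unrelated name is, which is the signal that character counting alone misses.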
Summary of the invention
The embodiments of the present invention aim to solve at least the following problems in the prior art: similarity is inaccurate when only the number of similar characters between strings is considered, statistical methods have narrow coverage, and manual collection is costly.
In a first aspect, an embodiment of the present invention provides a training method for a text similarity model, comprising:
receiving a lexicon training set, performing word segmentation on each preset sentence in the lexicon training set, and determining the text string of each preset sentence;
determining, according to the text string of each preset sentence, the word vector and the text pinyin corresponding to the text string;
determining the feature vector of each preset sentence according to its text string, text pinyin, and word vector, and training the text similarity model.
In a second aspect, an embodiment of the present invention provides a text matching method based on a text similarity model, comprising:
receiving text information and determining the feature vector of the text information, wherein the feature vector includes at least: a text string, a text pinyin, and a word vector;
inputting the feature vector into the text similarity model;
obtaining the feature similarity output by the text similarity model;
determining, according to the feature similarity, at least one preset sentence that reaches a preset feature threshold as the matched text of the text information.
In a third aspect, an embodiment of the present invention provides a training system for a text similarity model, comprising:
a text string determination program module, configured to receive a lexicon training set, perform word segmentation on each preset sentence in the lexicon training set, and determine the text string of each preset sentence;
a word vector and text pinyin determination program module, configured to determine, according to the text string of each preset sentence, the word vector and the text pinyin corresponding to the text string;
a text similarity model training program module, configured to determine the feature vector of each preset sentence according to its text string, text pinyin, and word vector, and to train the text similarity model.
In a fourth aspect, an embodiment of the present invention provides a text matching system based on a text similarity model, comprising:
a feature vector determination program module, configured to receive text information and determine the feature vector of the text information, wherein the feature vector includes at least: a text string, a text pinyin, and a word vector;
a feature vector input program module, configured to input the feature vector into the text similarity model;
a feature similarity acquisition program module, configured to obtain the feature similarity output by the text similarity model;
a text matching program module, configured to determine, according to the feature similarity, at least one preset sentence that reaches a preset feature threshold as the matched text of the text information.
In a fifth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the training method for the text similarity model and of the text matching method based on the text similarity model according to any embodiment of the present invention.
In a sixth aspect, an embodiment of the present invention provides a storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the training method for the text similarity model and of the text matching method based on the text similarity model according to any embodiment of the present invention.
The beneficial effects of the embodiments of the present invention are as follows. Training the text similarity model on multiple feature vectors of each word gives the model richer parameters and more features, so the determined text similarity is more accurate. Using a text similarity model that considers feature vectors of multiple dimensions, the feature similarity between the user's input sentence and each preset sentence in the model is determined, and a more accurate matched text is then determined. The preset lexicon is relatively easy to collect, and its cost is relatively low.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a training method for a text similarity model provided by an embodiment of the present invention;
Fig. 2 is a flow chart of a text matching method based on a text similarity model provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of a training system for a text similarity model provided by an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of a text matching system based on a text similarity model provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 shows a flow chart of a training method for a text similarity model provided by an embodiment of the present invention, which includes the following steps:
S11: receiving a lexicon training set, performing word segmentation on each preset sentence in the lexicon training set, and determining the text string of each preset sentence;
S12: determining, according to the text string of each preset sentence, the word vector and the text pinyin corresponding to the text string;
S13: determining the feature vector of each preset sentence according to its text string, text pinyin, and word vector, and training the text similarity model.
In this embodiment, the comparison is no longer limited to counting the directly similar characters of two text strings. New parameters are introduced so that several aspects are considered together, and the text similarity model therefore needs to be trained accordingly.
For step S11, a lexicon training set is received. The lexicon training set contains a large number of terms that users may use in daily life, for example "Peking University First Affiliated Hospital", "Peking University Second Affiliated Hospital", "Peking University Third Affiliated Hospital", "Peking University Fourth Affiliated Hospital", "KFC", "McDonald", "thousand hand-pulled noodles of taste", "pepper work mill", "Friendship Bridge", "Shahe Bridge", "Yongdinghe River Bridge", "Zhenyang Bridge", "Yangtze Bridge", "Caobai River Bridge", and so on. After the lexicon training set is received, word segmentation is performed on each preset sentence in it to determine the text string of that sentence. For example, the text string of "Yangtze Bridge" is s1 = Yangtze_Bridge, where the underscore marks the boundary produced by segmentation. The word segmentation system may split off a single character or a multi-character word.
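The segmentation step in S11 can be sketched as follows. The patent does not name a segmentation algorithm, so forward maximum matching is used here purely as an assumed stand-in, and the romanized input and the tiny `LEXICON` are hypothetical placeholders for the original Chinese text.

```python
def forward_max_match(text: str, lexicon: set[str]) -> list[str]:
    """Greedy segmentation: at each position take the longest lexicon entry,
    falling back to a single character when nothing matches."""
    max_len = max(map(len, lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; size 1 always succeeds.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon: romanized stand-ins for the two words of "Yangtze Bridge".
LEXICON = {"changjiang", "daqiao"}
s1 = "_".join(forward_max_match("changjiangdaqiao", LEXICON))
print(s1)
```

The single-character fallback mirrors the remark that the segmenter may emit either an individual character or a whole word.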
For step S12, the word vector and the text pinyin corresponding to the text string of each preset sentence are determined. After step S11, the text string s1 = Yangtze_Bridge has been determined. From this text string, its text pinyin p1 and word vector w1 are determined, giving p1 = chang jiang | da qiao and w1 = (0.323, 0.123, ...)(0.564, 0.348, ...), with one vector per segmented word. When the text string contains Chinese characters, each Chinese character is mapped to its corresponding pinyin; when the text string contains English characters, the text pinyin of an English character is the English character itself.
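The pinyin mapping rule in S12 can be sketched as below. The four-entry `PINYIN` table is a hypothetical stand-in for a full character-to-pinyin dictionary, and the Chinese characters assumed for "Yangtze Bridge" are a reconstruction, since this text does not show the original characters. The output format follows the p1 example above.

```python
# Hypothetical, deliberately tiny character-to-pinyin table (toneless pinyin).
PINYIN = {"长": "chang", "江": "jiang", "大": "da", "桥": "qiao"}


def token_pinyin(token: str) -> str:
    """Map one segmented word to pinyin syllables separated by spaces.
    Characters missing from the table (for example English letters) are
    passed through unchanged and kept together as one unit."""
    units = []
    passthrough = ""
    for ch in token:
        if ch in PINYIN:
            if passthrough:
                units.append(passthrough)
                passthrough = ""
            units.append(PINYIN[ch])
        else:
            passthrough += ch
    if passthrough:
        units.append(passthrough)
    return " ".join(units)


def text_pinyin(tokens: list[str]) -> str:
    """Join per-word pinyin with ' | ', matching the p1 notation in the text."""
    return " | ".join(token_pinyin(t) for t in tokens)


p1 = text_pinyin(["长江", "大桥"])  # assumed characters for "Yangtze Bridge"
print(p1)
print(text_pinyin(["KFC"]))
```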
For step S13, the feature vector of each preset sentence is determined from its text string, text pinyin, and word vector. The feature vector covers the text string feature, the text pinyin feature, and the word vector feature of the preset sentence, and the text similarity model is then trained on these feature vectors.
As this embodiment shows, training the text similarity model on multiple feature vectors of each word gives the model richer parameters and more features, so the determined text similarity is more accurate.
Fig. 2 shows a flow chart of a text matching method based on a text similarity model provided by an embodiment of the present invention, which includes the following steps:
S21: receiving text information and determining the feature vector of the text information, wherein the feature vector includes at least: a text string, a text pinyin, and a word vector;
S22: inputting the feature vector into the text similarity model;
S23: obtaining the feature similarity output by the text similarity model;
S24: determining, according to the feature similarity, at least one preset sentence that reaches a preset feature threshold as the matched text of the text information.
In this embodiment, the text similarity model trained as described in claim 1 is put to practical use.
For step S21, text information is received. The text information may be obtained by speech recognition on the corresponding device when the user speaks, or it may be typed by the user through the input method of the corresponding device. For example, when typing through an input method, the user enters "the Changjiang river bridging" because of a shaky hand, carelessness, or some other reason. The feature vector of the input "the Changjiang river bridging" is then determined, including its text string, text pinyin, and word vector: text string s2 = Changjiang_bridging, text pinyin p2 = chang jiang | da qiao, and word vector w2 = (0.1234, 0.2133, ...)(0.823, 0.234, ...). Note that p2 is identical to the pinyin p1 of "Yangtze Bridge".
For step S22, the feature vector determined in step S21 is input into the text similarity model, where its features are compared with those of the preset sentences in the model.
For step S23, after step S22, the feature similarity output by the text similarity model is obtained. The feature similarity includes the similarity between the user's input and each preset sentence in the text similarity model.
For step S24, according to the feature similarity determined in step S23, at least one preset sentence that reaches the preset feature threshold is determined as the matched text of the text information.
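Steps S21 to S24 can be summarized in a short sketch: score the input against every preset sentence, then keep those that reach the threshold. The similarity function here is a stand-in (`difflib.SequenceMatcher` over pinyin strings), not the trained model from the patent, and the romanized presets and the threshold value are hypothetical.

```python
from difflib import SequenceMatcher


def stand_in_similarity(a: str, b: str) -> float:
    """Placeholder for the model's feature similarity, in the range [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def match(query: str, presets: list[str], similarity, threshold: float):
    """Return (preset, score) pairs that reach the threshold, best first."""
    scored = [(p, similarity(query, p)) for p in presets]
    hits = [(p, s) for p, s in scored if s >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical presets, written as toneless pinyin.
PRESETS = ["chang jiang da qiao", "sha he qiao"]

# The mistyped input has the same pinyin as the intended preset (p2 equals p1).
hits = match("chang jiang da qiao", PRESETS, stand_in_similarity, threshold=0.8)
print(hits)
```

Passing the similarity function as a parameter keeps the thresholding logic independent of how the model actually computes its score.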
As this embodiment shows, using a text similarity model that considers feature vectors of multiple dimensions, the feature similarity between the user's input sentence and each preset sentence in the model is determined, and a more accurate matched text is then determined. The preset lexicon is relatively easy to collect, and its cost is relatively low.
As an implementation, in this embodiment the preset feature threshold includes a preset text threshold, and obtaining the feature similarity output by the text similarity model includes:
when the feature vector includes at least a text string, determining the text similarity between the text information and each preset sentence according to the text string of the text information and the text string of each preset sentence in the text similarity model;
determining the preset sentences whose text similarity exceeds the preset text threshold as a matched string set;
determining the feature similarity between the text information and the preset sentences in the matched string set according to at least the text string, text pinyin, and word vector in the feature vector.
In this embodiment, the preset feature threshold includes a preset text threshold. When the feature vector includes at least a text string, the text similarity between the text information and each preset sentence is determined from the text string of the user's input and the text string of each preset sentence in the text similarity model. In other words, one of the several features is used first for a similarity comparison, which yields a smaller matched string set containing only the preset sentences that exceed the preset text threshold.
After the matched string set is determined, the feature similarity between the user's input and the preset sentences in the matched string set is determined using all of the features together.
As this embodiment shows, a single feature is first used to screen the preset sentences of the text similarity model. The matched text is then determined from the resulting smaller matched string set using all of the features, which speeds up the determination of the matched text.
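The two-stage screening described above can be sketched as follows. The character-overlap measure (a Dice coefficient over character multisets), the threshold value, and the English placeholder strings are all assumptions for illustration; the second-stage scorer is passed in as a parameter because the patent's trained model is not reproduced here.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_overlap(a: str, b: str) -> float:
    """Dice coefficient over character multisets: 2 * shared / (len(a) + len(b))."""
    if not a and not b:
        return 1.0
    shared = sum((Counter(a) & Counter(b)).values())
    return 2 * shared / (len(a) + len(b))


def two_stage_match(query, presets, text_threshold, full_similarity):
    """Stage 1: keep presets whose text similarity exceeds the text threshold.
    Stage 2: score only those candidates with the full feature similarity."""
    candidates = [p for p in presets if char_overlap(query, p) > text_threshold]
    scored = [(p, full_similarity(query, p)) for p in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def stand_in_full_similarity(a: str, b: str) -> float:
    """Placeholder for the multi-feature similarity of the trained model."""
    return SequenceMatcher(None, a, b).ratio()


presets = ["Changjiang Bridge", "KFC", "Shahe Bridge"]  # hypothetical entries
ranked = two_stage_match("Changjiang bridging", presets, 0.5, stand_in_full_similarity)
print(ranked)
```

Only the candidates that survive the cheap first stage are passed to the more expensive scorer, which is where the claimed speed-up comes from.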
As an implementation, in this embodiment the preset feature threshold includes a preset pinyin threshold, and obtaining the feature similarity output by the text similarity model includes:
when the feature vector includes at least a text pinyin, determining the pinyin similarity between the text information and each preset sentence according to the text pinyin of the text information and the text pinyin of each preset sentence in the text similarity model;
determining the preset sentences whose pinyin similarity exceeds the preset pinyin threshold as a matched pinyin set;
determining the feature similarity between the text information and the preset sentences in the matched pinyin set according to at least the text string, text pinyin, and word vector in the feature vector.
In this embodiment, the preset feature threshold includes a preset pinyin threshold. When the feature vector includes at least a text pinyin, the pinyin similarity between the text information and each preset sentence is determined from the text pinyin of the user's input and the text pinyin of each preset sentence in the text similarity model. As before, one of the several features is used first for a similarity comparison, which yields a smaller matched pinyin set containing only the preset sentences that exceed the preset pinyin threshold.
After the matched pinyin set is determined, the feature similarity between the user's input and the preset sentences in the matched pinyin set is determined using all of the features together.
As this embodiment shows, a single feature is first used to screen the preset sentences of the text similarity model. The matched text is then determined from the resulting smaller matched pinyin set using all of the features, which speeds up the determination of the matched text.
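A minimal sketch of the pinyin screening step, assuming a syllable-level comparison with `difflib.SequenceMatcher`. The patent does not specify how pinyin similarity is computed, so this measure and the threshold value are illustrative choices. The pinyin strings come from the noodle-shop example earlier in the text.

```python
from difflib import SequenceMatcher


def pinyin_similarity(a: str, b: str) -> float:
    """Compare two pinyin strings syllable by syllable (split on whitespace)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def pinyin_prefilter(query_pinyin, preset_pinyins, pinyin_threshold):
    """Keep the presets whose pinyin similarity exceeds the pinyin threshold."""
    return [p for p in preset_pinyins
            if pinyin_similarity(query_pinyin, p) > pinyin_threshold]


query = "wei xian la mian"                        # misspelled input
presets = ["wei qian la mian", "qian ye la mian"]  # second entry is hypothetical

matched_pinyin_set = pinyin_prefilter(query, presets, pinyin_threshold=0.7)
print(matched_pinyin_set)
```

A finer measure could also compare initials and finals within each syllable, so that "xian" and "qian" count as a near match rather than a complete mismatch.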
As an implementation, in this embodiment the preset feature threshold includes a preset vector threshold, and obtaining the feature similarity output by the text similarity model includes:
when the feature vector includes at least a word vector, determining the vector similarity between the text information and each preset sentence according to the word vector of the text information and the word vector of each preset sentence in the text similarity model;
determining the preset sentences whose vector similarity exceeds the preset vector threshold as a matched vector set;
determining the feature similarity between the text information and the preset sentences in the matched vector set according to at least the text string, text pinyin, and word vector in the feature vector.
In this embodiment, the preset feature threshold includes a preset vector threshold. When the feature vector includes at least a word vector, the vector similarity between the text information and each preset sentence is determined from the word vector of the user's input and the word vector of each preset sentence in the text similarity model. As before, one of the several features is used first for a similarity comparison, which yields a smaller matched vector set containing only the preset sentences that exceed the preset vector threshold.
After the matched vector set is determined, the feature similarity between the user's input and the preset sentences in the matched vector set is determined using all of the features together.
As this embodiment shows, a single feature is first used to screen the preset sentences of the text similarity model. The matched text is then determined from the resulting smaller matched vector set using all of the features, which speeds up the determination of the matched text.
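The vector screening step can be sketched with cosine similarity between averaged word vectors. Both choices (averaging the per-word vectors and using cosine similarity) are assumptions, since the patent does not say how vector similarity is computed. The two-dimensional vectors and the preset names are made-up toy values, not data from the patent.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length word vectors into one sentence vector."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vector_prefilter(query_vectors, preset_vectors, vector_threshold):
    """Keep the presets whose vector similarity exceeds the vector threshold."""
    q = mean_vector(query_vectors)
    return [name for name, vectors in preset_vectors.items()
            if cosine(q, mean_vector(vectors)) > vector_threshold]


# Toy data: one vector per segmented word, hypothetical values.
query_vectors = [[1.0, 0.0], [0.0, 1.0]]
preset_vectors = {"preset_a": [[1.0, 1.0]], "preset_b": [[1.0, -1.0]]}

matched_vector_set = vector_prefilter(query_vectors, preset_vectors, 0.9)
print(matched_vector_set)
```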
As an implementation, in this embodiment, determining, according to the feature similarity, at least one preset sentence that reaches the preset feature threshold as the matched text of the text information includes:
when, in descending order of similarity, only one preset sentence exceeds the preset feature threshold, using that preset sentence as the matched text of the text information; or
when, in descending order of similarity, at least two preset sentences exceed the preset feature threshold, sending the at least two preset sentences to the user;
receiving the preset sentence selected by the user;
using the selected preset sentence as the matched text of the text information.
In this embodiment, the preset sentences that reach the preset feature threshold are determined in descending order of similarity as matched text of the text information. When only one matched text is determined, for example when the user inputs "the Changjiang river bridging" and the single matched text in descending order of similarity is "Yangtze Bridge", "Yangtze Bridge" is used as the matched text of the user's input.
When at least two matched texts are determined, for example when the user inputs "BJ Univ Hospital" and the matched texts determined by similarity are "Peking University First Hospital", "Peking University Second Hospital", "Peking University Third Hospital", and so on, these are fed back to the user and the user's selection is received. If the user selects "Peking University Third Hospital", for instance, the selected preset sentence is used as the matched text of the text information.
As this embodiment shows, determining a specified number of matched texts gives the user more ways to match, widens the matching range, and also improves the user experience.
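The selection logic above can be sketched as a small function. The `ask_user` callback is a hypothetical stand-in for whatever interface presents the candidates to the user, and returning `None` when nothing exceeds the threshold is an assumption, since the text does not cover that case.

```python
def select_match(hits, ask_user):
    """hits: (preset_sentence, similarity) pairs that already exceed the threshold.
    One hit is returned directly; several hits are offered to the user."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    if not ranked:
        return None                 # assumption: no match is reported as None
    if len(ranked) == 1:
        return ranked[0][0]
    options = [sentence for sentence, _ in ranked]
    return ask_user(options)        # the user's choice becomes the matched text


# Usage with made-up similarity scores.
single = select_match([("Yangtze Bridge", 0.93)], ask_user=lambda opts: opts[0])
several = select_match(
    [("Peking University Second Hospital", 0.80),
     ("Peking University First Hospital", 0.82),
     ("Peking University Third Hospital", 0.78)],
    ask_user=lambda opts: opts[2],  # simulate the user picking the third option
)
print(single, "|", several)
```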
Fig. 3 shows a structural schematic diagram of a training system for a text similarity model provided by an embodiment of the present invention. The system can perform the training method for the text similarity model described in any of the above embodiments and is configured in a terminal.
The training system for a text similarity model provided by this embodiment includes: a text string determination program module 11, a word vector and text pinyin determination program module 12, and a text similarity model training program module 13.
The text string determination program module 11 is configured to receive a lexicon training set, perform word segmentation on each preset sentence in the lexicon training set, and determine the text string of each preset sentence. The word vector and text pinyin determination program module 12 is configured to determine, according to the text string of each preset sentence, the word vector and the text pinyin corresponding to the text string. The text similarity model training program module 13 is configured to determine the feature vector of each preset sentence according to its text string, text pinyin, and word vector, and to train the text similarity model.
Fig. 4 shows a structural schematic diagram of a text matching system based on a text similarity model provided by an embodiment of the present invention. The system can perform the text matching method based on the text similarity model described in any of the above embodiments and is configured in a terminal.
The text matching system based on a text similarity model provided by this embodiment includes: a feature vector determination program module 21, a feature vector input program module 22, a feature similarity acquisition program module 23, and a text matching program module 24.
The feature vector determination program module 21 is configured to receive text information and determine the feature vector of the text information, wherein the feature vector includes at least: a text string, a text pinyin, and a word vector. The feature vector input program module 22 is configured to input the feature vector into the text similarity model. The feature similarity acquisition program module 23 is configured to obtain the feature similarity output by the text similarity model. The text matching program module 24 is configured to determine, according to the feature similarity, at least one preset sentence that reaches the preset feature threshold as the matched text of the text information.
Further, the preset feature threshold includes a preset text threshold, and the feature similarity acquisition program module is configured to:
when the feature vector includes at least a text string, determine the text similarity between the text information and each preset sentence according to the text string of the text information and the text string of each preset sentence in the text similarity model;
determine the preset sentences whose text similarity exceeds the preset text threshold as a matched string set;
determine the feature similarity between the text information and the preset sentences in the matched string set according to at least the text string, text pinyin, and word vector in the feature vector.
Further, the preset feature threshold includes a preset pinyin threshold, and the feature similarity acquisition program module is configured to:
when the feature vector includes at least a text pinyin, determine the pinyin similarity between the text information and each preset sentence according to the text pinyin of the text information and the text pinyin of each preset sentence in the text similarity model;
determine the preset sentences whose pinyin similarity exceeds the preset pinyin threshold as a matched pinyin set;
determine the feature similarity between the text information and the preset sentences in the matched pinyin set according to at least the text string, text pinyin, and word vector in the feature vector.
Further, the preset feature threshold includes a preset vector threshold, and the feature similarity acquisition program module is configured to:
when the feature vector includes at least a word vector, determine the vector similarity between the text information and each preset sentence according to the word vector of the text information and the word vector of each preset sentence in the text similarity model;
determine the preset sentences whose vector similarity exceeds the preset vector threshold as a matched vector set;
determine the feature similarity between the text information and the preset sentences in the matched vector set according to at least the text string, text pinyin, and word vector in the feature vector.
Further, the text matching program module is configured to:
when, in descending order of similarity, only one preset sentence exceeds the preset feature threshold, use that preset sentence as the matched text of the text information; or
when, in descending order of similarity, at least two preset sentences exceed the preset feature threshold, send the at least two preset sentences to the user;
receive the preset sentence selected by the user;
use the selected preset sentence as the matched text of the text information.
An embodiment of the present invention further provides a non-volatile computer storage medium. The computer storage medium stores computer-executable instructions that can execute the training method of the text similarity model in any of the above method embodiments.
As an implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
Receive a dictionary training set, perform word segmentation on each preset sentence in the dictionary training set, and determine the text string of the preset sentence;
Determine, according to the text string of each preset sentence, the word vector corresponding to the text string and the text pinyin corresponding to the text string;
Determine, according to the text string, text pinyin, and word vector corresponding to each preset sentence, the feature vector corresponding to each preset sentence, and train the text similarity model.
An embodiment of the present invention further provides a non-volatile computer storage medium. The computer storage medium stores computer-executable instructions that can execute the text matching method based on the text similarity model in any of the above method embodiments.
As an implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
Receive text information and determine the feature vector of the text information, where the feature vector includes at least: a text string, text pinyin, and a word vector;
Input the feature vector into the text similarity model;
Obtain the feature similarity output by the text similarity model;
Determine, according to the feature similarity, at least one preset sentence that reaches the preset feature threshold as the matching text of the text information.
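The matching steps above can be illustrated end to end with a small sketch. The text does not say how the three features are combined into one feature similarity, so the equal-weight average used here is purely an assumption, as are the similarity measures (`difflib.SequenceMatcher` for strings and pinyin, cosine for vectors), the dictionary layout, and all example values.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple


def sequence_similarity(a: str, b: str) -> float:
    """Stand-in similarity for text strings and pinyin strings."""
    return SequenceMatcher(None, a, b).ratio()


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def feature_similarity(query: Dict, preset: Dict,
                       weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted combination of string, pinyin, and vector similarity (assumed)."""
    w_text, w_pinyin, w_vector = weights
    return (w_text * sequence_similarity(query["text"], preset["text"])
            + w_pinyin * sequence_similarity(query["pinyin"], preset["pinyin"])
            + w_vector * cosine(query["vector"], preset["vector"]))


def match(query: Dict, presets: List[Dict], feature_threshold: float) -> List[Tuple[str, float]]:
    """Return preset sentences that reach the threshold, best match first."""
    scored = [(p["text"], feature_similarity(query, p)) for p in presets]
    hits = [item for item in scored if item[1] >= feature_threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical preset sentences and a misrecognized query.
presets = [
    {"text": "播放音乐", "pinyin": "bo fang yin yue", "vector": [0.9, 0.1]},
    {"text": "打开导航", "pinyin": "da kai dao hang", "vector": [0.1, 0.9]},
]
query = {"text": "播放英乐", "pinyin": "bo fang ying yue", "vector": [0.8, 0.2]}
results = match(query, presets, 0.8)
```

Here the query text contains a wrong character, yet its pinyin and vector remain close to the first preset sentence, which illustrates why combining several feature dimensions can be more tolerant of recognition errors than string comparison alone.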
As a non-volatile computer-readable storage medium, it can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method of the test software in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the training method of the text similarity model and the text matching method based on the text similarity model in any of the above method embodiments.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data created through use of the test software device, and the like. In addition, the non-volatile computer-readable storage medium may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, and such remote memory can be connected to the test software device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device comprising: at least one processor, and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the training method of the text similarity model and of the text matching method based on the text similarity model of any embodiment of the present invention.
The client of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication functions, and their main purpose is to provide voice and data communication. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld devices, e-books, smart toys, and portable in-vehicle navigation devices.
(4) Other electronic devices with data processing functions.
In this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A training method for a text similarity model, comprising:
Receiving a dictionary training set, performing word segmentation on each preset sentence in the dictionary training set, and determining the text string of the preset sentence;
Determining, according to the text string of each preset sentence, the word vector corresponding to the text string and the text pinyin corresponding to the text string;
Determining, according to the text string, text pinyin, and word vector corresponding to each preset sentence, the feature vector corresponding to each preset sentence, and training the text similarity model.
2. A text matching method based on the text similarity model according to claim 1, comprising:
Receiving text information and determining the feature vector of the text information, wherein the feature vector includes at least: a text string, text pinyin, and a word vector;
Inputting the feature vector into the text similarity model;
Obtaining the feature similarity output by the text similarity model;
Determining, according to the feature similarity, at least one preset sentence that reaches the preset feature threshold as the matching text of the text information.
3. The method according to claim 2, wherein the preset feature threshold includes a preset text threshold, and obtaining the feature similarity output by the text similarity model comprises:
When the feature vector includes at least a text string, determining the text similarity between the text information and each preset sentence according to the text string of the text information and the text string of each preset sentence in the text similarity model;
Determining the preset sentences whose text similarity exceeds the preset text threshold as a matching string set;
Determining, according to at least the text string, text pinyin, and word vector in the feature vector, the feature similarity between the text information and the preset sentences in the matching string set.
4. The method according to claim 2, wherein the preset feature threshold includes a preset pinyin threshold, and obtaining the feature similarity output by the text similarity model comprises:
When the feature vector includes at least text pinyin, determining the pinyin similarity between the text information and each preset sentence according to the text pinyin of the text information and the text pinyin of each preset sentence in the text similarity model;
Determining the preset sentences whose pinyin similarity exceeds the preset pinyin threshold as a matching pinyin set;
Determining, according to at least the text string, text pinyin, and word vector in the feature vector, the feature similarity between the text information and the preset sentences in the matching pinyin set.
5. The method according to claim 2, wherein the preset feature threshold includes a preset vector threshold, and obtaining the feature similarity output by the text similarity model comprises:
When the feature vector includes at least a word vector, determining the vector similarity between the text information and each preset sentence according to the word vector of the text information and the word vector of each preset sentence in the text similarity model;
Determining the preset sentences whose vector similarity exceeds the preset vector threshold as a matching vector set;
Determining, according to at least the text string, text pinyin, and word vector in the feature vector, the feature similarity between the text information and the preset sentences in the matching vector set.
6. The method according to claim 2, wherein determining, according to the feature similarity, at least one preset sentence that reaches the preset feature threshold as the matching text of the text information comprises:
When, in descending order of similarity, only one preset sentence is determined to exceed the preset feature threshold as the matching text of the text information, using that preset sentence as the matching text of the text information; or
When, in descending order of similarity, at least two preset sentences are determined to exceed the preset feature threshold as the matching text of the text information, sending the at least two preset sentences to the user;
Receiving the preset sentence selected by the user;
Using the selected preset sentence as the matching text of the text information.
7. A training system for a text similarity model, comprising:
A text string determining program module, configured to receive a dictionary training set, perform word segmentation on each preset sentence in the dictionary training set, and determine the text string of the preset sentence;
A word vector and text pinyin determining program module, configured to determine, according to the text string of each preset sentence, the word vector corresponding to the text string and the text pinyin corresponding to the text string;
A text similarity model training program module, configured to determine, according to the text string, text pinyin, and word vector corresponding to each preset sentence, the feature vector corresponding to each preset sentence, and to train the text similarity model.
8. A text matching system based on the text similarity model according to claim 7, comprising:
A feature vector determining program module, configured to receive text information and determine the feature vector of the text information, wherein the feature vector includes at least: a text string, text pinyin, and a word vector;
A feature vector input program module, configured to input the feature vector into the text similarity model;
A feature similarity obtaining program module, configured to obtain the feature similarity output by the text similarity model;
A text matching program module, configured to determine, according to the feature similarity, at least one preset sentence that reaches the preset feature threshold as the matching text of the text information.
9. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the method according to any one of claims 1-6.
10. A storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1-6 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811344782.5A CN109460461A (en) | 2018-11-13 | 2018-11-13 | Text matching technique and system based on text similarity model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109460461A (en) | 2019-03-12 |
Family
ID=65610191
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811344782.5A Pending CN109460461A (en) | 2018-11-13 | 2018-11-13 | Text matching technique and system based on text similarity model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109460461A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103605694A (en) * | 2013-11-04 | 2014-02-26 | 北京奇虎科技有限公司 | Device and method for detecting similar texts |
CN104102626A (en) * | 2014-07-07 | 2014-10-15 | 厦门推特信息科技有限公司 | Method for computing semantic similarities among short texts |
CN104239512A (en) * | 2014-09-16 | 2014-12-24 | 电子科技大学 | Text recommendation method |
US8996515B2 (en) * | 2008-06-24 | 2015-03-31 | Microsoft Corporation | Consistent phrase relevance measures |
CN104699763A (en) * | 2015-02-11 | 2015-06-10 | 中国科学院新疆理化技术研究所 | Text similarity measuring system based on multi-feature fusion |
CN106095928A (en) * | 2016-06-12 | 2016-11-09 | 国家计算机网络与信息安全管理中心 | Event type recognition method and device |
Non-Patent Citations (1)
Title |
---|
Liang Jingdong et al.: "Sentence similarity calculation based on word2vec and LSTM and its…", Journal of Nanjing Agricultural University * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245606B (en) * | 2019-06-13 | 2021-07-20 | 广东小天才科技有限公司 | Text recognition method, device, equipment and storage medium |
CN110245606A (en) * | 2019-06-13 | 2019-09-17 | 广东小天才科技有限公司 | Text recognition method, device, equipment and storage medium |
CN110413988A (en) * | 2019-06-17 | 2019-11-05 | 平安科技(深圳)有限公司 | Text information matching measurement method, device, server and storage medium |
CN110413988B (en) * | 2019-06-17 | 2023-01-31 | 平安科技(深圳)有限公司 | Text information matching measurement method, device, server and storage medium |
CN110390015A (en) * | 2019-07-23 | 2019-10-29 | 中国工商银行股份有限公司 | Data information processing method, apparatus and system |
CN110516125A (en) * | 2019-08-28 | 2019-11-29 | 拉扎斯网络科技(上海)有限公司 | Method, device and equipment for identifying abnormal character string and readable storage medium |
CN110516125B (en) * | 2019-08-28 | 2020-05-08 | 拉扎斯网络科技(上海)有限公司 | Method, device and equipment for identifying abnormal character string and readable storage medium |
CN110717158A (en) * | 2019-09-06 | 2020-01-21 | 平安普惠企业管理有限公司 | Information verification method, device, equipment and computer readable storage medium |
CN110717158B (en) * | 2019-09-06 | 2024-03-01 | 冉维印 | Information verification method, device, equipment and computer readable storage medium |
CN111009244A (en) * | 2019-12-06 | 2020-04-14 | 贵州电网有限责任公司 | Voice recognition method and system |
CN111159338A (en) * | 2019-12-23 | 2020-05-15 | 北京达佳互联信息技术有限公司 | Malicious text detection method and device, electronic equipment and storage medium |
CN111159339A (en) * | 2019-12-24 | 2020-05-15 | 北京亚信数据有限公司 | Text matching processing method and device |
WO2022001888A1 (en) * | 2020-06-29 | 2022-01-06 | 北京字节跳动网络技术有限公司 | Information generation method and device based on word vector generation model |
CN111753551B (en) * | 2020-06-29 | 2022-06-14 | 北京字节跳动网络技术有限公司 | Information generation method and device based on word vector generation model |
CN111753551A (en) * | 2020-06-29 | 2020-10-09 | 北京字节跳动网络技术有限公司 | Information generation method and device based on word vector generation model |
CN112000767A (en) * | 2020-07-31 | 2020-11-27 | 深思考人工智能科技(上海)有限公司 | Text-based information extraction method and electronic equipment |
WO2022095370A1 (en) * | 2020-11-06 | 2022-05-12 | 平安科技(深圳)有限公司 | Text matching method and apparatus, terminal device, and storage medium |
CN113932518A (en) * | 2021-06-02 | 2022-01-14 | 海信(山东)冰箱有限公司 | Refrigerator and food material management method thereof |
CN113932518B (en) * | 2021-06-02 | 2023-08-18 | 海信冰箱有限公司 | Refrigerator and food material management method thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109460461A (en) | Text matching technique and system based on text similarity model | |
CN105976812B (en) | A kind of audio recognition method and its equipment | |
US10043520B2 (en) | Multilevel speech recognition for candidate application group using first and second speech commands | |
US20170164049A1 (en) | Recommending method and device thereof | |
CN107526846B (en) | Method, device, server and medium for generating and sorting channel sorting model | |
CN112037792B (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN103699530A (en) | Method and equipment for inputting texts in target application according to voice input information | |
CN104361896B (en) | Voice quality assessment equipment, method and system | |
CN113407850B (en) | Method and device for determining and acquiring virtual image and electronic equipment | |
US20170171471A1 (en) | Method and device for generating multimedia picture and an electronic device | |
CN104866308A (en) | Scenario image generation method and apparatus | |
CN103235773B (en) | The tag extraction method and device of text based on keyword | |
CN110517692A (en) | Hot word audio recognition method and device | |
US20230029687A1 (en) | Dialog method and system, electronic device and storage medium | |
CN111028828A (en) | Voice interaction method based on screen drawing, screen drawing and storage medium | |
CN105354318A (en) | File searching method and device | |
CN109410935A (en) | A kind of destination searching method and device based on speech recognition | |
CN107112007A (en) | Speech recognition equipment and audio recognition method | |
CN111859970B (en) | Method, apparatus, device and medium for processing information | |
CN110570838B (en) | Voice stream processing method and device | |
JP7372402B2 (en) | Speech synthesis method, device, electronic device and storage medium | |
CN111680514A (en) | Information processing and model training method, device, equipment and storage medium | |
CN111477212A (en) | Content recognition, model training and data processing method, system and equipment | |
CN109147819A (en) | Audio-frequency information processing method, device and storage medium | |
CN114708859A (en) | Voice command word recognition training method and device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province; Applicant after: Sipic Technology Co.,Ltd.; Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province; Applicant before: AI SPEECH Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190312 |