CN110413730A - Text information matching degree detection method, device, computer equipment and storage medium - Google Patents
- Publication number: CN110413730A (application CN201910569471.7A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/3344 — Query execution using natural language analysis
- G06F16/35 — Clustering; Classification
- G06F18/22 — Matching criteria, e.g. proximity measures
Abstract
The present invention relates to a text information matching degree detection method. The method comprises: obtaining target text information and its corresponding reference text information; converting the target text information into a first hidden feature vector and converting the reference text information into a second hidden feature vector; calculating the vector similarity between the first hidden feature vector and the second hidden feature vector; and obtaining a logistic regression model according to the target text information and preset keywords, then inputting the vector similarity into the logistic regression model to obtain the matching degree between the target text information and the reference text information. Matching degree detection is thereby made more accurate.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a text information matching degree detection method and apparatus, a computer device, and a storage medium.
Background art
Text matching degree refers to the degree of semantic association between different texts. Determining text matching degree is one of the core tasks of text mining and text retrieval, so how best to detect text matching degree has long been a problem of great concern to those skilled in the art.
The main prior-art approach to text matching degree detection is to map each text to a vector in word space and calculate the Euclidean distance or cosine distance between the vectors. Because existing approaches determine text similarity only in word space, without considering the associations and semantic information between text features, matching degree detection is not accurate enough.
Summary of the invention
The purpose of the present invention is to provide a text information matching degree detection method and apparatus, a computer device, and a readable storage medium, so that text information matching degree detection is more accurate.
The purpose of the present invention is achieved through the following technical solutions:
A text information matching degree detection method, the method comprising:
obtaining target text information and its corresponding reference text information;
converting the target text information into a first hidden feature vector according to a preset autoencoder structure, and converting the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the feature information of the target text information and the second hidden feature vector represents the feature information of the reference text information;
calculating the vector similarity between the first hidden feature vector and the second hidden feature vector;
obtaining a logistic regression model according to the target text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the target text information and the reference text information.
In one embodiment, converting the target text information into the first hidden feature vector according to the preset autoencoder structure comprises:
inputting the target text information into a preset learning algorithm to obtain a target input vector;
inputting the target input vector into the preset autoencoder structure, and extracting from the preset autoencoder structure the first hidden feature vector corresponding to the target input vector.
In one embodiment, the reference text information includes question text information and standard text information corresponding to the target text information, and the second hidden feature vector includes a question hidden feature vector and a standard hidden feature vector. Converting the reference text information into the second hidden feature vector comprises:
inputting the question text information into the preset learning algorithm to obtain a question input vector;
inputting the question input vector into the preset autoencoder structure, and extracting from the preset autoencoder structure the question hidden feature vector corresponding to the question input vector;
inputting the standard text information into the preset learning algorithm to obtain a standard input vector;
inputting the standard input vector into the preset autoencoder structure, and extracting from the preset autoencoder structure the standard hidden feature vector corresponding to the standard input vector.
In one embodiment, after the step of obtaining the target text information and its corresponding reference text information, the method further comprises:
obtaining training feature vectors associated with the target text information;
training a plurality of prestored autoencoder structures according to the training feature vectors, to obtain a plurality of trained autoencoder structures;
calculating the information loss of each trained autoencoder structure, and selecting the trained autoencoder structure with the smallest information loss as the preset autoencoder structure.
In one embodiment, the vector similarity includes a question similarity and a standard similarity. Calculating the vector similarity between the first hidden feature vector and the second hidden feature vector comprises:
calculating the cosine of the angle between the first hidden feature vector and the question hidden feature vector to obtain the question similarity;
calculating the cosine of the angle between the first hidden feature vector and the standard hidden feature vector to obtain the standard similarity.
In one embodiment, obtaining the logistic regression model according to the target text information and the preset keywords comprises:
obtaining the keyword similarity between the preset keywords and the target text information;
setting the keyword similarity and the vector similarity as parameters of a preset initial regression model, to obtain the logistic regression model corresponding to the target text information.
In one embodiment, obtaining the keyword similarity between the preset keywords and the target text information comprises:
calculating the information value of each keyword in a preset keyword library, and selecting the keywords whose information value exceeds a preset threshold as the preset keywords;
splitting the target text information into a plurality of target words, and calculating the similarity between the preset keywords and the target words;
selecting the maximum of these similarities as the keyword similarity.
A text information matching degree detection apparatus, the apparatus comprising:
a text information obtaining module, configured to obtain target text information and its corresponding reference text information;
a text information conversion module, configured to convert the target text information into a first hidden feature vector according to a preset autoencoder structure, and to convert the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the feature information of the target text information and the second hidden feature vector represents the feature information of the reference text information;
a vector similarity obtaining module, configured to calculate the vector similarity between the first hidden feature vector and the second hidden feature vector; and
a matching degree detection module, configured to obtain a logistic regression model according to the target text information and preset keywords, and to input the vector similarity into the logistic regression model to obtain the matching degree between the target text information and the reference text information.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:
obtaining target text information and its corresponding reference text information;
converting the target text information into a first hidden feature vector according to a preset autoencoder structure, and converting the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the feature information of the target text information and the second hidden feature vector represents the feature information of the reference text information;
calculating the vector similarity between the first hidden feature vector and the second hidden feature vector;
obtaining a logistic regression model according to the target text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the target text information and the reference text information.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following steps:
obtaining target text information and its corresponding reference text information;
converting the target text information into a first hidden feature vector according to a preset autoencoder structure, and converting the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the feature information of the target text information and the second hidden feature vector represents the feature information of the reference text information;
calculating the vector similarity between the first hidden feature vector and the second hidden feature vector;
obtaining a logistic regression model according to the target text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the target text information and the reference text information.
In the text information matching degree detection method provided by the present invention, target text information and its corresponding reference text information are obtained; the target text information is converted into a first hidden feature vector and the reference text information into a second hidden feature vector; and the vector similarity between the first and second hidden feature vectors is calculated, which effectively extracts and matches the implicit semantic features shared by the target text information and the reference text information. A logistic regression model is then obtained according to the target text information and preset keywords, and the vector similarity is input into the logistic regression model to obtain the matching degree between the target text information and the reference text information. By feeding the vector similarity of these implicit semantic features into a logistic regression model corresponding to the target text information, the accuracy of text information matching degree detection is effectively improved.
Additional aspects and advantages of the application will be set forth in part in the description that follows, will become obvious from that description, or may be learned by practice of the application.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become obvious and readily appreciated from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is the application environment diagram of the text information matching degree detection method in one embodiment;
Fig. 2 is a flow diagram of the text information matching degree detection method in one embodiment;
Fig. 3 is a flow diagram of the text information matching degree detection method in another embodiment;
Fig. 4 is a structural block diagram of the text information matching degree detection apparatus in one embodiment;
Fig. 5 is an internal structure diagram of the computer device in one embodiment.
Detailed description of embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar labels throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended only to explain the invention and are not to be construed as limiting the claims.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the invention indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Those skilled in the art will appreciate that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having meanings consistent with their meaning in the context of the prior art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The text information matching degree detection method provided by the present application can be applied in the application environment shown in Fig. 1. The server in the figure can be realized using a computer device that includes a processor, a memory, a network interface, and a database connected through a device bus. The processor of the computer device provides computing and control capability. The database of the computer device stores the data involved in text information matching degree detection. The network interface of the computer device communicates with external terminals through a network connection. Specifically, the server obtains target text information and its corresponding reference text information; the server converts the target text information into a first hidden feature vector and converts the reference text information into a second hidden feature vector; the server calculates the vector similarity between the first hidden feature vector and the second hidden feature vector; and the server obtains a logistic regression model according to the target text information and preset keywords, then inputs the vector similarity into the logistic regression model to obtain the matching degree between the target text information and the reference text information. Those skilled in the art will appreciate that the "server" used herein can be realized by an independent server or by a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a text information matching degree detection method is provided. Taking the method as applied to the server in Fig. 1 as an example, it comprises the following steps:
Step S201: obtain target text information and its corresponding reference text information.
In this step, the target text information may be the answer text whose matching degree is to be detected; the reference text information may be the question text and the standard text corresponding to the answer text.
Taking text grading as an example, the answer a user gives to a question is the target text information, and the reference text information is the question together with the standard answer corresponding to the question. Detecting the matching degree between the target text information and the reference text information is then the process of judging the degree of semantic association between the answer on one hand and the question and standard answer on the other.
In one embodiment, after the step of obtaining the target text information and its corresponding reference text information in step S201, the method further comprises:
A1: obtain training feature vectors associated with the target text information.
A2: train a plurality of prestored autoencoder structures according to the training feature vectors, to obtain a plurality of trained autoencoder structures.
In this step, the autoencoder structure is what converts text information into a hidden feature vector. An autoencoder is a neural network that encodes the features of its input and then decodes them, so as to minimize the difference between input and output.
A3: calculate the information loss of each trained autoencoder structure, and select the trained autoencoder structure with the smallest information loss as the preset autoencoder structure.
In the specific implementation process, training an autoencoder structure is the process of minimizing the difference between its input and output. The training feature vectors are input into multiple different autoencoder structures, which differ in the number of hidden layers and the number of units per hidden layer. The parameters of each autoencoder structure are adjusted separately so that the difference between its output and the training feature vectors is minimized, and the target autoencoder structure is then chosen from the multiple trained structures according to the difference between each one's input and output.
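Steps A2-A3 can be sketched as follows: train several candidate autoencoders that differ in hidden-layer layout, then keep the one with the smallest reconstruction error as a proxy for information loss. Using scikit-learn's MLPRegressor fit to reproduce its own input is an illustrative assumption, not the patent's implementation.

```python
# Train candidate autoencoders with different hidden-layer layouts and
# select the one whose reconstruction error ("information loss") is smallest.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 7))  # training feature vectors (e.g. bag-of-words)

candidates = [(4,), (3,), (4, 2, 4)]  # hypothetical hidden-layer layouts
best_loss, best_ae = float("inf"), None
for layout in candidates:
    ae = MLPRegressor(hidden_layer_sizes=layout, max_iter=2000,
                      random_state=0).fit(X, X)  # reconstruct the input
    loss = np.mean((ae.predict(X) - X) ** 2)     # reconstruction error
    if loss < best_loss:
        best_loss, best_ae = loss, ae

print(best_ae.hidden_layer_sizes, round(best_loss, 4))
```

The hidden feature vector of a selected structure would then be read from its bottleneck layer; here MLPRegressor is only a stand-in for whatever autoencoder architecture an implementation actually prestores.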
Step S202: convert the target text information into the first hidden feature vector, and convert the reference text information into the second hidden feature vector.
In this step, a hidden feature vector is the encoding that the autoencoder structure produces from the features of its input. It retains most of the information of the input vector originally fed into the autoencoder structure, and therefore represents the feature information of the target text information and the reference text information that were input. The autoencoder structure then decodes the hidden feature vector to reconstruct the output feature encoding.
In one embodiment, converting the target text information into the first hidden feature vector in step S202 may comprise:
B1: input the target text information into a preset learning algorithm to obtain a target input vector.
B2: input the target input vector into the preset autoencoder structure, and extract from the preset autoencoder structure the first hidden feature vector corresponding to the target input vector.
In the present embodiment, the preset learning algorithm is an algorithm for converting text into a corresponding vector. For example, the sklearn library in Python can convert the target text information into a target input vector in bag-of-words form. Python is a computer programming language; sklearn, also known as scikit-learn, is a Python-based machine learning library that makes machine learning algorithms convenient to implement, including algorithms for data mining tasks such as classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
For example, given text one, "I like eating apples, and apples are rich in nutrition", and text two, "I like eating pears", the texts are first segmented with the jieba library in Python to separate the words, and bag-of-words features are then built with the sklearn library (the features will include "I", "like", "eating", "apple", "nutrition", "rich", "pear"). Determining the feature value of each sample by word frequency then gives the feature vector (1, 1, 1, 2, 1, 1, 0) for text one and (1, 1, 1, 0, 0, 0, 1) for text two. The jieba library is a Chinese word segmentation library for Python.
Further, the reference text information includes question text information and standard text information corresponding to the target text information, and the second hidden feature vector includes a question hidden feature vector and a standard hidden feature vector. Converting the reference text information into the second hidden feature vector in step S202 comprises:
B3: input the question text information into the preset learning algorithm to obtain a question input vector; input the question input vector into the preset autoencoder structure, and extract from the preset autoencoder structure the question hidden feature vector corresponding to the question input vector.
B4: input the standard text information into the preset learning algorithm to obtain a standard input vector; input the standard input vector into the preset autoencoder structure, and extract from the preset autoencoder structure the standard hidden feature vector corresponding to the standard input vector.
In the present embodiment, the target text information and the reference text information are separately converted by the preset learning algorithm into a target input vector and reference input vectors. The target input vector and the reference input vectors are then fed into the preset autoencoder structure, from which the first hidden feature vector corresponding to the target input vector and the second hidden feature vector corresponding to the reference input vectors are extracted. This effectively extracts the implicit semantic features shared by the target text information and the reference text information.
Step S203: calculate the vector similarity between the first hidden feature vector and the second hidden feature vector.
In this step, vector similarity is usually computed as the distance between two vectors; the closer the distance, the greater the similarity. The cosine similarity method can be used to calculate the vector similarity between the first hidden feature vector and the second hidden feature vector.
In one embodiment, the vector similarity includes a question similarity and a standard similarity, and calculating the vector similarity between the first hidden feature vector and the second hidden feature vector in step S203 comprises:
C1: calculate the cosine of the angle between the first hidden feature vector and the question hidden feature vector to obtain the question similarity;
C2: calculate the cosine of the angle between the first hidden feature vector and the standard hidden feature vector to obtain the standard similarity.
Cosine similarity assesses the similarity of two vectors by computing the cosine of the angle between them. The cosine of a 0-degree angle is 1, the cosine of any other angle is no greater than 1, and its minimum value is -1, so the cosine of the angle between two vectors indicates whether they point in roughly the same direction. When the two vectors point the same way, the cosine similarity is 1; when the angle between them is 90 degrees, the cosine similarity is 0; and when they point in exactly opposite directions, the cosine similarity is -1. Cosine similarity is commonly used in the positive space, where the values it yields lie between 0 and 1.
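A minimal sketch of the cosine-similarity calculation in steps C1 and C2 follows; the vector names and values are illustrative stand-ins, not data from the patent.

```python
# Cosine similarity between the answer's hidden feature vector and the
# question/standard hidden feature vectors.
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

answer_vec   = [0.4, 0.1, 0.9]   # hypothetical first hidden feature vector
question_vec = [0.5, 0.0, 0.8]   # hypothetical question hidden feature vector
standard_vec = [0.3, 0.2, 0.7]   # hypothetical standard hidden feature vector

question_similarity = cosine_similarity(answer_vec, question_vec)  # C1
standard_similarity = cosine_similarity(answer_vec, standard_vec)  # C2
print(round(question_similarity, 3), round(standard_similarity, 3))
```

Because hidden feature vectors of bag-of-words encodings tend to live in the positive space, these values fall between 0 and 1, as the passage above notes.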
Step S204: obtain a logistic regression model according to the target text information and preset keywords, and input the vector similarity into the logistic regression model to obtain the matching degree between the target text information and the reference text information.
In this step, the parameters of the logistic regression model are calculated from the target text information and the preset keywords, and the vector similarity is then input into the logistic regression model, which outputs a matching degree value.
Taking text scoring as an example, a series of parameters is calculated from the answer text given by the user and the preset keywords, the corresponding logistic regression model is built from these parameters, and the similarity between the answer text and the reference text is then input into the logistic regression model to obtain a matching score.
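As a hedged sketch of step S204, the snippet below scores an answer with scikit-learn's LogisticRegression over a feature vector that combines keyword similarities with the question and standard vector similarities. The training data, the feature layout, and the use of sklearn are illustrative assumptions; the patent derives the model's parameters from the target text and the preset keywords rather than from a generic training set.

```python
# Score a [keyword sims, vector sims] feature vector with logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.random((100, 4))  # columns: [kw_sim1, kw_sim2, q_sim, std_sim]
y_train = (X_train.mean(axis=1) > 0.5).astype(int)  # toy "matched" labels

model = LogisticRegression().fit(X_train, y_train)
features = np.array([[0.8, 0.7, 0.9, 0.85]])  # a high-similarity answer
match_degree = model.predict_proba(features)[0, 1]  # matching degree in [0, 1]
print(round(match_degree, 3))
```

The predicted probability of the "matched" class plays the role of the matching degree value that the step outputs.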
The process of obtaining the logistic regression model in the present invention is described below in conjunction with Fig. 3 and a specific embodiment. In one embodiment, obtaining the logistic regression model according to the target text information and the preset keywords in step S204 comprises:
S410: obtain the keyword similarity between the preset keywords and the target text information.
S420: set the keyword similarity and the vector similarity as parameters of a preset initial regression model, to obtain the logistic regression model corresponding to the target text information.
In one embodiment, obtaining the keyword similarity between the preset keywords and the target text information in step S410 comprises:
D1: calculate the information value of each keyword in a preset keyword library, and select the keywords whose information value exceeds a preset threshold as the preset keywords;
D2: split the target text information into a plurality of target words, and calculate the similarity between the preset keywords and the target words;
D3: select the maximum of these similarities as the keyword similarity.
During keyword selection, a keyword with a larger information value is better able to indicate the degree of semantic association of the target text information. For example, the ten keywords with the highest information value in the preset dictionary are identified, the similarity between each of these ten keywords and the multiple target words is calculated, and for each keyword the target word with the highest similarity to it is chosen. This yields ten final similarity values, which together with the vector similarity serve as the parameters of the logistic regression model.
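Steps D1-D3 can be sketched as follows; the information values and the character-level Jaccard similarity below are hypothetical stand-ins for whatever keyword scoring and word-similarity measure an implementation would actually use (for example, embedding similarity).

```python
# D1: filter keywords by information value; D2-D3: take each kept keyword's
# best similarity against the words of the target text.
def char_overlap(w1, w2):
    """Jaccard similarity over the characters of two words (a stand-in)."""
    s1, s2 = set(w1), set(w2)
    return len(s1 & s2) / len(s1 | s2)

info_value = {"photosynthesis": 0.9, "chlorophyll": 0.8, "the": 0.1}
threshold = 0.5
keywords = [w for w, iv in info_value.items() if iv > threshold]  # D1

target_words = "plants use photosynthesis to make food".split()   # D2
keyword_sims = {k: max(char_overlap(k, w) for w in target_words)  # D3
                for k in keywords}
print(keyword_sims)
```

Each keyword contributes one similarity value (its best match in the target text), and these values become logistic-regression parameters alongside the vector similarity.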
The above text information matching degree detection method obtains target text information and its corresponding reference text information; converts the target text information into a first hidden feature vector and the reference text information into a second hidden feature vector; and calculates the vector similarity between the first and second hidden feature vectors, thereby effectively extracting and matching the implicit semantic features shared by the target text information and the reference text information. It then obtains a logistic regression model according to the target text information and preset keywords, and inputs the vector similarity into the logistic regression model to obtain the matching degree between the target text information and the reference text information. By feeding the vector similarity of the implicit semantic features into a logistic regression model corresponding to the target text information, the accuracy of text information matching degree detection is effectively improved.
It should be understood that, although the steps in the flow charts of Figs. 2-3 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless expressly stated otherwise herein, there is no strict ordering constraint on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different times; the execution order of these sub-steps or stages is likewise not necessarily sequential, and they may be executed in turn or alternately with at least part of the other steps or of the sub-steps or stages of other steps.
In one of the embodiments, as shown in Fig. 4, a text information matching degree detection device is provided. The device includes:
a text information obtaining module 401, configured to obtain object text information and its corresponding reference text information;
a text information conversion module 402, configured to convert the object text information into a first hidden feature vector, and to convert the reference text information into a second hidden feature vector;
a vector similarity obtaining module 403, configured to calculate the vector similarity between the first hidden feature vector and the second hidden feature vector;
a matching degree detection module 404, configured to obtain a logistic regression model according to the object text information and preset keywords, and to input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
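The wiring of the four modules above can be sketched as a single Python class; the character-based conversion, the fixed regression weight, and all names below are illustrative stand-ins, not the patent's implementation:

```python
import numpy as np

class TextMatchDevice:
    """Minimal sketch of the four modules wired together; the
    conversion and regression internals are placeholders."""

    def __init__(self, embed_dim=8):
        self.embed_dim = embed_dim
        self.weight = 2.0   # illustrative logistic regression weight
        self.bias = -1.0    # illustrative bias

    # text information conversion module: text -> unit hidden feature vector
    def convert(self, text):
        vec = np.zeros(self.embed_dim)
        for i, ch in enumerate(text):
            vec[i % self.embed_dim] += ord(ch)
        return vec / np.linalg.norm(vec)

    # vector similarity obtaining module (dot product of unit vectors)
    def similarity(self, u, v):
        return float(np.dot(u, v))

    # matching degree detection module: logistic regression on the similarity
    def matching_degree(self, object_text, reference_text):
        s = self.similarity(self.convert(object_text),
                            self.convert(reference_text))
        z = self.weight * s + self.bias
        return float(1.0 / (1.0 + np.exp(-z)))

device = TextMatchDevice()
degree = device.matching_degree("repay the loan early", "repay the loan early")
```

For identical texts the similarity is 1, so the matching degree is sigmoid(2.0 - 1.0) ≈ 0.731 under these placeholder parameters.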
For the specific limitations of the text information matching degree detection device, reference may be made to the limitations of the text information matching degree detection method above, which are not repeated here. Each module in the above text information matching degree detection device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, the processor of a computer device in the form of hardware, or stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the operations corresponding to each of the above modules.
In one embodiment, a server is provided. The server may be implemented using a computer device whose internal structure may be as shown in Fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store the data involved in text information matching degree detection. The network interface of the computer device is used to communicate with external terminals through a network connection. When the computer program is executed by the processor, a text information matching degree detection method is implemented.
Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure relevant to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program: obtaining object text information and its corresponding reference text information; converting the object text information into a first hidden feature vector, and converting the reference text information into a second hidden feature vector; calculating the vector similarity between the first hidden feature vector and the second hidden feature vector; and obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
In one of the embodiments, when the processor executes the computer program, the converting of the object text information into the first hidden feature vector includes: inputting the object text information into a preset learning algorithm to obtain an object input vector; and inputting the object input vector into a preset self-encoding structure, and extracting from the preset self-encoding structure the first hidden feature vector corresponding to the object input vector.
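The hidden-feature extraction just described can be sketched as the encoder half of a small autoencoder; the layer size, tanh activation, and random weights below are assumptions for illustration, since the patent does not specify them:

```python
import numpy as np

def encode(input_vector, W_enc, b_enc):
    """Map an input vector to its hidden feature vector using the
    encoder half of a (here untrained) self-encoding structure."""
    # tanh activation is an assumption; the patent does not fix one
    return np.tanh(W_enc @ input_vector + b_enc)

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((16, 64)) * 0.1  # 64-dim input -> 16-dim hidden
b_enc = np.zeros(16)

object_input_vector = rng.standard_normal(64)
first_hidden_feature = encode(object_input_vector, W_enc, b_enc)
print(first_hidden_feature.shape)  # (16,)
```

The same `encode` call would serve for the question and standard input vectors of the later embodiments, producing their respective hidden feature vectors.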
In one of the embodiments, when the processor executes the computer program, the reference text information includes question text information and standard text information corresponding to the object text information, and the second hidden feature vector includes a question hidden feature vector and a standard hidden feature vector. The converting of the reference text information into the second hidden feature vector includes: inputting the question text information into a preset learning algorithm to obtain a question input vector; inputting the question input vector into a preset self-encoding structure, and extracting from the preset self-encoding structure the question hidden feature vector corresponding to the question input vector; inputting the standard text information into the preset learning algorithm to obtain a standard input vector; and inputting the standard input vector into the preset self-encoding structure, and extracting from the preset self-encoding structure the standard hidden feature vector corresponding to the standard input vector.
In one of the embodiments, after the processor executes the step of obtaining the object text information and its corresponding reference text information, the computer program further causes the processor to: obtain training feature vectors associated with the object text information; train multiple pre-stored self-encoding structures according to the training feature vectors to obtain multiple trained self-encoding structures; and calculate the information loss amount of each trained self-encoding structure, selecting the trained self-encoding structure with the smallest information loss amount as the preset self-encoding structure.
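The selection step above can be sketched by scoring each candidate self-encoding structure on the training feature vectors and keeping the one with the smallest loss. Measuring "information loss amount" as mean squared reconstruction error is an assumption here, as are the random candidate weights:

```python
import numpy as np

def reconstruction_loss(autoencoder, X):
    """Information loss of a candidate self-encoding structure, taken
    here as mean squared reconstruction error over the training set."""
    W_enc, W_dec = autoencoder
    H = np.tanh(X @ W_enc)   # hidden feature vectors
    X_rec = H @ W_dec        # reconstruction of the inputs
    return float(np.mean((X - X_rec) ** 2))

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 32))  # training feature vectors

# three pre-stored candidates (random weights stand in for trained ones)
candidates = [(rng.standard_normal((32, k)) * 0.1,
               rng.standard_normal((k, 32)) * 0.1) for k in (4, 8, 16)]

# choose the structure with the smallest information loss as the preset one
preset = min(candidates, key=lambda ae: reconstruction_loss(ae, X))
```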
In one of the embodiments, when the processor executes the computer program, the vector similarity includes a question similarity and a standard similarity, and the calculating of the vector similarity between the first hidden feature vector and the second hidden feature vector includes: calculating the cosine of the angle between the first hidden feature vector and the question hidden feature vector to obtain the question similarity; and calculating the cosine of the angle between the first hidden feature vector and the standard hidden feature vector to obtain the standard similarity.
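The two cosine calculations above are the standard included-angle cosine; a minimal sketch, with made-up example vectors:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two hidden feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

first_hidden = np.array([1.0, 2.0, 3.0])
question_hidden = np.array([1.0, 2.0, 3.0])   # parallel -> cosine 1
standard_hidden = np.array([-3.0, 0.0, 1.0])  # orthogonal -> cosine 0

question_similarity = cosine_similarity(first_hidden, question_hidden)
standard_similarity = cosine_similarity(first_hidden, standard_hidden)
print(round(question_similarity, 4), round(standard_similarity, 4))  # 1.0 0.0
```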
In one of the embodiments, when the processor executes the computer program, the obtaining of the logistic regression model according to the object text information and the preset keywords includes: obtaining the keyword similarity between the predetermined keywords and the object text information; and setting the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
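A logistic regression over the keyword similarity and the two vector similarities can be sketched as below; the weights and bias are illustrative values, not parameters from the patent:

```python
import math

def logistic_matching_degree(keyword_sim, question_sim, standard_sim,
                             weights=(1.5, 2.0, 2.0), bias=-2.0):
    """Matching degree from a logistic regression over the keyword
    similarity and the question/standard vector similarities.
    Weights and bias here are illustrative, not from the patent."""
    z = bias + sum(w * x for w, x in
                   zip(weights, (keyword_sim, question_sim, standard_sim)))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid maps z to (0, 1)

degree = logistic_matching_degree(0.8, 0.9, 0.7)
print(round(degree, 3))  # 0.917
```

In practice the weights would be fitted on labeled matching/non-matching text pairs rather than set by hand.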
In one of the embodiments, when the processor executes the computer program, the obtaining of the keyword similarity between the predetermined keywords and the object text information includes: calculating the information value of each keyword in a predetermined keyword library, and setting the keywords whose information value is greater than a preset threshold as the predetermined keywords; splitting the object text information into multiple object words, and calculating the similarity between the predetermined keywords and the object words; and selecting the maximum value among the similarities as the keyword similarity.
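The keyword-similarity procedure can be sketched as follows. The patent does not fix formulas for "information value" or word similarity, so inverse document frequency and character-overlap (Jaccard) similarity stand in here as assumptions, along with the toy keyword library:

```python
import math

def information_value(keyword, doc_freq, total_docs):
    """Illustrative information value: inverse document frequency."""
    return math.log(total_docs / (1 + doc_freq[keyword]))

def word_similarity(a, b):
    """Illustrative word similarity: character-set Jaccard overlap."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

doc_freq = {"loan": 20, "repay": 5, "the": 990}  # toy keyword library
total_docs = 1000

# keep only keywords whose information value exceeds the preset threshold
threshold = 3.0
preset_keywords = [k for k in doc_freq
                   if information_value(k, doc_freq, total_docs) > threshold]

# split the object text into object words; the keyword similarity is the
# maximum similarity over all (keyword, object word) pairs
object_words = "how do i repay my loans".split()
keyword_similarity = max(word_similarity(k, w)
                         for k in preset_keywords for w in object_words)
```

Here the uninformative word "the" is filtered out by the threshold, and the exact match on "repay" drives the keyword similarity to 1.0.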
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: obtaining object text information and its corresponding reference text information; converting the object text information into a first hidden feature vector, and converting the reference text information into a second hidden feature vector; calculating the vector similarity between the first hidden feature vector and the second hidden feature vector; and obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
In one of the embodiments, when the computer program is executed by the processor, the converting of the object text information into the first hidden feature vector includes: inputting the object text information into a preset learning algorithm to obtain an object input vector; and inputting the object input vector into a preset self-encoding structure, and extracting from the preset self-encoding structure the first hidden feature vector corresponding to the object input vector.
In one of the embodiments, when the computer program is executed by the processor, the reference text information includes question text information and standard text information corresponding to the object text information, and the second hidden feature vector includes a question hidden feature vector and a standard hidden feature vector. The converting of the reference text information into the second hidden feature vector includes: inputting the question text information into a preset learning algorithm to obtain a question input vector; inputting the question input vector into a preset self-encoding structure, and extracting from the preset self-encoding structure the question hidden feature vector corresponding to the question input vector; inputting the standard text information into the preset learning algorithm to obtain a standard input vector; and inputting the standard input vector into the preset self-encoding structure, and extracting from the preset self-encoding structure the standard hidden feature vector corresponding to the standard input vector.
In one of the embodiments, after the computer program is executed by the processor to perform the step of obtaining the object text information and its corresponding reference text information, the computer program further implements: obtaining training feature vectors associated with the object text information; training multiple pre-stored self-encoding structures according to the training feature vectors to obtain multiple trained self-encoding structures; and calculating the information loss amount of each trained self-encoding structure, selecting the trained self-encoding structure with the smallest information loss amount as the preset self-encoding structure.
In one of the embodiments, when the computer program is executed by the processor, the vector similarity includes a question similarity and a standard similarity, and the calculating of the vector similarity between the first hidden feature vector and the second hidden feature vector includes: calculating the cosine of the angle between the first hidden feature vector and the question hidden feature vector to obtain the question similarity; and calculating the cosine of the angle between the first hidden feature vector and the standard hidden feature vector to obtain the standard similarity.
In one of the embodiments, when the computer program is executed by the processor, the obtaining of the logistic regression model according to the object text information and the preset keywords includes: obtaining the keyword similarity between the predetermined keywords and the object text information; and setting the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
In one of the embodiments, when the computer program is executed by the processor, the obtaining of the keyword similarity between the predetermined keywords and the object text information includes: calculating the information value of each keyword in a predetermined keyword library, and setting the keywords whose information value is greater than a preset threshold as the predetermined keywords; splitting the object text information into multiple object words, and calculating the similarity between the predetermined keywords and the object words; and selecting the maximum value among the similarities as the keyword similarity.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be completed by instructing relevant hardware through a computer program. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present invention, and all of these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
1. A text information matching degree detection method, characterized in that the method comprises:
obtaining object text information and its corresponding reference text information;
converting the object text information into a first hidden feature vector according to a preset self-encoding structure, and converting the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the characteristic information of the object text information, and the second hidden feature vector represents the characteristic information of the reference text information;
calculating the vector similarity between the first hidden feature vector and the second hidden feature vector; and
obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
2. The method according to claim 1, characterized in that the converting of the object text information into the first hidden feature vector according to the preset self-encoding structure comprises:
inputting the object text information into a preset learning algorithm to obtain an object input vector; and
inputting the object input vector into the preset self-encoding structure, and extracting from the preset self-encoding structure the first hidden feature vector corresponding to the object input vector.
3. The method according to claim 1, characterized in that the reference text information comprises question text information and standard text information corresponding to the object text information; the second hidden feature vector comprises a question hidden feature vector and a standard hidden feature vector; and the converting of the reference text information into the second hidden feature vector comprises:
inputting the question text information into a preset learning algorithm to obtain a question input vector;
inputting the question input vector into a preset self-encoding structure, and extracting from the preset self-encoding structure the question hidden feature vector corresponding to the question input vector;
inputting the standard text information into the preset learning algorithm to obtain a standard input vector; and
inputting the standard input vector into the preset self-encoding structure, and extracting from the preset self-encoding structure the standard hidden feature vector corresponding to the standard input vector.
4. The method according to claim 1, characterized in that after the step of obtaining the object text information and its corresponding reference text information, the method further comprises:
obtaining training feature vectors associated with the object text information;
training multiple pre-stored self-encoding structures according to the training feature vectors to obtain multiple trained self-encoding structures; and
calculating the information loss amount of each trained self-encoding structure, and selecting the trained self-encoding structure with the smallest information loss amount as the preset self-encoding structure.
5. The method according to claim 3, characterized in that the vector similarity comprises a question similarity and a standard similarity, and the calculating of the vector similarity between the first hidden feature vector and the second hidden feature vector comprises:
calculating the cosine of the angle between the first hidden feature vector and the question hidden feature vector to obtain the question similarity; and
calculating the cosine of the angle between the first hidden feature vector and the standard hidden feature vector to obtain the standard similarity.
6. The method according to claim 1, characterized in that the obtaining of the logistic regression model according to the object text information and the preset keywords comprises:
obtaining the keyword similarity between predetermined keywords and the object text information; and
setting the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
7. The method according to claim 6, characterized in that the obtaining of the keyword similarity between the predetermined keywords and the object text information comprises:
calculating the information value of each keyword in a predetermined keyword library, and setting the keywords whose information value is greater than a preset threshold as the predetermined keywords;
splitting the object text information into multiple object words, and calculating the similarity between the predetermined keywords and the object words; and
selecting the maximum value among the similarities as the keyword similarity.
8. A text information matching degree detection device, characterized in that the device comprises:
a text information obtaining module, configured to obtain object text information and its corresponding reference text information;
a text information conversion module, configured to convert the object text information into a first hidden feature vector according to a preset self-encoding structure, and to convert the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the characteristic information of the object text information, and the second hidden feature vector represents the characteristic information of the reference text information;
a vector similarity obtaining module, configured to calculate the vector similarity between the first hidden feature vector and the second hidden feature vector; and
a matching degree detection module, configured to obtain a logistic regression model according to the object text information and preset keywords, and to input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method of any one of claims 1 to 7 when executed by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910569471.7A CN110413730A (en) | 2019-06-27 | 2019-06-27 | Text information matching degree detection method, device, computer equipment and storage medium |
PCT/CN2019/103650 WO2020258506A1 (en) | 2019-06-27 | 2019-08-30 | Text information matching degree detection method and apparatus, computer device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110413730A true CN110413730A (en) | 2019-11-05 |
Family
ID=68359982
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910569471.7A Pending CN110413730A (en) | 2019-06-27 | 2019-06-27 | Text information matching degree detection method, device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110413730A (en) |
WO (1) | WO2020258506A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113343987B (en) * | 2021-06-30 | 2023-08-22 | 北京奇艺世纪科技有限公司 | Text detection processing method and device, electronic equipment and storage medium |
CN114003305B (en) * | 2021-10-22 | 2024-03-15 | 济南浪潮数据技术有限公司 | Device similarity calculation method, computer device, and storage medium |
CN117195860B (en) * | 2023-11-07 | 2024-03-26 | 品茗科技股份有限公司 | Intelligent inspection method, system, electronic equipment and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920654A (en) * | 2018-06-29 | 2018-11-30 | 泰康保险集团股份有限公司 | A kind of matched method and apparatus of question and answer text semantic |
CN109189931A (en) * | 2018-09-05 | 2019-01-11 | 腾讯科技(深圳)有限公司 | A kind of screening technique and device of object statement |
CN109766428A (en) * | 2019-02-02 | 2019-05-17 | 中国银行股份有限公司 | Data query method and apparatus, data processing method |
CN109829299A (en) * | 2018-11-29 | 2019-05-31 | 电子科技大学 | A kind of unknown attack recognition methods based on depth self-encoding encoder |
CN109871531A (en) * | 2019-01-04 | 2019-06-11 | 平安科技(深圳)有限公司 | Hidden feature extracting method, device, computer equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109918663B (en) * | 2019-03-04 | 2021-01-08 | 腾讯科技(深圳)有限公司 | Semantic matching method, device and storage medium |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111180086A (en) * | 2019-12-12 | 2020-05-19 | 平安医疗健康管理股份有限公司 | Data matching method and device, computer equipment and storage medium |
CN111180086B (en) * | 2019-12-12 | 2023-04-25 | 平安医疗健康管理股份有限公司 | Data matching method, device, computer equipment and storage medium |
CN111191457A (en) * | 2019-12-16 | 2020-05-22 | 浙江大搜车软件技术有限公司 | Natural language semantic recognition method and device, computer equipment and storage medium |
CN111191457B (en) * | 2019-12-16 | 2023-09-15 | 浙江大搜车软件技术有限公司 | Natural language semantic recognition method, device, computer equipment and storage medium |
CN111401076A (en) * | 2020-04-09 | 2020-07-10 | 支付宝(杭州)信息技术有限公司 | Text similarity determination method and device and electronic equipment |
CN111401076B (en) * | 2020-04-09 | 2023-04-25 | 支付宝(杭州)信息技术有限公司 | Text similarity determination method and device and electronic equipment |
CN113672694A (en) * | 2020-05-13 | 2021-11-19 | 武汉Tcl集团工业研究院有限公司 | Text processing method, terminal and storage medium |
WO2021139424A1 (en) * | 2020-05-14 | 2021-07-15 | 平安科技(深圳)有限公司 | Text content quality evaluation method, apparatus and device, and storage medium |
CN111639161A (en) * | 2020-05-29 | 2020-09-08 | 中国工商银行股份有限公司 | System information processing method, apparatus, computer system and medium |
CN112749252A (en) * | 2020-07-14 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Text matching method based on artificial intelligence and related device |
CN112749252B (en) * | 2020-07-14 | 2023-11-03 | 腾讯科技(深圳)有限公司 | Text matching method and related device based on artificial intelligence |
CN112597281A (en) * | 2020-12-28 | 2021-04-02 | 中国农业银行股份有限公司 | Information acquisition method and device |
CN113836942A (en) * | 2021-02-08 | 2021-12-24 | 宏龙科技(杭州)有限公司 | Text matching method based on hidden keywords |
CN113836942B (en) * | 2021-02-08 | 2022-09-20 | 宏龙科技(杭州)有限公司 | Text matching method based on hidden keywords |
CN112989784A (en) * | 2021-03-04 | 2021-06-18 | 广州汇才创智科技有限公司 | Text automatic scoring method and device based on twin neural network and electronic equipment |
CN113157871B (en) * | 2021-05-27 | 2021-12-21 | 宿迁硅基智能科技有限公司 | News public opinion text processing method, server and medium applying artificial intelligence |
CN113157871A (en) * | 2021-05-27 | 2021-07-23 | 东莞心启航联贸网络科技有限公司 | News public opinion text processing method, server and medium applying artificial intelligence |
CN113989859A (en) * | 2021-12-28 | 2022-01-28 | 江苏苏宁银行股份有限公司 | Fingerprint similarity identification method and device for anti-flashing equipment |
CN113989859B (en) * | 2021-12-28 | 2022-05-06 | 江苏苏宁银行股份有限公司 | Fingerprint similarity identification method and device for anti-flashing equipment |
CN116188091A (en) * | 2023-05-04 | 2023-05-30 | 品茗科技股份有限公司 | Method, device, equipment and medium for automatic matching unit price reference of cost list |
Also Published As
Publication number | Publication date |
---|---|
WO2020258506A1 (en) | 2020-12-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110413730A (en) | Text information matching degree detection method, device, computer equipment and storage medium | |
CN111695352A (en) | Grading method and device based on semantic analysis, terminal equipment and storage medium | |
CN110853626B (en) | Bidirectional attention neural network-based dialogue understanding method, device and equipment | |
CN109800407A (en) | Intension recognizing method, device, computer equipment and storage medium | |
CN111247581B (en) | Multi-language text voice synthesizing method, device, equipment and storage medium | |
CN111460807A (en) | Sequence labeling method and device, computer equipment and storage medium | |
CN108959257A (en) | A kind of natural language analytic method, device, server and storage medium | |
CN112633003A (en) | Address recognition method and device, computer equipment and storage medium | |
CN110442677A (en) | Text matches degree detection method, device, computer equipment and readable storage medium storing program for executing | |
KR102143745B1 (en) | Method and system for error correction of korean using vector based on syllable | |
CN113408574B (en) | License plate classification method, license plate classification device and computer readable storage medium | |
CN112257437A (en) | Voice recognition error correction method and device, electronic equipment and storage medium | |
CN112528637A (en) | Text processing model training method and device, computer equipment and storage medium | |
CN112632248A (en) | Question answering method, device, computer equipment and storage medium | |
CN115146068A (en) | Method, device and equipment for extracting relation triples and storage medium | |
Liu et al. | Multimodal emotion recognition based on cascaded multichannel and hierarchical fusion | |
CN115064154A (en) | Method and device for generating mixed language voice recognition model | |
CN113051384A (en) | User portrait extraction method based on conversation and related device | |
CN113887169A (en) | Text processing method, electronic device, computer storage medium, and program product | |
CN109377203A (en) | Medical settlement data processing method, device, computer equipment and storage medium | |
CN116593980B (en) | Radar target recognition model training method, radar target recognition method and device | |
CN105975643B (en) | A kind of realtime graphic search method based on text index | |
CN112786003A (en) | Speech synthesis model training method and device, terminal equipment and storage medium | |
CN116844573A (en) | Speech emotion recognition method, device, equipment and medium based on artificial intelligence | |
CN115994220A (en) | Contact net text data defect identification method and device based on semantic mining |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||