CN110413730A - Text information matching degree detection method, device, computer equipment and storage medium - Google Patents

Text information matching degree detection method, device, computer equipment and storage medium

Info

Publication number
CN110413730A
Authority
CN
China
Prior art keywords
text information
feature vector
vector
hidden feature
similarity
Prior art date
Legal status
Pending
Application number
CN201910569471.7A
Other languages
Chinese (zh)
Inventor
金戈
徐亮
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910569471.7A (CN110413730A)
Priority to PCT/CN2019/103650 (WO2020258506A1)
Publication of CN110413730A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a text information matching degree detection method. The method comprises: obtaining object text information and its corresponding reference text information; converting the object text information into a first hidden feature vector and the reference text information into a second hidden feature vector; calculating the vector similarity between the first hidden feature vector and the second hidden feature vector; and obtaining a logistic regression model according to the object text information and preset keywords, then inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information. Matching degree detection is therefore more accurate.

Description

Text information matching degree detection method, device, computer equipment and storage medium
Technical field
The present invention relates to the field of computer technology, and more particularly to a text information matching degree detection method, device, computer device and storage medium.
Background art
Text matching degree refers to the degree of semantic association between different texts. Determining text matching degree is one of the core tasks of text mining and text retrieval, so how to better detect text matching degree has long been a problem of great concern to those skilled in the art.
The prior art mainly detects text matching degree by mapping texts to vectors in a word space and computing the Euclidean distance or cosine distance between the vectors. Existing detection approaches only determine text similarity in the word space and do not consider the associations and semantic information between text features, so the resulting matching degree detection is not sufficiently accurate.
Summary of the invention
The purpose of the present invention is to provide a text information matching degree detection method, device, computer device and readable storage medium, so that text information matching degree detection is more accurate.
The purpose of the present invention is achieved through the following technical solutions:
A text information matching degree detection method, the method comprising:
obtaining object text information and its corresponding reference text information;
converting the object text information into a first hidden feature vector according to a preset auto-encoding structure, and converting the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the feature information of the object text information and the second hidden feature vector represents the feature information of the reference text information;
calculating the vector similarity between the first hidden feature vector and the second hidden feature vector;
obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
In one embodiment, converting the object text information into the first hidden feature vector according to the preset auto-encoding structure comprises:
inputting the object text information into a preset learning algorithm to obtain an object input vector;
inputting the object input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the first hidden feature vector corresponding to the object input vector.
In one embodiment, the reference text information comprises question text information and standard text information corresponding to the object text information, and the second hidden feature vector comprises a question hidden feature vector and a standard hidden feature vector; converting the reference text information into the second hidden feature vector comprises:
inputting the question text information into a preset learning algorithm to obtain a question input vector;
inputting the question input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the question hidden feature vector corresponding to the question input vector;
inputting the standard text information into the preset learning algorithm to obtain a standard input vector;
inputting the standard input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the standard hidden feature vector corresponding to the standard input vector.
In one embodiment, after the step of obtaining the object text information and its corresponding reference text information, the method further comprises:
obtaining a training feature vector associated with the object text information;
training a plurality of pre-stored auto-encoding structures according to the training feature vector to obtain a plurality of trained auto-encoding structures;
calculating the information loss of each trained auto-encoding structure, and selecting the trained auto-encoding structure with the smallest information loss as the preset auto-encoding structure.
In one embodiment, the vector similarity comprises a question similarity and a standard similarity; calculating the vector similarity between the first hidden feature vector and the second hidden feature vector comprises:
calculating the cosine of the angle between the first hidden feature vector and the question hidden feature vector to obtain the question similarity;
calculating the cosine of the angle between the first hidden feature vector and the standard hidden feature vector to obtain the standard similarity.
In one embodiment, obtaining the logistic regression model according to the object text information and the preset keywords comprises:
obtaining keyword similarities between predetermined keywords and the object text information;
setting the keyword similarities and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
In one embodiment, obtaining the keyword similarities between the predetermined keywords and the object text information comprises:
calculating the information value of each keyword in a predetermined keyword library, and selecting keywords whose information value exceeds a preset threshold as the predetermined keywords;
splitting the object text information into a plurality of object words, and calculating the similarity between each predetermined keyword and the object words;
selecting the maximum of these similarities as the keyword similarity.
A text information matching degree detection device, the device comprising:
a text information obtaining module, configured to obtain object text information and its corresponding reference text information;
a text information conversion module, configured to convert the object text information into a first hidden feature vector according to a preset auto-encoding structure and to convert the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the feature information of the object text information and the second hidden feature vector represents the feature information of the reference text information;
a vector similarity obtaining module, configured to calculate the vector similarity between the first hidden feature vector and the second hidden feature vector;
a matching degree detection module, configured to obtain a logistic regression model according to the object text information and preset keywords, and to input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the following steps when executing the computer program:
obtaining object text information and its corresponding reference text information;
converting the object text information into a first hidden feature vector according to a preset auto-encoding structure, and converting the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the feature information of the object text information and the second hidden feature vector represents the feature information of the reference text information;
calculating the vector similarity between the first hidden feature vector and the second hidden feature vector;
obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
A computer-readable storage medium on which a computer program is stored, wherein the computer program implements the following steps when executed by a processor:
obtaining object text information and its corresponding reference text information;
converting the object text information into a first hidden feature vector according to a preset auto-encoding structure, and converting the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the feature information of the object text information and the second hidden feature vector represents the feature information of the reference text information;
calculating the vector similarity between the first hidden feature vector and the second hidden feature vector;
obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
In the text information matching degree detection method provided by the present invention, object text information and its corresponding reference text information are obtained; the object text information is converted into a first hidden feature vector and the reference text information into a second hidden feature vector; and the vector similarity between the first hidden feature vector and the second hidden feature vector is calculated, so that the implicit semantic features shared by the object text information and the reference text information can be effectively extracted and matched. A logistic regression model is then obtained according to the object text information and preset keywords, the vector similarity is input into the logistic regression model, and the matching degree between the object text information and the reference text information is obtained. By feeding the vector similarity of these implicit semantic features into a logistic regression model corresponding to the object text information, the accuracy of text information matching degree detection can be effectively improved.
Additional aspects and advantages of the application will be set forth in part in the following description; they will become apparent from that description or be understood through practice of the application.
Detailed description of the invention
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a diagram of the application environment of the text information matching degree detection method in one embodiment;
Fig. 2 is a flow diagram of the text information matching degree detection method in one embodiment;
Fig. 3 is a flow diagram of the text information matching degree detection method in another embodiment;
Fig. 4 is a structural block diagram of the text information matching degree detection device in one embodiment;
Fig. 5 is a diagram of the internal structure of the computer device in one embodiment.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting the claims.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "the" and "said" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present invention indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
Those skilled in the art will appreciate that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having meanings consistent with their meaning in the context of the prior art and, unless specifically defined as here, should not be interpreted in an idealized or overly formal sense.
The text information matching degree detection method provided by the present application can be applied in the application environment shown in Fig. 1. The server in the figure can be implemented with a computer device that comprises a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capability; the database of the computer device stores the data involved in text information matching degree detection; and the network interface of the computer device communicates with external terminals through a network connection. Specifically, the server obtains object text information and its corresponding reference text information; converts the object text information into a first hidden feature vector and the reference text information into a second hidden feature vector; calculates the vector similarity between the first hidden feature vector and the second hidden feature vector; obtains a logistic regression model according to the object text information and preset keywords; and inputs the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information. Those skilled in the art will appreciate that the "server" used here may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a text information matching degree detection method is provided. Taking its application to the server in Fig. 1 as an example, the method comprises the following steps:
Step S201: obtain object text information and its corresponding reference text information.
In this step, the object text information may be an answer text whose matching degree is to be detected, and the reference text information may be the question text corresponding to the answer text together with a standard text.
Taking essay grading as an example, the answer a user gives to a question is the object text information, and the reference text information is the question together with the standard answer corresponding to the question; detecting the matching degree between the object text information and the reference text information is then the process of judging the degree of semantic association between the answer on the one hand and the question and standard answer on the other.
In one embodiment, after the step S201 of obtaining the object text information and its corresponding reference text information, the method further comprises:
A1: obtain a training feature vector associated with the object text information.
A2: train a plurality of pre-stored auto-encoding structures according to the training feature vector to obtain a plurality of trained auto-encoding structures.
In this step, text information can be converted into a hidden feature vector through an auto-encoding structure. The auto-encoding structure is a neural network that encodes the input features and then decodes them so that the difference between input and output is minimized.
A3: calculate the information loss of each trained auto-encoding structure, and select the trained auto-encoding structure with the smallest information loss as the preset auto-encoding structure.
In a specific implementation, training an auto-encoding structure is the process of minimizing the difference between its input and output. The training feature vector is fed into multiple different auto-encoding structures, which differ in the number of hidden layers and the number of units per hidden layer. The parameters of each structure are adjusted separately so that the difference between its output and the training feature vector is minimized, and the target auto-encoding structure is then selected from the trained structures according to the difference between each structure's input and output.
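As an illustration of steps A2 and A3, the sketch below builds several single-hidden-layer autoencoders of different sizes, trains each one to reconstruct the training feature vectors, and keeps the one with the smallest reconstruction error as the preset auto-encoding structure. The patent does not name a framework, so Keras, the candidate hidden sizes, the mean-squared-error loss and the training settings are all assumptions.

```python
# Hypothetical sketch of steps A2/A3: train several candidate autoencoders and
# keep the one whose reconstruction (information) loss is smallest.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(input_dim, hidden_units):
    """Single-hidden-layer autoencoder; the encoder exposes the hidden feature vector."""
    inp = keras.Input(shape=(input_dim,))
    hidden = layers.Dense(hidden_units, activation="relu")(inp)   # hidden feature vector
    out = layers.Dense(input_dim, activation="linear")(hidden)    # reconstruction of the input
    autoencoder = keras.Model(inp, out)
    encoder = keras.Model(inp, hidden)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

def select_autoencoder(train_vectors, candidate_units=(32, 64, 128)):
    """Train one autoencoder per candidate size and return the encoder with the lowest loss."""
    best_loss, best_encoder = np.inf, None
    for units in candidate_units:
        ae, enc = build_autoencoder(train_vectors.shape[1], units)
        ae.fit(train_vectors, train_vectors, epochs=50, batch_size=16, verbose=0)
        loss = ae.evaluate(train_vectors, train_vectors, verbose=0)  # reconstruction MSE
        if loss < best_loss:
            best_loss, best_encoder = loss, enc
    return best_encoder, best_loss
```

The returned encoder then plays the role of the preset auto-encoding structure in step S202: calling `encoder.predict(...)` on an input vector yields the corresponding hidden feature vector.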
Step S202: convert the object text information into a first hidden feature vector, and convert the reference text information into a second hidden feature vector.
In this step, a hidden feature vector is the encoding that the auto-encoding structure produces from the features it receives; it retains much of the information of the original input vector and represents the feature information of the object text information or reference text information fed into the auto-encoding structure. The auto-encoding structure then decodes the hidden feature vector again to obtain the output feature encoding.
In one embodiment, converting the object text information into the first hidden feature vector in step S202 may comprise:
B1: input the object text information into a preset learning algorithm to obtain an object input vector.
B2: input the object input vector into the preset auto-encoding structure, and extract from the preset auto-encoding structure the first hidden feature vector corresponding to the object input vector.
In this embodiment, the preset learning algorithm is an algorithm that converts text into a corresponding vector, for example converting the object text information into an object input vector in bag-of-words form through the sklearn library in Python. Python is a programming language; sklearn, also known as scikit-learn, is a Python-based machine learning library that provides convenient implementations of machine learning algorithms and of data mining methods such as classification, regression, clustering, dimensionality reduction, model selection and preprocessing.
For example, consider text one, "I like eating apples, apples are rich in nutrition", and text two, "I like eating pears". The texts are first segmented with the jieba library in Python to separate the words, a bag-of-words vocabulary is then built with the sklearn library (the features include "I", "like", "eat", "apple", "nutrition", "rich", "pear"), and the value of each feature is determined by how often the word occurs in the sample. This gives the feature vector (1, 1, 1, 2, 1, 1, 0) for text one and (1, 1, 1, 0, 0, 0, 1) for text two. The jieba library is a Python library for Chinese word segmentation.
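The following sketch reproduces this bag-of-words step with jieba and sklearn's CountVectorizer. It is illustrative only: the patent does not fix the exact vectorizer, and the token pattern that keeps single-character Chinese words is an assumption.

```python
# Minimal sketch of step B1: segment Chinese text with jieba and turn it into
# bag-of-words count vectors with sklearn's CountVectorizer.
import jieba
from sklearn.feature_extraction.text import CountVectorizer

def to_input_vectors(texts):
    """Return (count_vectors, fitted_vectorizer) for a list of Chinese texts."""
    segmented = [" ".join(jieba.cut(t)) for t in texts]
    vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")  # keep single-character words
    vectors = vectorizer.fit_transform(segmented)
    return vectors.toarray(), vectorizer

vectors, vectorizer = to_input_vectors(["我喜欢吃苹果，苹果营养丰富", "我喜欢吃梨"])
print(vectorizer.get_feature_names_out())  # vocabulary such as 我 / 喜欢 / 吃 / 苹果 / 营养 / 丰富 / 梨
print(vectors)                             # count vectors comparable to (1,1,1,2,1,1,0) and (1,1,1,0,0,0,1)
```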
Further, the reference text information comprises question text information and standard text information corresponding to the object text information, and the second hidden feature vector comprises a question hidden feature vector and a standard hidden feature vector. Converting the reference text information into the second hidden feature vector in step S202 comprises:
B3: input the question text information into the preset learning algorithm to obtain a question input vector; input the question input vector into the preset auto-encoding structure, and extract from the preset auto-encoding structure the question hidden feature vector corresponding to the question input vector;
B4: input the standard text information into the preset learning algorithm to obtain a standard input vector; input the standard input vector into the preset auto-encoding structure, and extract from the preset auto-encoding structure the standard hidden feature vector corresponding to the standard input vector.
In this embodiment, the object text information and the reference text information are separately converted into an object input vector and reference input vectors through the preset learning algorithm; the object input vector and the reference input vectors are then fed into the preset auto-encoding structure, from which the first hidden feature vector corresponding to the object input vector and the second hidden feature vector corresponding to the reference input vectors are extracted, so that the implicit semantic features shared by the object text information and the reference text information can be effectively extracted.
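Tying the two previous sketches together, steps B1 to B4 reduce to vectorizing each text and passing the result through the selected encoder. The helper names `to_input_vectors` and `select_autoencoder` are the hypothetical functions defined in the sketches above; in practice the encoder would be trained on a larger set of training feature vectors as step A2 describes, and the three input texts here are only placeholders.

```python
# Hypothetical glue code for steps B1-B4, reusing the helpers sketched earlier.
answer_text, question_text, standard_text = "考生作答文本", "题目文本", "标准答案文本"  # placeholder texts

vectors, vectorizer = to_input_vectors([answer_text, question_text, standard_text])
encoder, _ = select_autoencoder(vectors.astype("float32"))

first_hidden = encoder.predict(vectors[0:1])     # object (answer) hidden feature vector
question_hidden = encoder.predict(vectors[1:2])  # question hidden feature vector
standard_hidden = encoder.predict(vectors[2:3])  # standard hidden feature vector
```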
Step S203: calculate the vector similarity between the first hidden feature vector and the second hidden feature vector.
In this step, vector similarity is usually computed from the distance between two vectors: the closer the distance, the greater the similarity. The cosine similarity method can be used to calculate the vector similarity between the first hidden feature vector and the second hidden feature vector.
In one embodiment, the vector similarity comprises a question similarity and a standard similarity, and calculating the vector similarity between the first hidden feature vector and the second hidden feature vector in step S203 comprises:
C1: calculate the cosine of the angle between the first hidden feature vector and the question hidden feature vector to obtain the question similarity;
C2: calculate the cosine of the angle between the first hidden feature vector and the standard hidden feature vector to obtain the standard similarity.
Cosine similarity assesses the similarity of two vectors by calculating the cosine of the angle between them. The cosine of a 0-degree angle is 1, the cosine of any other angle is no greater than 1, and its minimum value is -1, so the cosine of the angle between two vectors indicates whether the vectors point in roughly the same direction: when the two vectors point in the same direction the cosine similarity is 1, when the angle between them is 90° it is 0, and when they point in exactly opposite directions it is -1. Cosine similarity is commonly used in the positive space, where the resulting value lies between 0 and 1.
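A minimal sketch of steps C1 and C2 follows. The three-dimensional vectors are placeholder values standing in for the hidden feature vectors produced by the auto-encoding structure.

```python
# Cosine similarity between the answer's hidden feature vector and the
# question / standard-answer hidden feature vectors (steps C1 and C2).
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = a.b / (|a| * |b|)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

answer_vec = np.array([0.2, 0.7, 0.1])    # first hidden feature vector (example values)
question_vec = np.array([0.1, 0.8, 0.3])  # question hidden feature vector (example values)
standard_vec = np.array([0.3, 0.6, 0.0])  # standard hidden feature vector (example values)

question_similarity = cosine_similarity(answer_vec, question_vec)
standard_similarity = cosine_similarity(answer_vec, standard_vec)
```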
Step S204: obtain a logistic regression model according to the object text information and preset keywords, and input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
In this step, the parameters of the logistic regression model are calculated from the object text information and the predetermined keywords, and the vector similarity is then fed into the logistic regression model, which outputs a matching degree value.
Taking essay scoring as an example, a series of parameters is calculated from the answer text given by the user and the predetermined keywords, the corresponding logistic regression model is built from the obtained parameters, and the similarity between the answer text and the reference text is then input into the logistic regression model to obtain a matching score.
The process of obtaining the logistic regression model in the present invention is explained below with reference to Fig. 3 and a specific embodiment. In one embodiment, obtaining the logistic regression model according to the object text information and the preset keywords in step S204 comprises:
S410: obtain keyword similarities between predetermined keywords and the object text information;
S420: set the keyword similarities and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
In one embodiment, step S410 of obtaining the keyword similarities between the predetermined keywords and the object text information comprises:
D1: calculate the information value of each keyword in a predetermined keyword library, and select keywords whose information value exceeds a preset threshold as the predetermined keywords;
D2: split the object text information into a plurality of object words, and calculate the similarity between each predetermined keyword and the object words;
D3: select the maximum of these similarities as the keyword similarity.
When selecting keywords, a keyword with a larger information value is better able to judge the degree of semantic association of the object text information. For example, the ten keywords with the highest information value in the preset dictionary are determined, the similarity between each of these ten keywords and the multiple object words is calculated, and for each keyword the object word with the highest similarity is chosen. This yields ten final similarity values, which are used together with the vector similarity as parameters of the logistic regression model.
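The sketch below illustrates how the ten keyword similarities and the two vector similarities could be assembled into a feature row and scored with sklearn's LogisticRegression. The word-vector lookup used for the keyword-versus-object-word similarity, the labelled training data and the exact feature layout are assumptions; the patent only specifies that the keyword similarities and the vector similarity serve as the model's parameters.

```python
# Hypothetical sketch of S410/S420: keyword similarities plus vector similarities
# feed a logistic regression model whose output is the matching degree.
import numpy as np
from sklearn.linear_model import LogisticRegression

def _cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def keyword_similarities(keywords, object_words, word_vectors):
    """Step D3: for every predetermined keyword keep its best similarity to any object word."""
    feats = []
    for kw in keywords:
        sims = [_cos(word_vectors[kw], word_vectors[w])
                for w in object_words
                if kw in word_vectors and w in word_vectors]
        feats.append(max(sims) if sims else 0.0)
    return feats

def feature_row(keywords, object_words, word_vectors, question_sim, standard_sim):
    """One sample: the keyword similarities followed by the two vector similarities."""
    return np.array(keyword_similarities(keywords, object_words, word_vectors)
                    + [question_sim, standard_sim])

# With labelled rows (1 = the answer matches, 0 = it does not) the model is fitted
# once and then scores new answers; predict_proba gives a matching degree in [0, 1]:
#   model = LogisticRegression().fit(X_train, y_train)
#   matching_degree = model.predict_proba(feature_row(...).reshape(1, -1))[0, 1]
```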
The text information matching degree detection method described above obtains object text information and its corresponding reference text information, converts the object text information into a first hidden feature vector and the reference text information into a second hidden feature vector, and calculates the vector similarity between the first hidden feature vector and the second hidden feature vector, so that the implicit semantic features shared by the object text information and the reference text information can be effectively extracted and matched. A logistic regression model is obtained according to the object text information and preset keywords, the vector similarity is input into the logistic regression model, and the matching degree between the object text information and the reference text information is obtained. By feeding the vector similarity of these implicit semantic features into a logistic regression model corresponding to the object text information, the accuracy of text information matching degree detection can be effectively improved.
It should be understood that, although the steps in the flowcharts of Figs. 2-3 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless expressly stated otherwise herein, there is no strict ordering constraint on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is likewise not necessarily sequential, and they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 4, a text information matching degree detection device is provided, the device comprising:
a text information obtaining module 401, configured to obtain object text information and its corresponding reference text information;
a text information conversion module 402, configured to convert the object text information into a first hidden feature vector and to convert the reference text information into a second hidden feature vector;
a vector similarity obtaining module 403, configured to calculate the vector similarity between the first hidden feature vector and the second hidden feature vector;
a matching degree detection module 404, configured to obtain a logistic regression model according to the object text information and preset keywords, and to input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
For the specific limitations of the text information matching degree detection device, reference may be made to the limitations of the text information matching degree detection method above, which are not repeated here. Each module of the above device may be implemented in whole or in part by software, hardware or a combination thereof. The modules may be embedded, in hardware form, in or independently of the processor of the computer device, or stored, in software form, in the memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a server is provided. The server can be implemented with a computer device whose internal structure may be as shown in Fig. 5. The computer device comprises a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capability. The memory of the computer device comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the data involved in text information matching degree detection. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer program implements a text information matching degree detection method.
Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program. When executing the computer program, the processor implements the following steps: obtaining object text information and its corresponding reference text information; converting the object text information into a first hidden feature vector and the reference text information into a second hidden feature vector; calculating the vector similarity between the first hidden feature vector and the second hidden feature vector; obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
In one embodiment, when the processor executes the computer program, converting the object text information into the first hidden feature vector according to the preset auto-encoding structure comprises: inputting the object text information into a preset learning algorithm to obtain an object input vector; inputting the object input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the first hidden feature vector corresponding to the object input vector.
In one embodiment, when the processor executes the computer program, the reference text information comprises question text information and standard text information corresponding to the object text information, and the second hidden feature vector comprises a question hidden feature vector and a standard hidden feature vector; converting the reference text information into the second hidden feature vector comprises: inputting the question text information into the preset learning algorithm to obtain a question input vector; inputting the question input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the question hidden feature vector corresponding to the question input vector; inputting the standard text information into the preset learning algorithm to obtain a standard input vector; inputting the standard input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the standard hidden feature vector corresponding to the standard input vector.
In one embodiment, when the processor executes the computer program, the following is further implemented after the step of obtaining the object text information and its corresponding reference text information: obtaining a training feature vector associated with the object text information; training a plurality of pre-stored auto-encoding structures according to the training feature vector to obtain a plurality of trained auto-encoding structures; calculating the information loss of each trained auto-encoding structure, and selecting the trained auto-encoding structure with the smallest information loss as the preset auto-encoding structure.
In one embodiment, when the processor executes the computer program, the vector similarity comprises a question similarity and a standard similarity, and calculating the vector similarity between the first hidden feature vector and the second hidden feature vector comprises: calculating the cosine of the angle between the first hidden feature vector and the question hidden feature vector to obtain the question similarity; calculating the cosine of the angle between the first hidden feature vector and the standard hidden feature vector to obtain the standard similarity.
In one embodiment, when the processor executes the computer program, obtaining the logistic regression model according to the object text information and the preset keywords comprises: obtaining keyword similarities between predetermined keywords and the object text information; setting the keyword similarities and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
In one embodiment, when the processor executes the computer program, obtaining the keyword similarities between the predetermined keywords and the object text information comprises: calculating the information value of each keyword in a predetermined keyword library, and selecting keywords whose information value exceeds a preset threshold as the predetermined keywords; splitting the object text information into a plurality of object words, and calculating the similarity between each predetermined keyword and the object words; and selecting the maximum of these similarities as the keyword similarity.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: obtaining object text information and its corresponding reference text information; converting the object text information into a first hidden feature vector and the reference text information into a second hidden feature vector; calculating the vector similarity between the first hidden feature vector and the second hidden feature vector; obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
In one embodiment, when the computer program is executed by the processor, converting the object text information into the first hidden feature vector according to the preset auto-encoding structure comprises: inputting the object text information into a preset learning algorithm to obtain an object input vector; inputting the object input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the first hidden feature vector corresponding to the object input vector.
In one embodiment, when the computer program is executed by the processor, the reference text information comprises question text information and standard text information corresponding to the object text information, and the second hidden feature vector comprises a question hidden feature vector and a standard hidden feature vector; converting the reference text information into the second hidden feature vector comprises: inputting the question text information into the preset learning algorithm to obtain a question input vector; inputting the question input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the question hidden feature vector corresponding to the question input vector; inputting the standard text information into the preset learning algorithm to obtain a standard input vector; inputting the standard input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the standard hidden feature vector corresponding to the standard input vector.
In one embodiment, when the computer program is executed by the processor, the following is further implemented after the step of obtaining the object text information and its corresponding reference text information: obtaining a training feature vector associated with the object text information; training a plurality of pre-stored auto-encoding structures according to the training feature vector to obtain a plurality of trained auto-encoding structures; calculating the information loss of each trained auto-encoding structure, and selecting the trained auto-encoding structure with the smallest information loss as the preset auto-encoding structure.
In one embodiment, when the computer program is executed by the processor, the vector similarity comprises a question similarity and a standard similarity, and calculating the vector similarity between the first hidden feature vector and the second hidden feature vector comprises: calculating the cosine of the angle between the first hidden feature vector and the question hidden feature vector to obtain the question similarity; calculating the cosine of the angle between the first hidden feature vector and the standard hidden feature vector to obtain the standard similarity.
In one embodiment, when the computer program is executed by the processor, obtaining the logistic regression model according to the object text information and the preset keywords comprises: obtaining keyword similarities between predetermined keywords and the object text information; setting the keyword similarities and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
In one embodiment, when the computer program is executed by the processor, obtaining the keyword similarities between the predetermined keywords and the object text information comprises: calculating the information value of each keyword in a predetermined keyword library, and selecting keywords whose information value exceeds a preset threshold as the predetermined keywords; splitting the object text information into a plurality of object words, and calculating the similarity between each predetermined keyword and the object words; and selecting the maximum of these similarities as the keyword similarity.
Those of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by instructing the relevant hardware through a computer program. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined in any way. For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A text information matching degree detection method, characterized in that the method comprises:
obtaining object text information and its corresponding reference text information;
converting the object text information into a first hidden feature vector according to a preset auto-encoding structure, and converting the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the feature information of the object text information and the second hidden feature vector represents the feature information of the reference text information;
calculating the vector similarity between the first hidden feature vector and the second hidden feature vector;
obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
2. The method according to claim 1, characterized in that converting the object text information into the first hidden feature vector according to the preset auto-encoding structure comprises:
inputting the object text information into a preset learning algorithm to obtain an object input vector;
inputting the object input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the first hidden feature vector corresponding to the object input vector.
3. The method according to claim 1, characterized in that the reference text information comprises question text information and standard text information corresponding to the object text information, and the second hidden feature vector comprises a question hidden feature vector and a standard hidden feature vector; converting the reference text information into the second hidden feature vector comprises:
inputting the question text information into a preset learning algorithm to obtain a question input vector;
inputting the question input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the question hidden feature vector corresponding to the question input vector;
inputting the standard text information into the preset learning algorithm to obtain a standard input vector;
inputting the standard input vector into the preset auto-encoding structure, and extracting from the preset auto-encoding structure the standard hidden feature vector corresponding to the standard input vector.
4. The method according to claim 1, characterized in that, after the step of obtaining the object text information and its corresponding reference text information, the method further comprises:
obtaining a training feature vector associated with the object text information;
training a plurality of pre-stored auto-encoding structures according to the training feature vector to obtain a plurality of trained auto-encoding structures;
calculating the information loss of each trained auto-encoding structure, and selecting the trained auto-encoding structure with the smallest information loss as the preset auto-encoding structure.
5. The method according to claim 3, characterized in that the vector similarity comprises a question similarity and a standard similarity; calculating the vector similarity between the first hidden feature vector and the second hidden feature vector comprises:
calculating the cosine of the angle between the first hidden feature vector and the question hidden feature vector to obtain the question similarity;
calculating the cosine of the angle between the first hidden feature vector and the standard hidden feature vector to obtain the standard similarity.
6. The method according to claim 1, characterized in that obtaining the logistic regression model according to the object text information and the preset keywords comprises:
obtaining keyword similarities between predetermined keywords and the object text information;
setting the keyword similarities and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
7. The method according to claim 6, characterized in that obtaining the keyword similarities between the predetermined keywords and the object text information comprises:
calculating the information value of each keyword in a predetermined keyword library, and selecting keywords whose information value exceeds a preset threshold as the predetermined keywords;
splitting the object text information into a plurality of object words, and calculating the similarity between each predetermined keyword and the object words;
selecting the maximum of these similarities as the keyword similarity.
8. A text information matching degree detection device, characterized in that the device comprises:
a text information obtaining module, configured to obtain object text information and its corresponding reference text information;
a text information conversion module, configured to convert the object text information into a first hidden feature vector according to a preset auto-encoding structure and to convert the reference text information into a second hidden feature vector, wherein the first hidden feature vector represents the feature information of the object text information and the second hidden feature vector represents the feature information of the reference text information;
a vector similarity obtaining module, configured to calculate the vector similarity between the first hidden feature vector and the second hidden feature vector;
a matching degree detection module, configured to obtain a logistic regression model according to the object text information and preset keywords, and to input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
CN201910569471.7A 2019-06-27 2019-06-27 Text information matching degree detection method, device, computer equipment and storage medium Pending CN110413730A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910569471.7A CN110413730A (en) 2019-06-27 2019-06-27 Text information matching degree detection method, device, computer equipment and storage medium
PCT/CN2019/103650 WO2020258506A1 (en) 2019-06-27 2019-08-30 Text information matching degree detection method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910569471.7A CN110413730A (en) 2019-06-27 2019-06-27 Text information matching degree detection method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110413730A true CN110413730A (en) 2019-11-05

Family

ID=68359982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910569471.7A Pending CN110413730A (en) 2019-06-27 2019-06-27 Text information matching degree detection method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110413730A (en)
WO (1) WO2020258506A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343987B (en) * 2021-06-30 2023-08-22 北京奇艺世纪科技有限公司 Text detection processing method and device, electronic equipment and storage medium
CN114003305B (en) * 2021-10-22 2024-03-15 济南浪潮数据技术有限公司 Device similarity calculation method, computer device, and storage medium
CN117195860B (en) * 2023-11-07 2024-03-26 品茗科技股份有限公司 Intelligent inspection method, system, electronic equipment and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918663B (en) * 2019-03-04 2021-01-08 Tencent Technology (Shenzhen) Co., Ltd. Semantic matching method, device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920654A (en) * 2018-06-29 2018-11-30 Taikang Insurance Group Co., Ltd. Question-and-answer text semantic matching method and apparatus
CN109189931A (en) * 2018-09-05 2019-01-11 Tencent Technology (Shenzhen) Co., Ltd. Screening method and device for object statements
CN109829299A (en) * 2018-11-29 2019-05-31 University of Electronic Science and Technology of China Unknown attack recognition method based on deep autoencoder
CN109871531A (en) * 2019-01-04 2019-06-11 Ping An Technology (Shenzhen) Co., Ltd. Hidden feature extraction method, device, computer equipment and storage medium
CN109766428A (en) * 2019-02-02 2019-05-17 Bank of China Limited Data query method and apparatus, and data processing method

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111180086A (en) * 2019-12-12 2020-05-19 平安医疗健康管理股份有限公司 Data matching method and device, computer equipment and storage medium
CN111180086B (en) * 2019-12-12 2023-04-25 平安医疗健康管理股份有限公司 Data matching method, device, computer equipment and storage medium
CN111191457A (en) * 2019-12-16 2020-05-22 浙江大搜车软件技术有限公司 Natural language semantic recognition method and device, computer equipment and storage medium
CN111191457B (en) * 2019-12-16 2023-09-15 浙江大搜车软件技术有限公司 Natural language semantic recognition method, device, computer equipment and storage medium
CN111401076A (en) * 2020-04-09 2020-07-10 支付宝(杭州)信息技术有限公司 Text similarity determination method and device and electronic equipment
CN111401076B (en) * 2020-04-09 2023-04-25 支付宝(杭州)信息技术有限公司 Text similarity determination method and device and electronic equipment
CN113672694A (en) * 2020-05-13 2021-11-19 武汉Tcl集团工业研究院有限公司 Text processing method, terminal and storage medium
WO2021139424A1 (en) * 2020-05-14 2021-07-15 平安科技(深圳)有限公司 Text content quality evaluation method, apparatus and device, and storage medium
CN111639161A (en) * 2020-05-29 2020-09-08 中国工商银行股份有限公司 System information processing method, apparatus, computer system and medium
CN112749252A (en) * 2020-07-14 2021-05-04 腾讯科技(深圳)有限公司 Text matching method based on artificial intelligence and related device
CN112749252B (en) * 2020-07-14 2023-11-03 腾讯科技(深圳)有限公司 Text matching method and related device based on artificial intelligence
CN112597281A (en) * 2020-12-28 2021-04-02 中国农业银行股份有限公司 Information acquisition method and device
CN113836942A (en) * 2021-02-08 2021-12-24 宏龙科技(杭州)有限公司 Text matching method based on hidden keywords
CN113836942B (en) * 2021-02-08 2022-09-20 宏龙科技(杭州)有限公司 Text matching method based on hidden keywords
CN112989784A (en) * 2021-03-04 2021-06-18 广州汇才创智科技有限公司 Text automatic scoring method and device based on twin neural network and electronic equipment
CN113157871B (en) * 2021-05-27 2021-12-21 宿迁硅基智能科技有限公司 News public opinion text processing method, server and medium applying artificial intelligence
CN113157871A (en) * 2021-05-27 2021-07-23 东莞心启航联贸网络科技有限公司 News public opinion text processing method, server and medium applying artificial intelligence
CN113989859A (en) * 2021-12-28 2022-01-28 江苏苏宁银行股份有限公司 Fingerprint similarity identification method and device for anti-flashing equipment
CN113989859B (en) * 2021-12-28 2022-05-06 江苏苏宁银行股份有限公司 Fingerprint similarity identification method and device for anti-flashing equipment
CN116188091A (en) * 2023-05-04 2023-05-30 品茗科技股份有限公司 Method, device, equipment and medium for automatic matching unit price reference of cost list

Also Published As

Publication number Publication date
WO2020258506A1 (en) 2020-12-30

Similar Documents

Publication Publication Date Title
CN110413730A (en) Text information matching degree detection method, device, computer equipment and storage medium
CN111695352A (en) Grading method and device based on semantic analysis, terminal equipment and storage medium
CN110853626B (en) Bidirectional attention neural network-based dialogue understanding method, device and equipment
CN109800407A (en) Intention recognition method, device, computer equipment and storage medium
CN111247581B (en) Multi-language text voice synthesizing method, device, equipment and storage medium
CN111460807A (en) Sequence labeling method and device, computer equipment and storage medium
CN108959257A (en) Natural language analysis method, device, server and storage medium
CN112633003A (en) Address recognition method and device, computer equipment and storage medium
CN110442677A (en) Text matching degree detection method, device, computer equipment and readable storage medium
KR102143745B1 (en) Method and system for error correction of Korean using syllable-based vectors
CN113408574B (en) License plate classification method, license plate classification device and computer readable storage medium
CN112257437A (en) Voice recognition error correction method and device, electronic equipment and storage medium
CN112528637A (en) Text processing model training method and device, computer equipment and storage medium
CN112632248A (en) Question answering method, device, computer equipment and storage medium
CN115146068A (en) Method, device and equipment for extracting relation triples and storage medium
Liu et al. Multimodal emotion recognition based on cascaded multichannel and hierarchical fusion
CN115064154A (en) Method and device for generating mixed language voice recognition model
CN113051384A (en) User portrait extraction method based on conversation and related device
CN113887169A (en) Text processing method, electronic device, computer storage medium, and program product
CN109377203A (en) Medical settlement data processing method, device, computer equipment and storage medium
CN116593980B (en) Radar target recognition model training method, radar target recognition method and device
CN105975643B (en) Real-time image search method based on text index
CN112786003A (en) Speech synthesis model training method and device, terminal equipment and storage medium
CN116844573A (en) Speech emotion recognition method, device, equipment and medium based on artificial intelligence
CN115994220A (en) Contact net text data defect identification method and device based on semantic mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination