CN111125308B - Lightweight text fuzzy search method supporting semantic association - Google Patents


Info

Publication number
CN111125308B
Authority
CN
China
Prior art keywords
fuzzy search
text
semantic association
lightweight
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911331527.1A
Other languages
Chinese (zh)
Other versions
CN111125308A (en)
Inventor
裴正奇
黄梓忱
段必超
段朦丽
朱斌斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qianhai Heidun Technology Co ltd
Original Assignee
Shenzhen Qianhai Heidun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Heidun Technology Co ltd
Priority to CN201911331527.1A
Publication of CN111125308A
Application granted
Publication of CN111125308B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a lightweight text fuzzy search method supporting semantic association. The degree of fuzziness is higher: the invention improves on traditional sentence retrieval algorithms, can retrieve sentences that are identical to the target sentence as well as sentences highly similar to it, and allows the required similarity to the target sentence to be adjusted flexibly. The operation speed is fast: the traditional brute-force enumeration algorithm is abandoned in favor of semantic graphs, convolution, dynamic programming and similar methods, which optimizes the search process and greatly improves the search speed. The system is lightweight: the system size is reduced, internal and external optimizations target lightweight users and usage scenarios, the whole calculation process is optimized, and the memory burden is reduced. The invention also provides an association mode that requires no on-site computation, so a user can call the association module during a fuzzy search without occupying local computing power. The system is flexible and easy for users to call from different applications: the whole algorithm module is packaged behind an interface.

Description

Lightweight text fuzzy search method supporting semantic association
Technical Field
The invention relates to the relevant field of text fuzzy search, in particular to a lightweight text fuzzy search method supporting semantic association.
Background
Fuzzy text search is used in many places. As networks develop, the amount of text generated online is growing explosively, and harmful or destabilizing information spreads with it, so much content must be reviewed before it can be displayed on a public network platform. Early network review was mostly manual, which is inefficient and negligible compared with the speed at which online text is generated. Many scholars and companies have therefore turned their attention to fuzzy text search, that is, fuzzily finding (fuzzy matching) a given keyword or key sentence in a large amount of text. Early text matching relied mainly on algorithms such as BF (Brute Force), RK (Rabin-Karp), KMP (Knuth-Morris-Pratt) and BM (Boyer-Moore). With these algorithms a match succeeds only if a character string identical to the keyword is found in the text; semantic information is not considered, so they cannot complete a fuzzy matching task. The main methods for fuzzy text matching, namely fuzzy string matching, include the bit vector method, the filtering method, and others. The bit vector method requires a large amount of space, which is a problem for small-memory machines such as embedded systems.
The existing text fuzzy search has the following defects:
1. Most current text fuzzy search is not truly fuzzy: the degree of fuzziness is low, and semantic associations such as synonyms and associated words of the search keywords are poorly supported. Synonyms of a keyword are therefore filtered out even though they may need to be retained in practice, which causes mis-filtering and lowers recall. When a keyword or key sentence is searched for in a relatively long text, the text is processed in a brute-force manner, so efficiency is low; that is, the method is not lightweight enough;
2. Most current text fuzzy search does not solve the two main problems of fuzzy string matching well, namely space and time. Processing text requires a large amount of computation and storage, and the time and space complexity of existing fuzzy matching algorithms often cannot meet real online requirements;
3. Most current text fuzzy search cannot capture sentence-level features. If the text being searched for does not occur in the text to be searched, the search result is empty, even though texts with a similar meaning may be present. In practice an empty result is undesirable in this situation; the texts with similar meanings should be returned instead.
Therefore, a lightweight text fuzzy search method supporting semantic association is provided.
Disclosure of Invention
The invention aims to provide a lightweight text fuzzy search method supporting semantic association to solve the problems in the background technology.
In order to achieve the purpose, the invention adopts the following technical scheme:
A lightweight text fuzzy search method supporting semantic association, the search method comprising the following steps:
S1, technical scene modeling: a text fuzzy search problem can be converted into the problem of querying a short text within a long text, where both the long text and the short text are sequences of characters;
S2, in order to keep the operation lightweight, the semantic association graph is built in advance and stored for direct calling, and is not computed on site;
S3, fuzzy search scheme: a long text S = {S1, S2, S3, …, Sn} and a search request Q = {Q1, Q2, Q3, …, Qm} are given;
S4, automatic division of the search task: a long text S of larger size is divided automatically by segmenting it at specific terminators, and the operation of S3 is then performed segment by segment;
S5, internal acceleration and multithread acceleration: internal acceleration processing is applied to each link of the algorithm scheme in S3;
S6, interface packaging: to make the text fuzzy search module easy to apply flexibly, it can be packaged as an interface product with the input parameter format bluE(S, Q, autoSplit, isImagine, stop_words), where autoSplit and isImagine are both Boolean values, autoSplit determines whether the automatic task division mechanism is used, isImagine determines whether the association mode is enabled, and stop_words are the user-defined terminators used in autoSplit mode.
Preferably, the characters in S1 include Chinese characters, English letters, numerals, and special characters.
Preferably, the fuzzy search scheme in S3 depends on whether the user turns on the semantic association function. If not, the fuzzy search is character-based and the constituent units of S and Q are individual characters; if the semantic association function is turned on, word segmentation must first be performed on S and Q.
Preferably, the fuzzy search algorithms in S3 include a multi-level convolution character density weighted matching algorithm and a near-diagonal common subsequence matching algorithm.
Preferably, before the operation of S3 is performed, a "first glance" determination may be made, the idea being: bluE(S, Q) == True if len(set(Q) & set(S)) > len(set(Q)) * 0.5.
Preferably, in the convolution operation of the multi-level convolution character density weighted matching algorithm, whether S_conv has enough non-zero cells can be determined in advance; if it does not, the convolution operation is not executed on that S_conv.
Preferably, the convolution summation operation of the multi-level convolution character density weighted matching algorithm is assisted by an external tool such as numpy.
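The two pre-checks described above (the "first glance" determination and the non-zero pre-screening of S_conv) can be sketched as follows. This is only an illustrative sketch: the function names first_glance and has_enough_nonzero, the ratio parameter, and the min_nonzero cutoff are assumptions introduced here, since the text gives only the 0.5 ratio and does not state how many non-zero cells count as "enough".

```python
def first_glance(S, Q, ratio=0.5):
    """Return True when more than `ratio` of Q's distinct units also occur in S.

    S and Q may be strings (character units) or lists of segmented words.
    """
    q_units = set(Q)
    if not q_units:
        return False
    return len(q_units & set(S)) > len(q_units) * ratio


def has_enough_nonzero(s_conv, min_nonzero=2):
    """Return True when the region holds at least `min_nonzero` non-zero cells.

    min_nonzero is a hypothetical cutoff; the source does not give a value.
    """
    nonzero = sum(1 for row in s_conv for value in row if value != 0)
    return nonzero >= min_nonzero
```

Both checks are cheap set and counting operations, so they can rule out hopeless inputs before any table is built or any convolution is run.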
Compared with the prior art, the invention has the following beneficial effects:
1. A higher degree of fuzziness: the traditional sentence retrieval algorithm is improved, so that sentences identical to the target sentence and sentences highly similar to it can both be retrieved, and the required similarity to the target sentence can be adjusted flexibly;
2. Fast operation: the traditional brute-force enumeration algorithm is abandoned in favor of semantic graphs, convolution, dynamic programming and similar methods, which optimizes the search process and greatly improves the search speed;
3. A lightweight system: the system size is reduced, internal and external optimizations target lightweight users and usage scenarios, the whole calculation process is optimized, and the memory burden is reduced. The invention also provides an association mode that requires no on-site computation, so a user can call the association module during a fuzzy search without occupying local computing power;
4. A flexible system that is easy for users to call from different applications: the whole algorithm module is packaged behind an interface, so users can directly call individual modules to meet their actual needs;
5. Support for semantic association, with fuzzy matching at the level of synonyms, near-synonyms and associated words: compared with traditional semantic retrieval methods, the invention provides a fuzzy retrieval method better suited to everyday use, which intelligently matches the words in the target text against their synonyms, near-synonyms and associated words in the searched text;
6. Text positioning: the fuzzy retrieval method sorts the texts found from high to low by their similarity to the target text, and gives the position of each similar text segment and its degree of match with the target text.
Drawings
FIG. 1 is a table cell representation of a lightweight text fuzzy search method supporting semantic association, where S_conv for Si is T[: i-k//2;
FIG. 2 is a distribution diagram of the convolution scores in the lightweight text fuzzy search method supporting semantic association proposed by the present invention;
FIG. 3 is a running result diagram of S1 + Q1 in Example one of the lightweight text fuzzy search method supporting semantic association proposed by the present invention;
FIG. 4 is a running result diagram of S1 + Q2 in Example one of the lightweight text fuzzy search method supporting semantic association proposed by the present invention;
FIG. 5 is a running result diagram of S2 + Q3 in Example two of the lightweight text fuzzy search method supporting semantic association proposed by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
Referring to FIGS. 1-2, the invention provides a lightweight text fuzzy search method supporting semantic association, and the search method comprises the following steps:
S1, technical scene modeling: a text fuzzy search problem can be converted into the problem of querying a short text within a long text, where both the long text and the short text are sequences of characters.
S2, in order to keep the operation lightweight, the semantic association graph is built in advance and stored for direct calling, and is not computed on site.
S3, fuzzy search scheme: a long text S = {S1, S2, S3, …, Sn} and a search request Q = {Q1, Q2, Q3, …, Qm} are given. Where word segmentation is needed, it can be performed with a word segmenter commonly used in the industry, for example the Chinese word segmenters jieba and pkuSeg. With a word segmenter, the character string 'My family has a beautiful little spotted cat' is converted into {S1, S2, S3, …, Sn} = {'my family', 'has', 'one', 'beautiful', 'little', 'spotted cat'}; without a word segmenter, {S1, S2, S3, …, Sn} = {'me', 'home', 'have', 'one', 'only', …}, that is, one unit per character of the original Chinese string. The subsequent algorithm operates on the converted text structure in the same way whether or not a word segmenter was used. In general, several different fuzzy search methods can be configured flexibly; for example, the common LCS (longest common subsequence) algorithm gives a good measure of how similar two character strings are in sequence.
S4, automatic division of the search task: a long text S of larger size is divided automatically by segmenting it at specific terminators such as periods, question marks and exclamation marks, and the operation of S3 is then performed segment by segment.
S5, internal acceleration and multithread acceleration. Internal acceleration processing is applied to each link of the algorithm scheme in S3, which can greatly reduce the time spent on summation operations. There are many acceleration methods of this kind, and they are not described further here. Multithread acceleration mainly targets the autoSplit method of S4: the automatically divided string segments are sent to different computing units (servers) for multithreaded computation, which can greatly improve efficiency. The multithreading currently used is implemented mainly with the third-party Python library Multiprocesses. First, the segmented sentences are passed as an array S = [Si, …] as one of the inputs and the number of threads or processes to use is specified; then the library's map function, or a similar function, automatically dispatches work to currently idle thread units. For each clause, the model completes the matching process from association through fuzzy search. The technical point is how to parallelize the conventional iterative process, in which association and then fuzzy search are run on one clause after another, so that association and search run on multiple clauses [Si, …, Sm] simultaneously using the callable resource units.
S6, interface packaging: to make the text fuzzy search module easy to apply flexibly, it can be packaged as an interface product with the input parameter format bluE(S, Q, autoSplit, isImagine, stop_words), where autoSplit and isImagine are both Boolean values, autoSplit determines whether the automatic task division mechanism is used, isImagine determines whether the association mode is enabled, and stop_words are the user-defined terminators used in autoSplit mode.
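The automatic division of S4 and the parallel dispatch of S5 can be sketched together as follows. This is a minimal sketch under stated assumptions, not the patented implementation: auto_split, search_segment and parallel_search are hypothetical names, stop_words is assumed to be a string of single-character terminators, the per-segment search is replaced by a plain exact substring lookup as a stand-in for the association and fuzzy search model, and the standard-library ThreadPool is used in place of the library named in the text.

```python
from multiprocessing.pool import ThreadPool


def auto_split(text, stop_words="。？！.?!"):
    """Split text after each terminator; return (offset, segment) pairs."""
    segments, start = [], 0
    for index, char in enumerate(text):
        if char in stop_words:
            segments.append((start, text[start:index + 1]))
            start = index + 1
    if start < len(text):
        # Keep a trailing segment that has no terminator.
        segments.append((start, text[start:]))
    return segments


def search_segment(task):
    """Hypothetical stand-in for the per-segment search (exact lookup only)."""
    (offset, segment), query = task
    local = segment.find(query)
    if local < 0:
        return None
    # Report positions relative to the whole text, not the segment.
    start = offset + local
    return {"match_str": query, "position": [start, start + len(query)]}


def parallel_search(text, query, stop_words="。？！.?!", workers=4):
    """Search every segment concurrently and collect the hits."""
    tasks = [(segment, query) for segment in auto_split(text, stop_words)]
    with ThreadPool(workers) as pool:
        results = pool.map(search_segment, tasks)
    return [result for result in results if result is not None]
```

Carrying the segment offset through the task is what lets each worker report positions in the coordinates of the original long text, so the merged results need no further adjustment.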
Wherein, the characters in S1 comprise Chinese characters, english letters, numbers and special characters.
The fuzzy search scheme in the S3 depends on whether the user starts the semantic association function, if the user does not start the semantic association function, the fuzzy search is based on characters, and the constituent units of the S and the Q are directly characters; if the semantic association function is started, the word segmentation processing needs to be carried out on S and Q.
The fuzzy search algorithm in S3 comprises a multi-level convolution character density weighted matching algorithm and a near-diagonal common subsequence matching algorithm.
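The choice of constituent units described above (characters when semantic association is off, segmented words when it is on) can be sketched as a single helper. The name to_units and the segmenter parameter are assumptions made for illustration; a real deployment might pass a word segmenter such as jieba.lcut, which is not imported here so that the sketch stays self-contained.

```python
def to_units(text, segmenter=None):
    """Turn a string into the unit sequence used for S or Q.

    segmenter: optional callable mapping str to a list of str (a word
    segmenter). None means semantic association is off, so every
    character is its own unit.
    """
    if segmenter is None:
        return list(text)
    # Drop whitespace-only tokens that some segmenters emit.
    return [unit for unit in segmenter(text) if unit.strip()]
```

For example, to_units("small cat") yields one unit per character, while to_units("small cat", str.split) yields the two words, with str.split standing in for a real segmenter.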
The multi-level convolution character density weighted matching algorithm emphasizes support for a higher degree of fuzziness. With the help of the semantic association graph, a weighted matching table {Tij} is built for the long text S = {S1, S2, S3, …, Sn} and the search request Q = {Q1, Q2, Q3, …, Qm}:
       S1    S2    S3    S4    S5    ...   Sn
Q1     T11   T12
Q2     T21   T22
Q3
Q4                                   ...
...
Qm                                         Tmn
Here Tij is the score of the corresponding state in the semantic association graph, with Sj as the index word and Qi as the subword. In general it can be assumed that a synonym scores 1, a near-synonym scores 0.75, and a related word scores 0.5; that is, if Qi is a near-synonym of Sj then Tij = 0.75, and if Qi belongs to none of the groups of Sj then Tij = 0. After the weighted matching table is built, the stride k must be set first. The larger k is, the greater the tolerance for gaps between characters and the fuzzier the matching; generally k = 5. The weighted matching table {Tij} is then zero-padded so that it is uniformly surrounded by zero-valued cells, as shown below, where the cells in the padded part have been given zero values:
[Figure: the weighted matching table {Tij} after zero padding]
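Building the weighted matching table and zero-padding it can be sketched as follows. The sketch makes several assumptions that the text does not state: the pre-built semantic association graph is modeled as a nested dictionary (index word, then subword, then relation name), identical units are scored 1, and the padding width is k // 2 on every side. All names (RELATION_SCORES, cell_score, build_table, zero_pad) are hypothetical.

```python
# Scores per relation type, as given in the text.
RELATION_SCORES = {"synonym": 1.0, "near_synonym": 0.75, "related": 0.5}


def cell_score(s_unit, q_unit, graph):
    """Score one cell: look q_unit up among the associations of s_unit."""
    if s_unit == q_unit:
        return 1.0  # Assumption: an identical unit counts as a full match.
    relation = graph.get(s_unit, {}).get(q_unit)
    return RELATION_SCORES.get(relation, 0.0)


def build_table(S, Q, graph):
    """Rows follow Q and columns follow S, so table[i][j] pairs Qi with Sj."""
    return [[cell_score(s, q, graph) for s in S] for q in Q]


def zero_pad(table, k=5):
    """Surround the table with k // 2 rows and columns of zeros."""
    pad = k // 2
    width = (len(table[0]) if table else 0) + 2 * pad
    padded = [[0.0] * width for _ in range(pad)]
    for row in table:
        padded.append([0.0] * pad + list(row) + [0.0] * pad)
    padded.extend([0.0] * width for _ in range(pad))
    return padded
```

Because the graph is only read, never computed, at search time, this matches the idea of building it in advance and storing it for direct calling.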
A region S_conv to be convolved is then extracted for S; for example, S_conv for Si is T[: i-k//2;
A convolutional layer for Q, referred to as Q_conv, is set according to k. It is a k × k matrix whose purpose is to give higher scores when the cells near the diagonal have higher values, so Q_conv may be set as follows:
1 0.75 0.25 0.05 0
0.75 1 0.75 0.25 0.05
0.25 0.75 1 0.75 0.25
0.05 0.25 0.75 1 0.75
0 0.05 0.25 0.75 1
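The Q_conv matrix above depends only on how far a cell lies from the main diagonal, so it can be generated from a single list of weights. The following sketch reproduces the 5 × 5 example; the function name band_kernel and the idea of passing the weights as a parameter are assumptions made for illustration.

```python
def band_kernel(weights=(1, 0.75, 0.25, 0.05, 0)):
    """Build a k x k matrix whose entries depend only on diagonal distance.

    weights[d] is the value placed d steps away from the main diagonal,
    so k equals len(weights).
    """
    k = len(weights)
    return [[weights[abs(row - col)] for col in range(k)] for row in range(k)]
```

Changing the weights list changes how quickly the score falls off away from the diagonal, which is one way to tune how strictly the order of units must be preserved.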
Given more sophisticated deep learning training conditions, a dedicated Q_conv can be configured for any Si; that is, different constituent elements of S may have different convolutional layers. In addition, a Q_conv that incorporates semantics can be provided for Si with the help of a syntax-level language model (for example ELMo or BERT); for example, the same word "love" has a different Q_conv in S = "I love Beijing Tiananmen" than in S = "I love my wife". Each S_conv is then convolved vertically with Q_conv, and for Tij the result of the convolution is:
[Formula image: convolution result for Tij]
where x' and y' are the coordinates in Q_conv of the position corresponding to S_conv[x][y], written simply as x' and y' for convenience. This yields the convolution score of each Si with each Qj. For each Si, the highest convolution score over all Qj is kept as the approximate convolution score of Si against the whole of Q, which gives the convolution score distribution of S shown in FIG. 2. From FIG. 2, the matched character strings, such as the boxed string [S7, S8, S9, S10], can be extracted in various ways. The basic idea is to extract continuous character strings whose convolution scores exceed a certain threshold (for example 0.5), allowing valleys no longer than a small number of characters (for example 2), and to sum and normalize the convolution scores of each such string. The position of the string is output, and the summed and normalized convolution score is taken as the similarity between the string and Q.
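The scoring and extraction steps described above can be sketched as follows. Because the convolution formula itself is not reproduced in the text, the sketch assumes the usual form: the sum of element-wise products between Q_conv and the k × k window of the zero-padded table centered on a cell. The normalization is also an assumption (a plain mean over the extracted span, not rescaled to the range 0 to 1), and the names convolution_scores and extract_spans are hypothetical.

```python
def convolution_scores(padded, kernel):
    """Return one score per unit of S: the best window score in its column.

    padded: table with k // 2 zero cells on every side.
    kernel: k x k matrix such as Q_conv.
    """
    k = len(kernel)
    pad = k // 2
    rows = len(padded) - 2 * pad
    cols = len(padded[0]) - 2 * pad
    scores = []
    for j in range(cols):
        best = 0.0
        for i in range(rows):
            # Window whose center is the unpadded cell (i, j).
            total = sum(
                padded[i + x][j + y] * kernel[x][y]
                for x in range(k)
                for y in range(k)
            )
            best = max(best, total)
        scores.append(best)
    return scores


def extract_spans(scores, threshold=0.5, max_valley=2):
    """Group above-threshold positions into spans, bridging short valleys."""
    spans, start, last_hit = [], None, None
    for index, score in enumerate(scores):
        if score > threshold:
            if start is not None and index - last_hit - 1 > max_valley:
                spans.append((start, last_hit))  # Valley too long: close span.
                start = None
            if start is None:
                start = index
            last_hit = index
    if start is not None:
        spans.append((start, last_hit))
    results = []
    for begin, end in spans:
        window = scores[begin:end + 1]
        results.append({"position": [begin, end + 1],
                        "similarity": sum(window) / len(window)})
    # Highest similarity first.
    return sorted(results, key=lambda item: item["similarity"], reverse=True)
```

Keeping only the best row per column corresponds to retaining, for each Si, the Qj with the highest convolution score.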
The near-diagonal common subsequence matching algorithm emphasizes operating efficiency. It reuses the weighted matching table {Tij} of the multi-level convolution character density weighted matching algorithm, and the specific scheme is a variant of LCS: for any Tij, taking Tij as the starting point, a search is made for the cell Txy with the maximum L value, or for the first cell whose L value exceeds a certain threshold (generally set to 0.25), in a matrix T[0. The cell found serves as the parent cell of Tij, and once it is found the query terminates. The L value is calculated as follows:
[Formula image: definition of the L value]
The cell Tij stores its own Tij value, its parent cell, and the corresponding L value, and from these obtains its own weighted Tij value, called Yij. Superscripts describe the relationship between a cell and its parent cell: the parent cell of Tij^(t) is written Tij^(t-1). The expression for Yij is then:
[Formula image: expression for Yij]
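Since the scheme above is described as a variant of LCS, the standard longest common subsequence computation is sketched here for reference only. It is the textbook dynamic programming algorithm, not the near-diagonal variant: the L value and Yij expressions are not reproduced in the text, so no attempt is made to implement them. The function name lcs_length is an assumption.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two unit sequences.

    Uses the standard dynamic programming recurrence with two rows,
    so memory grows with len(b) rather than len(a) * len(b).
    """
    previous = [0] * (len(b) + 1)
    for unit_a in a:
        current = [0]
        for j, unit_b in enumerate(b, start=1):
            if unit_a == unit_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]
```

Plain LCS only rewards identical units; the variant in the text instead works on the weighted table {Tij}, so synonyms and related words can contribute partial scores.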
Example one
Long text S1=' great family! Today i want to introduce my father. My father is a teacher, son, etc. A pair of bright eyes and black and beautiful hair also have a few silvery threads, which appear to be very old but very vigorous! '.
For the short texts Q1 = 'double eyes' and Q2 = 'eyes great',
the packaged interface is called:
bluE(S,Q,autoSplit,isImagine,stop_words)
to search for the short text Q in the long text S: result1 = bluE(S = S1, Q = Q1), result2 = bluE(S = S1, Q = Q2)
The returned results are respectively:
{'match_str': 'A bright large eye', 'position': [33, 43], 'similarity': 0.4238}
{'match_str': 'the large eye of', 'position': [39, 44], 'similarity': 0.3797}
The convolution score distributions of S1 for Q1 and Q2 are as follows (note: the convolution operations are pre-screened here, and S_conv regions with too few non-zero cells are skipped, so only some Si have a convolution score):
S1 + Q1: a single-threaded run on a macbook 7 takes 0.002 seconds; the running result is shown in FIG. 3.
S1 + Q2: a single-threaded run on a macbook 7 takes 0.0015 seconds; the running result is shown in FIG. 4.
Example two
For the sample using autoSplit:
the long text S2 = 'Johann Carl Friedrich Gauss is a German mathematician who made major progress in many fields, including number theory, algebra, statistics, analysis, differential geometry, geodesy, geophysics, mechanics, electrostatics, astronomy, matrix theory and optics. Gauss pointed out that regular triangles, regular quadrilaterals, regular pentagons, regular pentadecagons, and regular polygons with twice as many sides as these, can be constructed geometrically with compass and straightedge, but since then not much progress had been made on this problem. On the basis of number theory, Gauss provided a criterion for determining whether a regular polygon with a given number of sides can be constructed geometrically. For example, a regular heptadecagon inscribed in a circle can be constructed with compass and straightedge. This was the first such discovery since Euclid. In classical differential geometry, curves and surfaces are usually placed in three-dimensional Euclidean space to be studied. The description and discussion of many geometric properties of curves and surfaces often depend on how they are embedded in the larger space. In fact, however, the important properties of many geometric objects are intrinsic, that is, independent of the way they are embedded in the larger space. Early geometers rarely noticed this. Gauss and Riemann were the first to truly recognize this problem. In his famous lecture on geometry, Riemann formally re-examined many concepts of geometry from an intrinsic point of view.'
Short text Q3 = 'major progress',
calling the packaged interface:
bluE(S,Q,autoSplit,isImagine,stop_words)
to search for the short text Q in the long text S: result1 = bluE(S = S2, Q = Q3, autoSplit = True)
The returned results are:
{
1: {'match_str': 'major progression', 'position': [110, 114], 'precision': 0.36},
2: {'match_str': 'large progression', 'position': [77, 80], 'precision': 0.26}
}
The corresponding convolution score distribution is as follows:
S2 + Q3: a single-threaded run on a macbook 7 takes 0.0027 seconds; the running result is shown in FIG. 5.
The invention offers a higher degree of fuzziness: it improves on the traditional sentence retrieval algorithm, can retrieve sentences identical to the target sentence as well as sentences highly similar to it, and allows the required similarity to the target sentence to be adjusted flexibly. The operation speed is fast: the traditional brute-force enumeration algorithm is abandoned in favor of semantic graphs, convolution, dynamic programming and similar methods, which optimizes the search process and greatly improves the search speed. The system is lightweight: the system size is reduced, internal and external optimizations target lightweight users and usage scenarios, the whole calculation process is optimized, and the memory burden is reduced. The invention also provides an association mode that requires no on-site computation, so a user can call the association module during a fuzzy search without occupying local computing power. The system is flexible and easy for users to call from different applications: the whole algorithm module is packaged behind an interface, so users can directly call individual modules to meet their actual needs. The method supports semantic association, with fuzzy matching at the level of synonyms, near-synonyms and associated words: compared with traditional semantic retrieval methods, the invention provides a fuzzy retrieval method better suited to everyday use, which intelligently matches the words in the target text against their synonyms, near-synonyms and associated words. For text positioning, the fuzzy retrieval method sorts the texts found from high to low by their similarity to the target text, and gives the position of each similar text segment and its degree of match with the target text.
The above description covers only the preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (7)

1. A method for supporting semantic association lightweight text fuzzy search is characterized in that the search method comprises the following steps:
s1, modeling a technical scene, wherein a text fuzzy search problem can be converted into a problem of inquiring a short text in a long text, and the long text and the short text are a series of character sequences;
s2, in order to ensure the light weight of the operation, the semantic association map is built in advance and stored for direct calling, and the operation is not carried out on site;
s3, a fuzzy search scheme is given, long texts S = { S1, S2, S3, \8230; sn }, and search requests Q = { Q1, Q2, Q3, … qm } are given;
s4, automatically dividing the search task, automatically dividing the long text S with larger space, segmenting the long text S according to a specific terminator, and then performing 3 operations segment by segment;
s5, performing internal acceleration processing on each link of the algorithm scheme in the S3;
s6, interface packaging, which is convenient for flexible application of the text fuzzy search module and can be packaged in the form of an interface product, wherein the input parameter format is as follows: the method comprises the following steps of bluE (S, Q, autoSplit, isImagine, stop _ words), wherein the autoSplit and the isImagine are both Boolean type values, the autoSplit determines whether an operation mechanism of automatic task division is adopted, the isImagine determines whether an association mode is started, and the stop _ words are self-defined terminators in the autoSplit mode.
2. The method for lightweight fuzzy search supporting semantic association as recited in claim 1, wherein the characters in S1 comprise kanji, english alphabets, numerals and special characters.
3. The method for supporting lightweight fuzzy search of semantic association according to claim 1, wherein the fuzzy search scheme in S3 depends on whether the user turns on the semantic association function, if not, the fuzzy search will be based on characters, and the constituent units of S and Q are directly characters; if the semantic association function is started, word segmentation processing needs to be performed on S and Q firstly.
4. The method for lightweight fuzzy search supporting semantic association as claimed in claim 1, wherein said fuzzy search algorithm in S3 comprises a multi-level convolution character density weighted matching algorithm and a near-diagonal common subsequence matching algorithm.
5. The method for lightweight fuzzy search supporting semantic association according to claim 1, wherein a "first glance" decision can be made before performing the operation of S3, and the idea is as follows: blu (S, Q) = = trueifelen (set (Q) & set (S)) > len (set (Q)) × 0.5.
6. The lightweight text fuzzy search method supporting semantic association according to claim 4, wherein, in the convolution operation of the multi-level convolution character density weighted matching algorithm, it can be determined in advance whether S_conv has enough non-zero units; if it does not, no convolution operation is performed on it.
7. The lightweight text fuzzy search method supporting semantic association according to claim 4, wherein the convolution summation operation of the multi-level convolution character density weighted matching algorithm is performed with the aid of an external tool such as numpy.
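The "first glance" condition of claim 5, the terminator-based segmentation of S4 and the parameter list of S6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names first_glance, split_by_terminators, fuzzy_search and the matcher argument are hypothetical, the Boolean return value is an assumption, and the S3 matching algorithm itself is not reproduced here but passed in as a caller-supplied function.

```python
import re
from typing import Callable, Iterable, List


def first_glance(S: str, Q: str) -> bool:
    """Evaluate the claim 5 condition: more than half of the distinct
    characters of Q also occur in S."""
    q_chars = set(Q)
    return len(q_chars & set(S)) > len(q_chars) * 0.5


def split_by_terminators(S: str, stop_words: Iterable[str]) -> List[str]:
    """S4: cut the text S at every terminator and drop blank segments."""
    pattern = "|".join(re.escape(t) for t in stop_words if t)
    if not pattern:
        # No usable terminator: treat S as a single segment.
        return [S] if S else []
    return [seg for seg in re.split(pattern, S) if seg.strip()]


def fuzzy_search(
    S: str,
    Q: str,
    autoSplit: bool,
    isImagine: bool,
    stop_words: Iterable[str],
    matcher: Callable[[str, str, bool], bool],
) -> bool:
    """S6-style wrapper (hypothetical name and return type).

    `matcher` stands in for the S3 algorithm, which this sketch does not
    implement; it receives one segment, the query and the isImagine flag.
    """
    segments = split_by_terminators(S, stop_words) if autoSplit else [S]
    return any(matcher(seg, Q, isImagine) for seg in segments)


if __name__ == "__main__":
    # Toy stand-in matcher: plain substring test, ignoring isImagine.
    def exact(seg: str, q: str, imagine: bool) -> bool:
        return q in seg

    text = "hello world. foo bar"
    print(first_glance(text, "world"))
    print(fuzzy_search(text, "foo", True, False, ["."], exact))
```

Because the matcher is applied to each segment independently, a query that spans a terminator can match with autoSplit disabled and fail with it enabled, which is the trade-off of dividing the task.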
CN201911331527.1A 2019-12-21 2019-12-21 Lightweight text fuzzy search method supporting semantic association Active CN111125308B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911331527.1A CN111125308B (en) 2019-12-21 2019-12-21 Lightweight text fuzzy search method supporting semantic association

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911331527.1A CN111125308B (en) 2019-12-21 2019-12-21 Lightweight text fuzzy search method supporting semantic association

Publications (2)

Publication Number Publication Date
CN111125308A CN111125308A (en) 2020-05-08
CN111125308B CN111125308B (en) 2023-02-21

Family

ID=70500891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911331527.1A Active CN111125308B (en) 2019-12-21 2019-12-21 Lightweight text fuzzy search method supporting semantic association

Country Status (1)

Country Link
CN (1) CN111125308B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307771B (en) * 2020-10-29 2021-05-28 平安科技(深圳)有限公司 Course analysis method, device, equipment and medium based on emotion analysis

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070156622A1 (en) * 2006-01-05 2007-07-05 Akkiraju Rama K Method and system to compose software applications by combining planning with semantic reasoning
US9940387B2 (en) * 2011-07-28 2018-04-10 Lexisnexis, A Division Of Reed Elsevier Inc. Search query generation using query segments and semantic suggestions
CN102999563A (en) * 2012-11-01 2013-03-27 无锡成电科大科技发展有限公司 Network resource semantic retrieval method and system based on resource description framework
CN110019650B (en) * 2018-09-04 2024-04-05 北京京东尚科信息技术有限公司 Method and device for providing search association word, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN111125308A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN107133213B (en) Method and system for automatically extracting text abstract based on algorithm
CN111259653B (en) Knowledge graph question-answering method, system and terminal based on entity relationship disambiguation
CN110826336B (en) Emotion classification method, system, storage medium and equipment
CN108038122B (en) Trademark image retrieval method
CN111353030B (en) Knowledge question and answer retrieval method and device based on knowledge graph in travel field
CN109635083B (en) Document retrieval method for searching topic-type queries in TED lectures
CN112765312B (en) Knowledge graph question-answering method and system based on graph neural network embedded matching
Ju et al. An efficient method for document categorization based on word2vec and latent semantic analysis
CN112395393A (en) Remote supervision relation extraction method based on multitask and multiple examples
CN114791958B (en) Zero sample cross-modal retrieval method based on variational self-encoder
CN111897944A (en) Knowledge map question-answering system based on semantic space sharing
CN111581364B (en) Chinese intelligent question-answer short text similarity calculation method oriented to medical field
WO2020006488A1 (en) Corpus generating method and apparatus, and human-machine interaction processing method and apparatus
CN113220864A (en) Intelligent question-answering data processing system
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN111125308B (en) Lightweight text fuzzy search method supporting semantic association
CN112926323B (en) Chinese named entity recognition method based on multistage residual convolution and attention mechanism
CN112732944A (en) New method for text retrieval
CN112084312A (en) Intelligent customer service system constructed based on knowledge graph
CN115203378B (en) Retrieval enhancement method, system and storage medium based on pre-training language model
CN113111136B (en) Entity disambiguation method and device based on UCL knowledge space
CN115292533A (en) Cross-modal pedestrian retrieval method driven by visual positioning
CN112579795A (en) Intelligent question-answering method based on knowledge graph embedded representation
CN112860867B (en) Attribute selecting method and storage medium for Chinese question-answering system based on convolution neural network
CN116108146B (en) Information extraction method based on knowledge graph construction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant