CN112527957A - Short text matching method and system applied to news field - Google Patents
- Publication number
- CN112527957A
- Authority
- CN
- China
- Prior art keywords
- prefix
- news
- words
- matched
- word
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/325—Hash tables
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a short text matching method and system applied to the news field. The short text matching method comprises the following steps: step M1: constructing an organization index for the organization names to be matched by using a K-character prefix tree; step M2: storing the organization index and the news to be matched in a preset format; step M3: matching organizations in the news to be matched against the organization index. The method and system can quickly match relevant organizations in massive news data, address the low efficiency of matching news data, improve query efficiency and save storage space.
Description
Technical Field
The invention relates to the technical field of data processing and news retrieval, in particular to a short text matching method and system applied to the news field, and more particularly to a method and system for string processing and high-concurrency matching of organizations in news.
Background
With the development of the internet and continuous advances in science and technology, data volumes have exploded, and news in particular is produced in an endless stream. Quickly identifying the organizations mentioned in massive volumes of news has become an important technique in news data processing.
The development of organization matching for news currently faces two main challenges. The first is matching time complexity: in the big-data era the volume of news grows rapidly, the number of features to match keeps increasing, and the matching process becomes ever more complex. The second is efficiency: as the internet develops, requirements on data timeliness keep rising, which places high demands on the processing capacity of the organization matching system.
To address these difficulties, the system uses a K-character prefix tree to index tens of millions of organizations and a Redis cluster for distributed index storage, which greatly reduces space complexity; in terms of storage space and search speed, the approach strikes a balance between a suffix tree and a suffix array. In addition, the KMP algorithm is used to improve matching performance.
Patent document CN110321562A (application number: 201910576788.3) discloses a BERT-based short text matching method. It obtains first supervised task data for a first scenario according to that scenario's requirements, denoises the data to generate first data, extracts a first keyword from the first data, and converts the first data and the first keyword into a first original expression and a first feature expression. These are input separately into a preset short text matching model to produce a first score for the original expression and a second score for the feature expression. Finally, if the first score and/or the second score reaches a preset threshold, the first supervised task data is classified as a positive sample; otherwise it is classified as a negative sample. The method makes maximal use of prior knowledge when supervised task data is limited and offers stronger robustness and interpretability.
Disclosure of Invention
In view of the shortcomings of the prior art, the object of the invention is to provide a short text matching method and system applied to the news field.
The short text matching method applied to the news field provided by the invention comprises the following steps:
step M1: constructing an organization index for the organization names to be matched by using a K-character prefix tree;
step M2: storing the organization index and the news to be matched in a preset format;
step M3: matching organizations in the news to be matched against the organization index.
Preferably, step M1 comprises:
step M1.1: for an organization name of N characters, selecting its first K characters as the organization name prefix and its remaining N-K characters as the organization name suffix;
step M1.2: constructing a prefix tree with the K-character prefixes as keys and, for each key, the suffixes of all organization names sharing that prefix as its value.
Preferably, step M1.2 comprises: when the value list of suffixes sharing a prefix exceeds a preset size, extending the prefix length so that the value list of every key stays within a preset range.
Preferably, step M2 comprises:
step M2.1: converting the K-character prefixes in the organization index into hash codes with a hash algorithm and storing them as a prefix dictionary;
step M2.2: encoding the organizations in the value lists of the organization index and storing the encoded values.
Preferably, step M3 comprises:
step M3.1: preprocessing news to be matched from files of different formats into a uniform format to obtain preprocessed news;
step M3.2: performing sentence segmentation and word segmentation on the preprocessed news according to preset rules;
step M3.3: performing organization prefix matching and organization full-name matching against the organization index;
step M3.4: filtering the matched organizations and outputting the result.
Preferably, step M3.3 comprises:
step M3.3.1: loading the prefix file to obtain the prefix dictionary;
step M3.3.2: iterating over the sentences of the news to be matched and looking up each K-character short word of each sentence in the prefix dictionary; when the short word is in the prefix dictionary, performing organization full-name matching between the sentence containing the prefix and the value list of that prefix; when the short word is not in the prefix dictionary, repeating step M3.3.2; and when no organization in the value list is matched in the sentence containing the prefix, repeating step M3.3.2, until matching of the news to be matched is complete.
The invention further provides a short text matching system applied to the news field, comprising:
module M1: constructing an organization index for the organization names to be matched by using a K-character prefix tree;
module M2: storing the organization index and the news to be matched in a preset format;
module M3: matching organizations in the news to be matched against the organization index.
Preferably, module M1 comprises:
module M1.1: for an organization name of N characters, selecting its first K characters as the organization name prefix and its remaining N-K characters as the organization name suffix;
module M1.2: constructing a prefix tree with the K-character prefixes as keys and, for each key, the suffixes of all organization names sharing that prefix as its value;
module M1.2 comprises: when the value list of suffixes sharing a prefix exceeds a preset size, extending the prefix length so that the value list of every key stays within a preset range.
Preferably, module M2 comprises:
module M2.1: converting the K-character prefixes in the organization index into hash codes with a hash algorithm and storing them as a prefix dictionary;
module M2.2: encoding the organizations in the value lists of the organization index and storing the encoded values.
Preferably, module M3 comprises:
module M3.1: preprocessing news to be matched from files of different formats into a uniform format to obtain preprocessed news;
module M3.2: performing sentence segmentation and word segmentation on the preprocessed news according to preset rules;
module M3.3: performing organization prefix matching and organization full-name matching against the organization index;
module M3.4: filtering the matched organizations and outputting the result;
module M3.3 comprises:
module M3.3.1: loading the prefix file to obtain the prefix dictionary;
module M3.3.2: iterating over the sentences of the news to be matched and looking up each K-character short word of each sentence in the prefix dictionary; when the short word is in the prefix dictionary, performing organization full-name matching between the sentence containing the prefix and the value list of that prefix; when the short word is not in the prefix dictionary, triggering module M3.3.2 again; when an organization in the value list is matched in the sentence containing the prefix, adding the match to the result list; and when no organization in the value list is matched, triggering module M3.3.2 again, until matching of the news to be matched is complete.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention provides a method for building a text index and storing it in a distributed manner, which improves query efficiency;
2. the invention provides a string matching method and system that addresses the low efficiency of data matching over massive data;
3. the invention can quickly match relevant organizations in massive news data, addressing the low efficiency of matching news data, improving query efficiency and saving storage space.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of prefix tree construction;
FIG. 2 compares the efficiency of different prefix lengths;
FIG. 3 is a flow chart of organization matching in news.
Detailed Description
The present invention is described in detail below with reference to specific examples. The following examples will help those skilled in the art further understand the invention but do not limit it in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the invention, and these all fall within the scope of protection of the present invention.
Example 1
The short text matching method applied to the news field provided by the invention comprises the following steps, as shown in FIGS. 1-3:
step M1: constructing an organization index for the organization names to be matched by using a K-character prefix tree;
step M2: storing the organization index and the news to be matched in a preset format;
step M3: matching organizations in the news to be matched against the organization index.
Specifically, step M1 comprises:
step M1.1: for an organization name of N characters, selecting its first K characters as the organization name prefix and its remaining N-K characters as the organization name suffix;
step M1.2: constructing a prefix tree with the K-character prefixes as keys and, for each key, the suffixes of all organization names sharing that prefix as its value.
In particular, step M1.2 comprises: when the value list of suffixes sharing a prefix exceeds a preset size, extending the prefix length so that the value list of every key stays within a preset range.
Specifically, step M2 comprises:
step M2.1: converting the K-character prefixes in the organization index into hash codes with a hash algorithm and storing them as a prefix dictionary;
step M2.2: encoding the organizations in the value lists of the organization index and storing the encoded values.
Specifically, step M3 comprises:
step M3.1: preprocessing news to be matched from files of different formats into a uniform format to obtain preprocessed news;
step M3.2: performing sentence segmentation and word segmentation on the preprocessed news according to preset rules;
step M3.3: performing organization prefix matching and organization full-name matching against the organization index;
step M3.4: filtering the matched organizations and outputting the result.
In particular, step M3.3 comprises:
step M3.3.1: loading the prefix file to obtain the prefix dictionary;
step M3.3.2: iterating over the sentences of the news to be matched and looking up each K-character short word of each sentence in the prefix dictionary; when the short word is in the prefix dictionary, performing organization full-name matching between the sentence containing the prefix and the value list of that prefix; when the short word is not in the prefix dictionary, repeating step M3.3.2; and when no organization in the value list is matched in the sentence containing the prefix, repeating step M3.3.2, until matching of the news to be matched is complete.
The invention further provides a short text matching system applied to the news field, comprising:
module M1: constructing an organization index for the organization names to be matched by using a K-character prefix tree;
module M2: storing the organization index and the news to be matched in a preset format;
module M3: matching organizations in the news to be matched against the organization index.
Specifically, module M1 comprises:
module M1.1: for an organization name of N characters, selecting its first K characters as the organization name prefix and its remaining N-K characters as the organization name suffix;
module M1.2: constructing a prefix tree with the K-character prefixes as keys and, for each key, the suffixes of all organization names sharing that prefix as its value;
module M1.2 comprises: when the value list of suffixes sharing a prefix exceeds a preset size, extending the prefix length so that the value list of every key stays within a preset range.
Specifically, module M2 comprises:
module M2.1: converting the K-character prefixes in the organization index into hash codes with a hash algorithm and storing them as a prefix dictionary;
module M2.2: encoding the organizations in the value lists of the organization index and storing the encoded values.
Specifically, module M3 comprises:
module M3.1: preprocessing news to be matched from files of different formats into a uniform format to obtain preprocessed news;
module M3.2: performing sentence segmentation and word segmentation on the preprocessed news according to preset rules;
module M3.3: performing organization prefix matching and organization full-name matching against the organization index;
module M3.4: filtering the matched organizations and outputting the result;
module M3.3 comprises:
module M3.3.1: loading the prefix file to obtain the prefix dictionary;
module M3.3.2: iterating over the sentences of the news to be matched and looking up each K-character short word of each sentence in the prefix dictionary; when the short word is in the prefix dictionary, performing organization full-name matching between the sentence containing the prefix and the value list of that prefix; when the short word is not in the prefix dictionary, triggering module M3.3.2 again; when an organization in the value list is matched in the sentence containing the prefix, adding the match to the result list; and when no organization in the value list is matched, triggering module M3.3.2 again, until matching of the news to be matched is complete.
Example 2
Example 2 is a modification of Example 1.
1. Organization index building module
Step 1: selecting the first K characters of an organization name as its prefix and the remaining N-K characters as its suffix;
Step 2: constructing a prefix tree with the K-character prefixes as keys and the suffixes of organization names sharing each prefix as values;
The resulting structure, taking K = 3 as an example, is shown schematically in FIG. 1.
The efficiency of different prefix lengths is compared in FIG. 2.
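Steps 1 and 2 can be sketched in Python as follows, assuming the index is held in memory as a plain dictionary from each K-character prefix to the list of suffixes that share it. The function name `build_prefix_index` and the sample organization names are illustrative, not from the source, and the source does not say how names shorter than K are handled; this sketch simply skips them.

```python
from collections import defaultdict


def build_prefix_index(org_names, k=3):
    """Map each K-character prefix to the suffixes (remaining N-K characters) that share it."""
    index = defaultdict(list)
    for name in org_names:
        if len(name) < k:
            # Assumption: a name shorter than K has no K-character prefix, so it is skipped.
            continue
        prefix, suffix = name[:k], name[k:]
        index[prefix].append(suffix)
    return dict(index)


if __name__ == "__main__":
    orgs = ["上海市人民政府", "上海市教育委员会", "北京大学"]
    print(build_prefix_index(orgs, k=3))
    # {'上海市': ['人民政府', '教育委员会'], '北京大': ['学']}
```

Only the suffix is stored under each key, so the shared prefix is written once per bucket rather than once per organization.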
Step 3: for organizations whose prefixes are very common, that is, prefixes with overly large suffix value lists (such as Shanghai and Beijing), the prefix length is extended so that the value list of each key stays within a user-defined range.
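One possible way to realize this prefix length extension is sketched below under stated assumptions: an overloaded bucket is re-keyed one character longer, recursively, until every value list fits `max_values` or a length cap `max_k` is reached. Both parameters, the function name `build_adaptive_index`, and the choice to keep names exactly K characters long under the shorter key are hypothetical; the source only states that each list must stay within a user-defined range.

```python
from collections import defaultdict


def build_adaptive_index(names, k=3, max_values=2, max_k=6):
    """Build {prefix: [suffix, ...]}, lengthening any prefix whose value list exceeds max_values."""
    buckets = defaultdict(list)
    for name in names:
        if len(name) >= k:  # assumption: names shorter than the current prefix length are not indexed
            buckets[name[:k]].append(name)

    index = {}
    for prefix, members in buckets.items():
        if len(members) <= max_values or k >= max_k:
            # Small enough (or at the length cap): store the suffixes under this prefix.
            index[prefix] = [m[k:] for m in members]
            continue
        # Overloaded prefix: names exactly K characters long stay under it with an empty suffix;
        # every longer name is re-indexed with a prefix one character longer.
        exact = [m for m in members if len(m) == k]
        if exact:
            index[prefix] = [""] * len(exact)
        longer = [m for m in members if len(m) > k]
        index.update(build_adaptive_index(longer, k + 1, max_values, max_k))
    return index


if __name__ == "__main__":
    names = ["上海市人民政府", "上海市教育委员会", "上海市公安局", "北京大学"]
    print(build_adaptive_index(names, k=3, max_values=2))
```

With three names under one prefix and `max_values=2`, that prefix is replaced by three four-character keys while the other bucket keeps K = 3. Because keys now differ in length, a matcher built on this index would have to try every prefix length present in it rather than a single K.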
2. Data storage module
Step 1: for the constructed organization prefix tree, converting the K-character prefixes into HashCodes with a hash algorithm;
Step 2: building a code mapping for the organizations in the value lists and using the codes to convert string values into numeric values, which reduces storage space and speeds up queries;
Step 3: storing the prefixes as a file on the hard disk and storing the converted organization index in a Redis cluster.
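Steps 1 and 2 can be sketched as follows, assuming `zlib.crc32` as the hash (the source names no specific algorithm) and a plain dictionary standing in for the Redis cluster. `prefix_hash`, `encode_index` and the integer code table are hypothetical names. Python's built-in `hash()` is avoided because string hashing is salted per process, which would make persisted keys unstable.

```python
import zlib


def prefix_hash(prefix: str) -> int:
    """Stable 32-bit hash code for a prefix (CRC-32 of its UTF-8 bytes)."""
    return zlib.crc32(prefix.encode("utf-8"))


def encode_index(index):
    """Replace each organization string with an integer code.

    Returns (encoded, code_table): encoded maps prefix hash -> list of organization codes,
    and code_table[code] gives back the full organization name.
    """
    org_to_code = {}
    code_table = []
    encoded = {}
    for prefix, suffixes in index.items():
        for suffix in suffixes:
            full_name = prefix + suffix
            if full_name not in org_to_code:
                org_to_code[full_name] = len(code_table)
                code_table.append(full_name)
            # If two prefixes ever collide on one hash, their lists are merged; full-name
            # matching later verifies every candidate, so results stay correct.
            encoded.setdefault(prefix_hash(prefix), []).append(org_to_code[full_name])
    return encoded, code_table


if __name__ == "__main__":
    index = {"上海市": ["人民政府", "教育委员会"], "北京大": ["学"]}
    encoded, code_table = encode_index(index)
    print(code_table)  # ['上海市人民政府', '上海市教育委员会', '北京大学']
    print(encoded[prefix_hash("上海市")])  # [0, 1]
```

Each name is stored once in `code_table` and the value lists hold only small integers, which is the kind of saving the source attributes to this step; writing the encoded lists to an actual Redis cluster is outside this sketch.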
3. News organization matching module
3.1 Input module
This module acquires the news to be matched. It supports various input modes, such as copying and pasting news text, reading from a database, receiving from a message queue and reading from a file path.
3.2 News preprocessing module
This module standardizes the news acquired by the input module.
Step 1: if the news is a file, such as PDF, Word or HTML, the file is first converted to extract its text content; if the news is already plain text, go to step 2;
Step 2: unify the punctuation in the text by converting every mark into a uniform identifier, and remove all characters other than Chinese characters, English letters and Arabic numerals;
Step 3: output the formatted news text.
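A minimal sketch of steps 2 and 3 for text that has already been extracted (the file conversion of step 1 is out of scope). It assumes `|` as the uniform punctuation identifier, NFKC normalization to fold full-width letters, digits and punctuation to their ASCII forms, and the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as the definition of "Chinese"; none of these choices is specified in the source.

```python
import re
import unicodedata

SEPARATOR = "|"  # hypothetical uniform identifier that replaces every punctuation mark
# Punctuation after NFKC folding (full-width ，！？；： become , ! ? ; :).
PUNCTUATION = re.compile(r"[。，、！？；：,.!?;:\r\n]+")
# Anything that is not a CJK ideograph, an ASCII letter, a digit or the separator "|".
DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9|]")


def preprocess(text: str) -> str:
    """Map punctuation to SEPARATOR and drop every other non-Chinese/English/digit character."""
    text = unicodedata.normalize("NFKC", text)
    text = PUNCTUATION.sub(SEPARATOR, text)
    return DISALLOWED.sub("", text)


if __name__ == "__main__":
    print(preprocess("上海市人民政府今天发布通知。北京大学（PKU）表示：同意！"))
    # 上海市人民政府今天发布通知|北京大学PKU表示|同意|
```

Runs of adjacent punctuation collapse into one separator, while brackets, spaces and other symbols are simply deleted, as step 2 describes.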
3.3 Text splitting module
Step 1: splitting the text into a set of news sentences at punctuation marks;
Step 2: splitting each sentence into K-character short words according to the organization prefix length, and passing them to the organization matching module.
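The two splitting steps can be sketched as below, assuming the `|` separator produced by a preprocessing step like the one described in 3.2 and a fixed prefix length `k`. The K-character windows overlap by K-1 characters, so an organization name starting at any position of a sentence is seen. The function names are illustrative.

```python
def split_sentences(text: str, sep: str = "|") -> list[str]:
    """Step 1: split formatted news text into non-empty sentences at the punctuation separator."""
    return [s for s in text.split(sep) if s]


def k_char_words(sentence: str, k: int = 3) -> list[tuple[int, str]]:
    """Step 2: every K-character window of the sentence, paired with its start offset."""
    return [(i, sentence[i:i + k]) for i in range(len(sentence) - k + 1)]


if __name__ == "__main__":
    sentences = split_sentences("上海市人民政府今天发布通知|北京大学PKU表示|同意|")
    print(sentences)  # ['上海市人民政府今天发布通知', '北京大学PKU表示', '同意']
    print(k_char_words("北京大学", k=3))  # [(0, '北京大'), (1, '京大学')]
```

A sentence shorter than K yields no windows, so it can never produce a prefix hit.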
3.4 Organization matching module, as shown in FIG. 3
Step 1: loading the prefix file to obtain the prefix dictionary;
Step 2: iterating over the sentence set and looking up each K-character short word of each sentence in the prefix dictionary; if the short word is in the dictionary, go to step 3, otherwise continue with step 2;
Step 3: performing organization full-name matching between the sentence Sen1 containing the prefix and the Value1 list of the prefix Key1, using the KMP algorithm to speed up matching; if an organization from the Value1 list [Org1, Org2, ...] is matched in Sen1, the match is added to the result list; otherwise, return to step 2.
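A self-contained sketch of steps 1 to 3, assuming an in-memory `{prefix: [suffix, ...]}` index with a single fixed K (no prefix extension, no hash or integer encoding) and a result list without duplicates; both simplifications are assumptions. `kmp_contains` is a textbook Knuth-Morris-Pratt search, and `match_organizations` is a hypothetical name.

```python
def kmp_table(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j
    return fail


def kmp_contains(text: str, pattern: str) -> bool:
    """Knuth-Morris-Pratt search: True if pattern occurs in text."""
    if not pattern:
        return True
    fail = kmp_table(pattern)
    j = 0
    for ch in text:
        while j > 0 and ch != pattern[j]:
            j = fail[j - 1]  # fall back without re-reading text characters
        if ch == pattern[j]:
            j += 1
            if j == len(pattern):
                return True
    return False


def match_organizations(sentences, index, k=3):
    """Look up every K-character word in the prefix index, then full-name match its candidates."""
    results, seen = [], set()
    for sen in sentences:
        for i in range(len(sen) - k + 1):
            key = sen[i:i + k]
            suffixes = index.get(key)
            if suffixes is None:
                continue  # not a known prefix: move on to the next K-character word (step 2)
            for suffix in suffixes:
                full_name = key + suffix
                if full_name not in seen and kmp_contains(sen, full_name):
                    seen.add(full_name)
                    results.append(full_name)
    return results


if __name__ == "__main__":
    index = {"上海市": ["人民政府", "教育委员会"], "北京大": ["学"]}
    sentences = ["上海市人民政府今天发布通知", "北京大学PKU表示", "同意"]
    print(match_organizations(sentences, index))  # ['上海市人民政府', '北京大学']
```

Since the window position already fixes where a candidate would start, `sen.startswith(full_name, i)` would be a cheaper equivalent check; the sketch scans with KMP only because the source names that algorithm.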
3.5 Output module
Stop words and stop organizations are loaded, the result list output by the organization matching module is filtered against them, and the final organization matching result is output.
Those skilled in the art will appreciate that, in addition to implementing the system, apparatus and modules provided by the invention purely as computer-readable program code, the same functions can be implemented entirely by logically programming the method steps, so that the system, apparatus and modules take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, apparatus and modules provided by the invention may be regarded as hardware components, and the modules they contain for implementing various programs may also be regarded as structures within those hardware components; modules for performing various functions may be regarded both as software programs implementing the method and as structures within hardware components.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these specific embodiments, and that those skilled in the art may make various changes or modifications within the scope of the appended claims without departing from the spirit of the invention. Provided they do not conflict, the embodiments of the present application and the features in them may be combined with one another arbitrarily.
Claims (10)
1. A short text matching method applied to the news field, characterized by comprising the following steps:
step M1: constructing an organization index for the organization names to be matched by using a K-character prefix tree;
step M2: storing the organization index and the news to be matched in a preset format;
step M3: matching organizations in the news to be matched against the organization index.
2. The short text matching method applied to the news field according to claim 1, wherein step M1 comprises:
step M1.1: for an organization name of N characters, selecting its first K characters as the organization name prefix and its remaining N-K characters as the organization name suffix;
step M1.2: constructing a prefix tree with the K-character prefixes as keys and, for each key, the suffixes of all organization names sharing that prefix as its value.
3. The short text matching method applied to the news field according to claim 2, wherein step M1.2 comprises: when the value list of suffixes sharing a prefix exceeds a preset size, extending the prefix length so that the value list of every key stays within a preset range.
4. The short text matching method applied to the news field according to claim 1, wherein step M2 comprises:
step M2.1: converting the K-character prefixes in the organization index into hash codes with a hash algorithm and storing them as a prefix dictionary;
step M2.2: encoding the organizations in the value lists of the organization index and storing the encoded values.
5. The short text matching method applied to the news field according to claim 1, wherein step M3 comprises:
step M3.1: preprocessing news to be matched from files of different formats into a uniform format to obtain preprocessed news;
step M3.2: performing sentence segmentation and word segmentation on the preprocessed news according to preset rules;
step M3.3: performing organization prefix matching and organization full-name matching against the organization index;
step M3.4: filtering the matched organizations and outputting the result.
6. The short text matching method applied to the news field according to claim 5, wherein step M3.3 comprises:
step M3.3.1: loading the prefix file to obtain the prefix dictionary;
step M3.3.2: iterating over the sentences of the news to be matched and looking up each K-character short word of each sentence in the prefix dictionary; when the short word is in the prefix dictionary, performing organization full-name matching between the sentence containing the prefix and the value list of that prefix; when the short word is not in the prefix dictionary, repeating step M3.3.2; and when no organization in the value list is matched in the sentence containing the prefix, repeating step M3.3.2, until matching of the news to be matched is complete.
7. A short text matching system applied to the news field, characterized by comprising:
module M1: constructing an organization index for the organization names to be matched by using a K-character prefix tree;
module M2: storing the organization index and the news to be matched in a preset format;
module M3: matching organizations in the news to be matched against the organization index.
8. The short text matching system applied to the news field according to claim 7, wherein module M1 comprises:
module M1.1: for an organization name of N characters, selecting its first K characters as the organization name prefix and its remaining N-K characters as the organization name suffix;
module M1.2: constructing a prefix tree with the K-character prefixes as keys and, for each key, the suffixes of all organization names sharing that prefix as its value;
module M1.2 comprises: when the value list of suffixes sharing a prefix exceeds a preset size, extending the prefix length so that the value list of every key stays within a preset range.
9. The short text matching system applied to the news field according to claim 7, wherein module M2 comprises:
module M2.1: converting the K-character prefixes in the organization index into hash codes with a hash algorithm and storing them as a prefix dictionary;
module M2.2: encoding the organizations in the value lists of the organization index and storing the encoded values.
10. The short text matching system applied to the news domain as set forth in claim 1, wherein the module M3 comprises:
module M3.1: preprocessing news to be matched from files of different formats into a uniform format to obtain preprocessed news;
module M3.2: performing sentence segmentation and word segmentation on the preprocessed news according to preset rules;
module M3.3: performing organization prefix matching and organization full-name matching against the organization index;
module M3.4: filtering the matched organizations and outputting the result;
module M3.3 comprises:
module M3.3.1: loading the prefix file to obtain the prefix dictionary;
module M3.3.2: iterating over the sentences of the news to be matched and looking up each K-character short word of each sentence in the prefix dictionary; when the short word is in the prefix dictionary, performing organization full-name matching between the sentence containing the prefix and the value list of that prefix; when the short word is not in the prefix dictionary, triggering module M3.3.2 again; when an organization in the value list is matched in the sentence containing the prefix, adding the match to the result list; and when no organization in the value list is matched, triggering module M3.3.2 again, until matching of the news to be matched is complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011424390.7A CN112527957A (en) | 2020-12-08 | 2020-12-08 | Short text matching method and system applied to news field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011424390.7A CN112527957A (en) | 2020-12-08 | 2020-12-08 | Short text matching method and system applied to news field |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112527957A (en) | 2021-03-19 |
Family ID: 74998241
Family Applications (1)
Application Number | Publication Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202011424390.7A | CN112527957A (en) | Pending | 2020-12-08 | 2020-12-08 | Short text matching method and system applied to news field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112527957A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101271466A (en) * | 2008-04-30 | 2008-09-24 | 中山大学 | Electronic dictionary work retrieval method based on self-adapting dictionary tree |
US20090174583A1 (en) * | 2008-01-08 | 2009-07-09 | International Business Machines Corporation | Method for Compressed Data with Reduced Dictionary Sizes by Coding Value Prefixes |
CN105871726A (en) * | 2016-03-21 | 2016-08-17 | 哈尔滨工程大学 | Mode matching method for dynamically adding tree node and unit based on common prefix |
CN110688841A (en) * | 2019-09-30 | 2020-01-14 | 广州准星信息科技有限公司 | Mechanism name identification method, mechanism name identification device, mechanism name identification equipment and storage medium |
- 2020-12-08: CN application CN202011424390.7A filed (CN112527957A, active, pending)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115438145A (en) * | 2022-04-13 | 2022-12-06 | 盐城金堤科技有限公司 | Method and device for adding enterprise detail internal chain |
CN115438145B (en) * | 2022-04-13 | 2024-05-14 | 盐城天眼察微科技有限公司 | Method and device for adding enterprise detail inner links |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7031910B2 (en) | Method and system for encoding and accessing linguistic frequency data | |
US11308937B2 (en) | Method and apparatus for identifying key phrase in audio, device and medium | |
CN102915299B (en) | Word segmentation method and device | |
US8175875B1 (en) | Efficient indexing of documents with similar content | |
US8694474B2 (en) | Block entropy encoding for word compression | |
CN109815336B (en) | Text aggregation method and system | |
CN109033410B (en) | SQL (structured query language) analysis method based on regular and character string cutting | |
WO2010043984A2 (en) | Mining new words from a query log for input method editors | |
WO2008098507A1 (en) | An input method of combining words intelligently, input method system and renewing method | |
CN112527957A (en) | Short text matching method and system applied to news field | |
CN111782810A (en) | Text abstract generation method based on theme enhancement | |
Flor | A fast and flexible architecture for very large word n-gram datasets | |
CN113468209A (en) | High-speed memory database access method for power grid monitoring system | |
CN113032371A (en) | Database grammar analysis method and device and computer equipment | |
Youzhuo et al. | Research on lucene based full-text query search service for smart distribution system | |
US10380195B1 (en) | Grouping documents by content similarity | |
CN111930959B (en) | Method and device for generating text by map knowledge | |
Zhang et al. | A dynamic window split-based approach for extracting professional terms from Chinese courses | |
Kostrov et al. | Application of probabilistic approach while forming hash-function by signature in the process of domain-specific local database analysis | |
CN118295980A (en) | High compression ratio compression algorithm for industrial control system log | |
Xiong | An Algorithm Rapidly Segmenting Chinese Sentences into Individual Words | |
Zhou et al. | Improved query model for rapidly query based on distributed hash index | |
Yang et al. | A dictionary mechanism for Chinese word segmentation based on the finite automata | |
CN115545040A (en) | Vehicle type function analysis method and device | |
Cuo et al. | Research on Tibetan Web Standard Text Data Model |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |