CN112800316A - Search keyword extraction system based on double-array dictionary tree - Google Patents

Info

Publication number
CN112800316A
CN112800316A CN202110151716.1A CN202110151716A CN112800316A CN 112800316 A CN112800316 A CN 112800316A CN 202110151716 A CN202110151716 A CN 202110151716A CN 112800316 A CN112800316 A CN 112800316A
Authority
CN
China
Prior art keywords
module
double
check
base
dictionary tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110151716.1A
Other languages
Chinese (zh)
Inventor
张凤超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yiche Interconnection Information Technology Co ltd
Original Assignee
Beijing Yiche Interconnection Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yiche Interconnection Information Technology Co ltd
Priority to CN202110151716.1A
Publication of CN112800316A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a search keyword extraction system based on a double-array dictionary tree, comprising a user interface, a query operation module, a retrieval module, a sorting module, a text operation module, an index module, a database management module, a text database module, a first word segmentation module and a second word segmentation module. The first word segmentation module is arranged inside the retrieval module, and the second word segmentation module is arranged inside the index module. The user interface is connected with the query operation module, the query operation module with the retrieval module, the retrieval module with the sorting module, and the user interface with the database management module. The advantage is that the AC state machine is fully used to complete pattern matching at high speed, so that automobile-related words in the phrase text are recognized quickly, the user's detailed intention is obtained and passed on to the subsequent search process, and the retrieval results better match the user's expectations.

Description

Search keyword extraction system based on double-array dictionary tree
Technical Field
The application relates to a keyword extraction system, in particular to a search keyword extraction system based on a double-array dictionary tree.
Background
When a user searches with a search engine and the query is a long-tail phrase, the returned results are often poor and the top results may not be what the user wants, because every word in the long-tail phrase is treated as equally important. With semantic analysis it becomes possible to tell which words in the phrase are incidental and which few are keywords. A search engine for the automobile industry needs to extract the automobile-related words from the information entered by the user, so that later analysis and processing can be done better.
The main existing solutions in the industry are as follows:
Scheme one: the TF-IDF algorithm. TF-IDF is a numerical statistic that reflects how important a word is to a document in a corpus. Its main idea is that if a word appears frequently in one document (high TF) and rarely in other documents (high IDF), the word is considered to discriminate well between categories.
Scheme two: the TextRank algorithm. Its important characteristic is that it can extract the keywords of a document by analyzing that single document alone, independent of any background corpus.
Scheme three: candidate word matching. Candidates are obtained by multi-pattern matching against a keyword lexicon. The most important work is building the lexicon, which merges several sources: proper nouns from vertical sites, encyclopedia entries, input-method cell lexicons, and words purchased by advertisers.
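As a rough illustration of the TF-IDF weighting described in scheme one, the following Python sketch computes term weights over a toy corpus. The function name `tf_idf`, the sample documents and the plain tf × log(N/df) formula are assumptions chosen for illustration; the application does not specify a particular TF-IDF variant.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # TF is the relative frequency in this document, IDF is log(N / df).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-segmented documents.
docs = [
    ["aston", "martin", "price"],
    ["odyssey", "price"],
    ["aston", "martin", "review", "aston"],
]
scores = tf_idf(docs)
print(scores[1])
```

In the second toy document, "odyssey" occurs in only one document and therefore outweighs "price", which occurs in two. The weight depends only on counts, which is the limitation that problem one refers to.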
Problem one, the drawback of scheme one: measuring the importance of a word in an article purely by word frequency is sometimes not comprehensive enough, since important words may not occur often, and the calculation cannot reflect a word's position or its importance in context.
Problem two, the drawback of scheme two: it is based on PageRank, so PageRank data must be prepared, and its real-time performance is poor. Old pages rank higher than new pages, because even a very good new page will not have many inbound links unless it is a sub-page of an existing site.
Problem three, the drawback of scheme three: it depends too heavily on how up to date the dictionary and lexicon are, and they must be refreshed frequently to meet requirements. Therefore, a search keyword extraction system based on a double-array dictionary tree is proposed to solve the above problems.
Disclosure of Invention
A search keyword extraction system based on a double-array dictionary tree comprises a user interface, a query operation module, a retrieval module, a sorting module, a text operation module, an indexing module, an index module, a database management module, a text database module, a first word segmentation module and a second word segmentation module, wherein the first word segmentation module is arranged in the retrieval module, and the second word segmentation module is arranged in the index module;
the user interface is connected with the query operation module, the query operation module is connected with the retrieval module, and the retrieval module is connected with the sorting module.
Further, the user interface and the database management module are connected with each other.
Further, the text operation module and the database management module are connected with each other.
Further, the text operation module and the indexing module are connected with each other.
Further, the indexing module and the index module are connected with each other.
Further, the index module and the retrieval module are connected with each other.
Further, the indexing module and the database management module are connected with each other.
Further, the database management module and the text database module are connected with each other.
Further, the user interface is provided by way of third-party encapsulation and the HTTP protocol.
Further, the steps for extracting keywords in the index module are as follows:
(1) let the array subscript be i; if base[i] and check[i] are both 0, the position is empty;
(2) if base[i] is negative (a leaf node), the state may be an end state;
let the direct child states of state A be Ab, Ac, Ad, ..., An, let the subscript of A in base[] be i, and set base[i] = j;
(3) to ensure that the direct children of A can be placed into the array, j must satisfy:
base[j+b]==0, base[j+c]==0, base[j+d]==0, ..., base[j+n]==0;
check[j+b]==0, check[j+c]==0, ..., check[j+n]==0;
once the value of j is determined, the subscripts of Ab, Ac, Ad, ..., An are also determined,
namely j+b, j+c, j+d, ..., j+n;
at the same time, set:
check[j+b]=i, check[j+c]=i, check[j+d]=i, ..., check[j+n]=i;
(4) query:
Figure BDA0002932790040000031
The beneficial effect of this application is: the search keyword extraction system based on the double-array dictionary tree can achieve the purpose of rapidly recognizing automobile related words in phrase texts.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a schematic overall structure diagram of an embodiment of the present application;
FIG. 2 is a schematic diagram of an internal structure of a retrieval module according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an internal structure of an index module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a keyword extraction process according to an embodiment of the present application;
FIG. 5 is a diagram of a dictionary according to an embodiment of the present application.
In the figures: 1. user interface; 2. query operation module; 3. retrieval module; 4. sorting module; 5. text operation module; 6. indexing module; 7. index module; 8. database management module; 9. text database module; 10. first word segmentation module; 11. second word segmentation module.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In this application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. These terms are used primarily to better describe the present application and its embodiments, and are not used to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.
Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art as appropriate.
Furthermore, the terms "mounted," "disposed," "provided," "connected," and "sleeved" are to be construed broadly. For example, it may be a fixed connection, a removable connection, or a unitary construction; can be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements or components. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to FIGS. 1-5, a search keyword extraction system based on a double-array dictionary tree includes a user interface 1, a query operation module 2, a retrieval module 3, a sorting module 4, a text operation module 5, an indexing module 6, an index module 7, a database management module 8, a text database module 9, a first word segmentation module 10 and a second word segmentation module 11, wherein the first word segmentation module 10 is arranged inside the retrieval module 3, and the second word segmentation module 11 is arranged inside the index module 7;
the user interface 1 is connected with the query operation module 2, the query operation module 2 is connected with the retrieval module 3, and the retrieval module 3 is connected with the sorting module 4;
the user interface 1 and the database management module 8 are connected with each other; the text operation module 5 and the database management module 8 are connected with each other; the text operation module 5 and the indexing module 6 are connected with each other; the indexing module 6 and the index module 7 are connected with each other; the index module 7 and the retrieval module 3 are connected with each other; the indexing module 6 is connected with the database management module 8; the database management module 8 and the text database module 9 are connected with each other; the user interface 1 is provided by way of third-party encapsulation and the HTTP protocol;
the user interface 1 provides an interface for callers, which may take the form of third-party encapsulation and the HTTP protocol; the query operation module 2 performs matching search according to the user's keywords; the retrieval module 3 extracts keywords for the query; the index module 7 segments and merges the source data and then builds the inverted and forward indexes; the text operation module 5 filters the text and performs word segmentation and cleaning; and the indexing module 6 calculates article classification and quality scores.
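The index-building role attributed to the index module can be pictured with a minimal Python sketch. It assumes whitespace splitting as a stand-in for the word segmentation module, and the function name `build_indexes` and the sample documents are hypothetical; the application does not describe the index layout in this detail.

```python
from collections import defaultdict


def build_indexes(documents, tokenize=str.split):
    """Return (forward, inverted) indexes for a {doc_id: text} mapping."""
    forward = {}                  # forward index: doc_id -> list of tokens
    inverted = defaultdict(list)  # inverted index: token -> [(doc_id, position)]
    for doc_id, text in documents.items():
        tokens = tokenize(text)   # stand-in for the second word segmentation module
        forward[doc_id] = tokens
        for position, token in enumerate(tokens):
            inverted[token].append((doc_id, position))
    return forward, dict(inverted)


# Hypothetical source data.
documents = {1: "aston martin review", 2: "odyssey review"}
forward, inverted = build_indexes(documents)
print(inverted["review"])
```

The forward index answers "which words does this document contain", while the inverted index answers "which documents contain this word", which is what a keyword query needs.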
The steps for extracting keywords in the index module 7 are as follows:
(1) let the array subscript be i; if base[i] and check[i] are both 0, the position is empty;
(2) if base[i] is negative (a leaf node), the state may be an end state;
let the direct child states of state A be Ab, Ac, Ad, ..., An, let the subscript of A in base[] be i, and set base[i] = j;
(3) to ensure that the direct children of A can be placed into the array, j must satisfy:
base[j+b]==0, base[j+c]==0, base[j+d]==0, ..., base[j+n]==0;
check[j+b]==0, check[j+c]==0, ..., check[j+n]==0;
once the value of j is determined, the subscripts of Ab, Ac, Ad, ..., An are also determined,
namely j+b, j+c, j+d, ..., j+n;
at the same time, set:
check[j+b]=i, check[j+c]=i, check[j+d]=i, ..., check[j+n]=i;
(4) query:
Figure BDA0002932790040000061
In use, the system is offered to callers through the user interface 1, which may take the form of third-party encapsulation and the HTTP protocol. The user's request is passed through the user interface 1 to the query operation module 2, which performs matching search according to the keywords supplied by the user. The source data are segmented and merged by the first word segmentation module 10 inside the retrieval module 3 and are then inverted-indexed and forward-indexed; the results are returned to the user interface 1 through the sorting module 4, and the user can continue querying based on the feedback;
the database management module 8 extracts the data held in the text database module 9 and passes it to the indexing module 6, which calculates classification and quality scores for the article data; the database management module 8 also passes the text data to the text operation module 5, which filters the text and performs word segmentation and cleaning; the processed documents are passed through the indexing module 6 to the index module 7, and the text is passed to the user interface 1 through the database management module 8; the index module 7 segments and merges the source data and then builds the inverted and forward indexes. The steps for extracting keywords in the index module 7 are as follows:
(1) let the array subscript be i; if base[i] and check[i] are both 0, the position is empty;
(2) if base[i] is negative (a leaf node), the state may be an end state;
let the direct child states of state A be Ab, Ac, Ad, ..., An, let the subscript of A in base[] be i, and set base[i] = j;
(3) to ensure that the direct children of A can be placed into the array, j must satisfy:
base[j+b]==0, base[j+c]==0, base[j+d]==0, ..., base[j+n]==0;
check[j+b]==0, check[j+c]==0, ..., check[j+n]==0;
once the value of j is determined, the subscripts of Ab, Ac, Ad, ..., An are also determined,
namely j+b, j+c, j+d, ..., j+n;
at the same time, set:
check[j+b]=i, check[j+c]=i, check[j+d]=i, ..., check[j+n]=i;
After all states have been traversed and set in this way, the construction of the double array is complete.
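Steps (1) to (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the root is placed at index 1, a leaf stores -i as its negative base, a word end that still has children stores -j, and transitions use the absolute value of base. The names `build_double_array`, `contains`, `ROOT` and the sample word list are hypothetical.

```python
from collections import deque

ROOT = 1  # assumed root index, so that check[] of an occupied slot is never 0


def build_double_array(words, code):
    """Build base[]/check[] for `words`; `code` maps each character to an int >= 1."""
    # Ordinary nested-dict trie used only as construction input; "" marks a word end.
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True

    base, check = [0, 0], [0, 0]

    def is_free(t):
        # Grow both arrays on demand, then apply step (1): empty means both are 0.
        if t >= len(base):
            extra = t + 1 - len(base)
            base.extend([0] * extra)
            check.extend([0] * extra)
        return base[t] == 0 and check[t] == 0

    queue = deque([(ROOT, trie)])
    while queue:
        i, node = queue.popleft()
        children = sorted(((code[ch], sub) for ch, sub in node.items() if ch != ""),
                          key=lambda pair: pair[0])
        if not children:
            base[i] = -i  # step (2): a leaf gets a negative base (assumed value -i)
            continue
        j = 1
        while not all(is_free(j + c) for c, _ in children):  # step (3): find a fitting j
            j += 1
        # Assumption: a word end that still has children stores -j instead of j.
        base[i] = -j if "" in node else j
        for c, sub in children:
            check[j + c] = i  # record the parent state of each child
            queue.append((j + c, sub))
    return base, check


def contains(base, check, code, word):
    """Return True if `word` ends in a state whose base is negative."""
    s = ROOT
    for ch in word:
        c = code.get(ch)
        if c is None:
            return False
        t = abs(base[s]) + c
        if t >= len(check) or check[t] != s:
            return False
        s = t
    return base[s] < 0


words = ["as", "aston", "astra", "od"]
code = {ch: k for k, ch in enumerate(sorted(set("".join(words))), start=1)}
base, check = build_double_array(words, code)
print([w for w in ["as", "ast", "aston", "od", "ox"] if contains(base, check, code, w)])
```

The linear search for j keeps the sketch short; production double-array implementations usually track free slots to avoid rescanning the array for every node.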
DAT (double-array trie) queries are extremely convenient. For a word of several characters, the Chinese characters are first converted to their sequence codes, and the word is then found with the same number of additions, with no binary search needed. Since the average length of a Chinese word does not exceed 4 Chinese characters, the DAT query algorithm is extremely efficient.
(4) Query
Figure BDA0002932790040000071
Figure BDA0002932790040000081
This stage is summarized as follows:
1. there are two arrays: base[] and check[];
2. each element of base[] corresponds to a node of the trie, and its value is the base value for the transition to the next state;
3. check[] stores the previous state of the current state and is used to check whether this state exists;
4. a transition from state s to state t must satisfy:
base[s] + c = t and check[base[s] + c] = s, where c is the input variable.
Query flow:
Given the phrases "Aslin, Aston Martin, Astri, Odysi", the characters are coded as follows:
1-A, 2-O, 3-Si, 4-tri, 5-lin, 6-ton, 7-ma, 8-d, 9-de, 10-sai
These are processed to generate the dictionary (as shown in FIG. 5).
Query:
input "Aston";
'A' has code 1 and base[1] = 1; the next input is 's' (listed above as 'Si'), code 3:
base[1]+3=4, check[4]=1
these match, so 'As' is a state, and since base[4] > 0 the query can continue;
the next input is 'ton', code 6:
base[4]+6=8
check[8]=4
thus 'Aston' is an end state: base[8] < 0, so 'Aston' is a word.
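The query walk above can be replayed with a few lines of Python. The sketch holds only the array entries quoted in the text: base[4] = 2 follows from base[4]+6=8, while base[8] = -1 is an assumed value, since the text only states base[8] < 0. The names `step`, `codes` and `trace` are hypothetical.

```python
# Sparse arrays with only the entries quoted in the text; any other slot counts as 0 (empty).
base = {1: 1, 4: 2, 8: -1}   # base[8] = -1 is assumed; the text only says base[8] < 0
check = {4: 1, 8: 4}
codes = {"A": 1, "Si": 3, "ton": 6}


def step(s, c):
    """Apply base[s] + c = t and accept t only if check[t] = s."""
    t = base.get(s, 0) + c
    return t if check.get(t, 0) == s else None


state = codes["A"]           # per the text, 'A' is state 1
trace = [state]
for unit in ["Si", "ton"]:
    state = step(state, codes[unit])
    if state is None:        # no transition: the input is not in the dictionary
        break
    trace.append(state)

is_word = state is not None and base.get(state, 0) < 0
print(trace, is_word)        # [1, 4, 8] True
```

Each input character costs one addition and one comparison, which matches the earlier remark that a word is found with as many additions as it has characters.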
The dictionary tree uses the common prefixes of strings to save storage space, and lookup is fast. It works as a finite automaton: each node represents a state, and state transitions depend on the input variable. The query is complete when an end state is reached or no transition is possible. The root node contains no character, and every other node contains exactly one character; concatenating the characters on the path from the root to a node gives the string corresponding to that node; and all child nodes of a node contain different characters.
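The trie properties listed above, and how they support keyword extraction from a query, can be illustrated with a plain nested-dict trie in Python. This sketch uses a simple longest-match scan rather than an AC state machine with failure links, and it does not use the double-array layout; `build_trie`, `extract_keywords`, the `END` marker and the sample lexicon are all hypothetical.

```python
END = "$end"  # marker key for "a word ends here"; cannot clash with single characters


def build_trie(words):
    root = {}  # the root node holds no character
    for word in words:
        node = root
        for ch in word:
            # Words with a common prefix reuse the same nodes.
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def extract_keywords(text, trie):
    """Scan left to right, keeping the longest lexicon word found at each position."""
    found, i = [], 0
    while i < len(text):
        node, longest = trie, 0
        for k in range(i, len(text)):
            node = node.get(text[k])
            if node is None:      # no transition is possible: stop this attempt
                break
            if END in node:       # an end state: remember the longest match so far
                longest = k - i + 1
        if longest:
            found.append(text[i:i + longest])
            i += longest
        else:
            i += 1
    return found


lexicon = ["aston", "aston martin", "odyssey"]
trie = build_trie(lexicon)
print(extract_keywords("price of aston martin vs odyssey", trie))
```

Restarting the scan at every position is acceptable for short queries; an automaton with failure links avoids the rescans on long texts.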
The advantages of the application are as follows: the AC state machine is fully used to complete pattern matching at high speed, so that automobile-related words in the phrase text are recognized quickly, the user's detailed intention is obtained and passed on to the subsequent search process, and the retrieval results better match the user's expectations.
Circuits, electronic components and modules referred to herein are all prior art that those skilled in the art can fully implement without further description, and the protection sought by the present application does not involve improvements to software or methods.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A search keyword extraction system based on a double-array dictionary tree, characterized in that: the system comprises a user interface (1), a query operation module (2), a retrieval module (3), a sorting module (4), a text operation module (5), an indexing module (6), an index module (7), a database management module (8), a text database module (9), a first word segmentation module (10) and a second word segmentation module (11), wherein the first word segmentation module (10) is arranged in the retrieval module (3), and the second word segmentation module (11) is arranged in the index module (7);
the user interface (1) is connected with the query operation module (2), the query operation module (2) is connected with the retrieval module (3), and the retrieval module (3) is connected with the sorting module (4).
2. The system for extracting search keywords based on the double-array dictionary tree according to claim 1, wherein: the user interface (1) and the database management module (8) are connected with each other.
3. The system for extracting search keywords based on the double-array dictionary tree according to claim 1, wherein: the text operation module (5) and the database management module (8) are connected with each other.
4. The system for extracting search keywords based on the double-array dictionary tree according to claim 1, wherein: the text operation module (5) and the indexing module (6) are connected with each other.
5. The system for extracting search keywords based on the double-array dictionary tree according to claim 1, wherein: the indexing module (6) and the index module (7) are connected with each other.
6. The system for extracting search keywords based on the double-array dictionary tree according to claim 1, wherein: the index module (7) and the retrieval module (3) are connected with each other.
7. The system for extracting search keywords based on the double-array dictionary tree according to claim 1, wherein: the indexing module (6) is connected with the database management module (8).
8. The system for extracting search keywords based on the double-array dictionary tree according to claim 1, wherein: the database management module (8) and the text database module (9) are connected with each other.
9. The system for extracting search keywords based on the double-array dictionary tree according to claim 1, wherein: the user interface (1) is provided by way of third-party encapsulation and the HTTP protocol.
10. The system for extracting search keywords based on the double-array dictionary tree according to claim 1, wherein: the steps for extracting keywords in the index module (7) are as follows:
(1) let the array subscript be i; if base[i] and check[i] are both 0, the position is empty;
(2) if base[i] is negative (a leaf node), the state may be an end state;
let the direct child states of state A be Ab, Ac, Ad, ..., An, let the subscript of A in base[] be i, and set base[i] = j;
(3) to ensure that the direct children of A can be placed into the array, j must satisfy:
base[j+b]==0, base[j+c]==0, base[j+d]==0, ..., base[j+n]==0;
check[j+b]==0, check[j+c]==0, ..., check[j+n]==0;
once the value of j is determined, the subscripts of Ab, Ac, Ad, ..., An are also determined,
namely j+b, j+c, j+d, ..., j+n;
at the same time, set:
check[j+b]=i, check[j+c]=i, check[j+d]=i, ..., check[j+n]=i;
(4) query:
Figure FDA0002932790030000021
CN202110151716.1A 2021-02-04 2021-02-04 Search keyword extraction system based on double-array dictionary tree Pending CN112800316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110151716.1A CN112800316A (en) 2021-02-04 2021-02-04 Search keyword extraction system based on double-array dictionary tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110151716.1A CN112800316A (en) 2021-02-04 2021-02-04 Search keyword extraction system based on double-array dictionary tree

Publications (1)

Publication Number Publication Date
CN112800316A true CN112800316A (en) 2021-05-14

Family

ID=75814108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110151716.1A Pending CN112800316A (en) 2021-02-04 2021-02-04 Search keyword extraction system based on double-array dictionary tree

Country Status (1)

Country Link
CN (1) CN112800316A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446162A (en) * 2016-09-26 2017-02-22 浙江大学 Orient field self body intelligence library article search method
CN106649286A (en) * 2016-10-15 2017-05-10 语联网(武汉)信息技术有限公司 Method for conducting term matching on basis of double-array lexicographic tree
CN107239549A (en) * 2017-06-07 2017-10-10 传神语联网网络科技股份有限公司 Method, device and the terminal of database terminology retrieval
CN110399457A (en) * 2019-07-01 2019-11-01 吉林大学 A kind of intelligent answer method and system
CN110516118A (en) * 2019-08-13 2019-11-29 出门问问(武汉)信息科技有限公司 A kind of character string matching method, equipment and computer storage medium
CN110851722A (en) * 2019-11-12 2020-02-28 腾讯云计算(北京)有限责任公司 Search processing method, device and equipment based on dictionary tree and storage medium

Similar Documents

Publication Publication Date Title
CN109492077B (en) Knowledge graph-based petrochemical field question-answering method and system
CN102254014B (en) Adaptive information extraction method for webpage characteristics
US8751218B2 (en) Indexing content at semantic level
CN108829658B (en) Method and device for discovering new words
CN101079024B (en) Special word list dynamic generation system and method
CN110674252A (en) High-precision semantic search system for judicial domain
CN102637180B (en) Character post processing method and device based on regular expression
CN101751386B (en) Identification method of unknown words
CN113886604A (en) Job knowledge map generation method and system
CN112507109A (en) Retrieval method and device based on semantic analysis and keyword recognition
CN105912662A (en) Coreseek-based vertical search engine research and optimization method
CN113312474A (en) Similar case intelligent retrieval system of legal documents based on deep learning
CN115563313A (en) Knowledge graph-based document book semantic retrieval system
CN102722526B (en) Part-of-speech classification statistics-based duplicate webpage and approximate webpage identification method
CN111259223B (en) News recommendation and text classification method based on emotion analysis model
Gupta et al. Improving unsupervised stemming by using partial lemmatization coupled with data-based heuristics for Hindi
CN112800316A (en) Search keyword extraction system based on double-array dictionary tree
de Oliveira et al. A syntactic-relationship approach to construct well-informative knowledge graphs representation
CN107291952B (en) Method and device for extracting meaningful strings
CN110532538A (en) Property dispute judgement document's critical entities extraction algorithm
Wang et al. An approach to concept-obtained text summarization
TWI534640B (en) Chinese network information monitoring and analysis system and its method
Lesher et al. A web-based system for autonomous text corpus generation
CN113609296B (en) Data processing method and device for public opinion data identification
US20240070396A1 (en) Method for Determining Candidate Company Related to News and Apparatus for Performing the Method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination