CN108846094A - A method for interaction based on classified retrieval

Info

Publication number
CN108846094A
CN108846094A CN201810617412.8A CN201810617412A CN108846094A CN 108846094 A CN108846094 A CN 108846094A CN 201810617412 A CN201810617412 A CN 201810617412A CN 108846094 A CN108846094 A CN 108846094A
Authority
CN
China
Prior art keywords
text
character
word segmentation
keyword
interaction
Legal status
Pending
Application number
CN201810617412.8A
Other languages
Chinese (zh)
Inventor
何中
汤海泉
严伟
戴建峰
顾永新
王斌
何登
巢振军
Current Assignee
JIANGSU ZHONGWEI TECHNOLOGY SOFTWARE SYSTEM Co Ltd
Original Assignee
JIANGSU ZHONGWEI TECHNOLOGY SOFTWARE SYSTEM Co Ltd
Priority date
2018-06-15
Filing date
2018-06-15
Publication date
2018-11-20
Application filed by JIANGSU ZHONGWEI TECHNOLOGY SOFTWARE SYSTEM Co Ltd
Priority to CN201810617412.8A
Publication of CN108846094A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses a method for interaction based on classified retrieval, comprising the following steps: A. text is selected, copied, and pasted into the system, which automatically performs intelligent word segmentation and displays the resulting phrases as blocks; B. the blocks can be selected: clicking a segmented word brings it into the text box above, and clicking a selected word again cancels the selection; C. retrieval interaction is performed. The invention segments a passage of text automatically once it has been copied and pasted, and displays the resulting phrases as blocks that the user can freely drag and combine. A single phrase or a combination of phrases can serve as the search keyword: the user only needs to drag the keyword onto a business system for retrieval to be attempted automatically, which is convenient and fast.

Description

A method for interaction based on classified retrieval
Technical field
The present invention relates to the field of retrieval technology, and specifically to a method for interaction based on classified retrieval.
Background art
The term "retrieval" refers to starting from a user's specific information need and applying certain methods and technical means to a specific collection of information in order to find the relevant information according to certain clues and rules. In the Internet era, we perform retrieval all the time. There are two main ways to retrieve information on the Internet: browsing a directory and using a search engine. Directory browsing is the approach used by Yahoo: users click through the directory according to their needs and descend into the next level of subdirectories until they find the information they want. This approach is convenient for finding a whole category of information, but its ability to pinpoint a specific item is weak. Search engines are currently the most common web search tool: the user only needs to submit a request, and the search engine returns a large number of results, ranked by their relevance to the query.
At present, retrieval interaction is mostly carried out by entering text manually. With search engines such as Google and Baidu, for example, we type the query on a keyboard or by similar means, which involves many manual input steps. If a search has to span several systems, the same text must be entered repeatedly in each of them, which is cumbersome.
Summary of the invention
The purpose of the present invention is to provide a method for interaction based on classified retrieval that solves the problems raised in the background art above.
To achieve this purpose, the present invention provides the following technical solution: a method for interaction based on classified retrieval, comprising the following steps:
A. Select and copy text and paste it into the system; the system automatically performs intelligent word segmentation and displays the resulting phrases as blocks;
B. The blocks can be selected: clicking a segmented word brings it into the text box above, and clicking a selected word again cancels the selection;
C. Perform retrieval interaction: after a drag-to-retrieve operation, the retrieval results are displayed directly.
Preferably, the intelligent word segmentation method in step A is as follows:
a. Obtain feature information of the text to be segmented, the feature information including at least one of paragraph breaks, punctuation marks, or space characters;
b. Determine, according to the feature information, all natural sections in the text to be segmented;
c. Divide the natural sections into ambiguous sections and unambiguous sections;
d. Determine the candidate words in each ambiguous section, and match the candidate words against the text of the unambiguous sections;
e. Determine the segmentation rule for the candidate words according to the matching result, and segment the text of the ambiguous sections according to that rule.
Preferably, the retrieval interaction in step C includes: dragging a single segmented word to retrieve; combining multiple segmented words in the text box and retrieving them together; and retrieving with a multi-select combination.
Preferably, the text matching method in step d is as follows:
1) Split the tested text into individual characters to obtain the split character string;
2) Match each character of the split string against the key characters in an inverted character library; the inverted character library is formed by decomposing each input keyword character by character and recording the position of each key character within its keyword;
3) According to the configured fuzziness determination rule, determine the fuzziness value used when the key characters of each successfully matched keyword were matched, obtaining the matching fuzziness of each keyword;
4) From the matching fuzziness of each keyword, determine the average fuzziness of the input keywords, and determine from the average fuzziness whether the tested text meets the filter condition.
Preferably, the segmentation processing method in step e is as follows:
A) Obtain the first feature vector corresponding to each single character in the sentence to be segmented and the second feature vector corresponding to each two-character pair;
B) Determine the current third feature vector of each single character from the first and second feature vectors;
C) Segment the sentence according to a preset Chinese-character label transition matrix and the current third feature vector of each single character.
Compared with the prior art, the beneficial effects of the present invention are as follows. The invention performs intelligent word segmentation on a passage of text, segmenting it automatically once the text has been copied and pasted. The resulting phrases are displayed as blocks that the user can freely drag and combine, and a single phrase or a combination of phrases can serve as the search keyword. The user only needs to drag the keyword onto a business system for retrieval to be attempted automatically, which is convenient and fast. In addition, the intelligent word segmentation method used by the invention effectively improves the relevance between the segmentation result and the context of the text to be segmented, thereby improving segmentation accuracy.
Description of the drawings
Fig. 1 is a schematic diagram of the result of intelligent word segmentation in the present invention;
Fig. 2 is a schematic diagram of phrases being dragged, arranged, and combined in the present invention;
Fig. 3 is a schematic diagram of dragging a single segmented word to retrieve in the present invention;
Fig. 4 is a schematic diagram of multiple segmented words combined and retrieved together in the present invention;
Fig. 5 is a schematic diagram of multi-select combined retrieval in the present invention;
Fig. 6 is a schematic diagram of the display after retrieval in the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Figs. 1-6, the present invention provides the following technical solution: a method for interaction based on classified retrieval, comprising the following steps:
A. Select and copy text and paste it into the system; the system automatically performs intelligent word segmentation and displays the resulting phrases as blocks;
B. The blocks can be selected: clicking a segmented word brings it into the text box above, and clicking a selected word again cancels the selection;
C. Perform retrieval interaction: after a drag-to-retrieve operation, the retrieval results are displayed directly.
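The click behaviour of step B can be illustrated with a small state model. This is only a sketch: the class name `SegmentBoard`, its methods, and the choice to join selected words with a space are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentBoard:
    """Hypothetical model of the block display and the text box above it."""
    segments: list[str]
    selected: list[int] = field(default_factory=list)  # block indices in click order

    def click(self, index: int) -> None:
        """First click selects a block; clicking a selected block cancels it."""
        if not 0 <= index < len(self.segments):
            raise IndexError(f"no block at position {index}")
        if index in self.selected:
            self.selected.remove(index)
        else:
            self.selected.append(index)

    def text_box(self) -> str:
        """Contents of the text box: the selected words in click order."""
        return " ".join(self.segments[i] for i in self.selected)


board = SegmentBoard(["江苏", "软件", "系统", "检索"])
board.click(1)  # select "软件"
board.click(3)  # select "检索"
board.click(1)  # clicking "软件" again cancels its selection
print(board.text_box())  # prints: 检索
```

Keeping the selection as a list of indices rather than a set preserves click order, which matters if the combined keyword is built from the words in the order the user chose them.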
In the present invention, the intelligent word segmentation method in step A is as follows:
a. Obtain feature information of the text to be segmented, the feature information including at least one of paragraph breaks, punctuation marks, or space characters;
b. Determine, according to the feature information, all natural sections in the text to be segmented;
c. Divide the natural sections into ambiguous sections and unambiguous sections;
d. Determine the candidate words in each ambiguous section, and match the candidate words against the text of the unambiguous sections;
e. Determine the segmentation rule for the candidate words according to the matching result, and segment the text of the ambiguous sections according to that rule.
The intelligent word segmentation method used by the present invention effectively improves the relevance between the segmentation result and the context of the text to be segmented, thereby improving segmentation accuracy.
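Steps a to e above might be sketched as follows. The patent does not say how ambiguity is detected or how the matching result becomes a segmentation rule, so this sketch substitutes two common stand-ins, both assumptions: a section counts as ambiguous when forward and backward maximum matching disagree, and the scan whose multi-character candidate words occur more often in the unambiguous sections wins. The toy `LEXICON` and all function names are hypothetical.

```python
import re

# Hypothetical toy lexicon; a real system would load a full dictionary.
LEXICON = {"研究", "研究生", "生命", "的", "起源"}
MAX_LEN = max(len(w) for w in LEXICON)


def natural_sections(text):
    """Steps a-b: cut the text at paragraph breaks, punctuation and spaces."""
    return [s for s in re.split(r"[\s,.;:!?，。；：！？、]+", text) if s]


def max_match(section, backward=False):
    """Greedy longest-match scan; unknown characters become one-character words."""
    s = section[::-1] if backward else section
    words, i = [], 0
    while i < len(s):
        for n in range(min(MAX_LEN, len(s) - i), 0, -1):
            piece = s[i:i + n][::-1] if backward else s[i:i + n]
            if n == 1 or piece in LEXICON:
                words.append(piece)
                i += n
                break
    return words[::-1] if backward else words


def support(words, context):
    """Step d: count occurrences of multi-character candidates in the context."""
    return sum(context.count(w) for w in words if len(w) > 1)


def segment_text(text):
    sections = natural_sections(text)
    scans = [(max_match(s), max_match(s, backward=True)) for s in sections]
    # Step c (assumed criterion): ambiguous when the two scans disagree.
    context = "\n".join(s for s, (f, b) in zip(sections, scans) if f == b)
    # Step e (assumed rule): keep the scan with more support; ties go forward.
    return [f if f == b or support(f, context) >= support(b, context) else b
            for f, b in scans]


print(segment_text("生命的起源。研究生命的起源"))
# prints: [['生命', '的', '起源'], ['研究', '生命', '的', '起源']]
```

In the example, the second section is ambiguous ("研究生 / 命" versus "研究 / 生命"); the word "生命" appears in the unambiguous first section, so the backward scan is chosen.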
In addition, in the present invention, the retrieval interaction in step C includes: dragging a single segmented word to retrieve; combining multiple segmented words in the text box and retrieving them together; and retrieving with a multi-select combination.
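The three retrieval modes can be pictured as different ways of turning selected words into a query. The patent does not define the query semantics of each mode, so the mapping below is purely an assumption: a single word or a text-box combination is treated as one phrase, and a multi-select combination is treated as a set of terms that must all occur. The document list and function names are hypothetical.

```python
# Hypothetical stand-in for the records held by a business system.
DOCS = ["patent search engine", "engine oil search", "search history"]


def phrase_query(segments):
    """Single word, or several words combined in the text box into one phrase."""
    phrase = " ".join(segments)
    return lambda doc: phrase in doc


def all_terms_query(segments):
    """Multi-select combination (assumed): every selected word must occur."""
    return lambda doc: all(word in doc for word in segments)


def retrieve(predicate, docs=DOCS):
    """Return the documents accepted by the query predicate, in original order."""
    return [doc for doc in docs if predicate(doc)]


print(retrieve(phrase_query(["engine"])))
# prints: ['patent search engine', 'engine oil search']
print(retrieve(phrase_query(["search", "engine"])))
# prints: ['patent search engine']
print(retrieve(all_terms_query(["search", "engine"])))
# prints: ['patent search engine', 'engine oil search']
```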
In the present invention, the text matching method in step d is as follows:
1) Split the tested text into individual characters to obtain the split character string;
2) Match each character of the split string against the key characters in an inverted character library; the inverted character library is formed by decomposing each input keyword character by character and recording the position of each key character within its keyword;
3) According to the configured fuzziness determination rule, determine the fuzziness value used when the key characters of each successfully matched keyword were matched, obtaining the matching fuzziness of each keyword;
4) From the matching fuzziness of each keyword, determine the average fuzziness of the input keywords, and determine from the average fuzziness whether the tested text meets the filter condition.
By building a keyword library and, from it, an inverted character library, an inverted index of the keywords is established. Each tested text is then matched against the index in turn, fuzzy matching is performed according to the configured fuzziness policy, and filtering is applied once the matching result is obtained.
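A minimal sketch of steps 1) to 4) follows. The patent leaves the fuzziness rule open, so the definitions here are assumptions: the fuzziness of a keyword match is the number of text characters skipped between its consecutive key characters, a match fails if any single gap exceeds `max_gap`, and the filter condition is that at least one keyword matched with an average fuzziness no greater than `max_average`. All names and default values are hypothetical.

```python
from collections import defaultdict


def build_inverted_library(keywords):
    """Map each key character to the (keyword, position) pairs where it occurs."""
    library = defaultdict(list)
    for keyword in keywords:
        for position, char in enumerate(keyword):
            library[char].append((keyword, position))
    return library


def keyword_fuzziness(text, keyword, library, max_gap):
    """Smallest number of skipped characters over the alignments tried, or None."""
    # Steps 1-2: walk the text character by character and look each one up.
    hits = defaultdict(list)  # keyword position -> text indices that match it
    for index, char in enumerate(text):
        for kw, position in library.get(char, ()):
            if kw == keyword:
                hits[position].append(index)
    best = None
    for start in hits[0]:
        prev, skipped = start, 0
        for position in range(1, len(keyword)):
            # Simplification: always take the nearest later occurrence.
            nxt = next((i for i in hits[position] if i > prev), None)
            if nxt is None or nxt - prev - 1 > max_gap:
                break
            skipped += nxt - prev - 1
            prev = nxt
        else:  # every key character was placed: step 3 yields a fuzziness value
            best = skipped if best is None else min(best, skipped)
    return best


def meets_filter_condition(text, keywords, max_gap=1, max_average=0.5):
    """Step 4: average the fuzziness of the matched keywords and compare."""
    library = build_inverted_library(keywords)
    scores = [f for kw in keywords
              if (f := keyword_fuzziness(text, kw, library, max_gap)) is not None]
    return bool(scores) and sum(scores) / len(scores) <= max_average


print(meets_filter_condition("a-bcd", ["ab", "cd"]))  # prints: True
print(meets_filter_condition("a--b", ["ab"]))         # prints: False
```

In the first call, "ab" matches with one skipped character and "cd" with none, giving an average fuzziness of 0.5; in the second, the gap of two characters exceeds `max_gap`, so no keyword matches.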
In addition, in the present invention, the segmentation processing method in step e is as follows:
A) Obtain the first feature vector corresponding to each single character in the sentence to be segmented and the second feature vector corresponding to each two-character pair;
B) Determine the current third feature vector of each single character from the first and second feature vectors;
C) Segment the sentence according to a preset Chinese-character label transition matrix and the current third feature vector of each single character.
This segmentation processing method segments the sentence with a simple, easily implemented process; it simplifies the network structure, reduces the storage footprint and memory demanded of a mobile terminal, and improves the user experience.
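One way to read steps A) to C) is as label decoding over the four position tags B, M, E, S (begin, middle, end, single). The sketch below makes several assumptions that the patent does not state: each feature vector holds one score per tag, the third vector is the element-wise sum of the first and second, each character has exactly one associated two-character vector, and decoding uses the Viterbi algorithm. The transition scores and the example vectors are hand-made for illustration.

```python
LABELS = "BMES"
NEG = float("-inf")

# Hypothetical preset label transition scores: TRANSITION[previous][next].
TRANSITION = {
    "B": {"B": NEG, "M": 0.0, "E": 0.0, "S": NEG},
    "M": {"B": NEG, "M": 0.0, "E": 0.0, "S": NEG},
    "E": {"B": 0.0, "M": NEG, "E": NEG, "S": 0.0},
    "S": {"B": 0.0, "M": NEG, "E": NEG, "S": 0.0},
}


def combine(first, second):
    """Step B (assumed): third vector = element-wise sum of the two inputs."""
    return [u + b for u, b in zip(first, second)]


def viterbi(emissions):
    """Best label path for a non-empty list of per-character label scores."""
    # A sentence can only start with B or S.
    score = {lab: (emissions[0][k] if lab in "BS" else NEG)
             for k, lab in enumerate(LABELS)}
    backpointers = []
    for emission in emissions[1:]:
        new_score, pointer = {}, {}
        for k, lab in enumerate(LABELS):
            prev = max(LABELS, key=lambda p: score[p] + TRANSITION[p][lab])
            new_score[lab] = score[prev] + TRANSITION[prev][lab] + emission[k]
            pointer[lab] = prev
        score = new_score
        backpointers.append(pointer)
    # A sentence can only end with E or S.
    path = [max("ES", key=lambda lab: score[lab])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


def segment_sentence(sentence, first_vectors, second_vectors):
    """Step C: decode the labels, then cut after every E or S."""
    emissions = [combine(u, b) for u, b in zip(first_vectors, second_vectors)]
    words, current = [], ""
    for char, label in zip(sentence, viterbi(emissions)):
        current += char
        if label in "ES":
            words.append(current)
            current = ""
    return words


first = [[2, 0, 0, 1], [0, 0, 2, 1], [2, 0, 0, 1], [0, 0, 2, 1]]
second = [[1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0]]
print(segment_sentence("检索系统", first, second))  # prints: ['检索', '系统']
```

The transition matrix forbids impossible label sequences such as B followed by S, so the decoded path always closes every word it opens.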
In conclusion the present invention can carry out Word Intelligent Segmentation to one section of text, it can be automatically to text after duplication paste text data This is segmented, and the phrase after participle is shown with bulk, and user can freely pull combination phrase, single phrase or combination Phrase can be used as keyword and be retrieved, it is only necessary to keyword is dragged in operation system, can carry out automatically retrieval and It attempts, it is convenient and efficient;In addition, the Word Intelligent Segmentation method that the present invention uses effectively increases word segmentation result and text context to be segmented Between relevance so that participle accuracy get a promotion.
It although an embodiment of the present invention has been shown and described, for the ordinary skill in the art, can be with A variety of variations, modification, replacement can be carried out to these embodiments without departing from the principles and spirit of the present invention by understanding And modification, the scope of the present invention is defined by the appended.

Claims (5)

1. A method for interaction based on classified retrieval, characterized in that it comprises the following steps:
A. Select and copy text and paste it into the system; the system automatically performs intelligent word segmentation and displays the resulting phrases as blocks;
B. The blocks can be selected: clicking a segmented word brings it into the text box above, and clicking a selected word again cancels the selection;
C. Perform retrieval interaction: after a drag-to-retrieve operation, the retrieval results are displayed directly.
2. The method for interaction based on classified retrieval according to claim 1, characterized in that the intelligent word segmentation method in step A is as follows:
a. Obtain feature information of the text to be segmented, the feature information including at least one of paragraph breaks, punctuation marks, or space characters;
b. Determine, according to the feature information, all natural sections in the text to be segmented;
c. Divide the natural sections into ambiguous sections and unambiguous sections;
d. Determine the candidate words in each ambiguous section, and match the candidate words against the text of the unambiguous sections;
e. Determine the segmentation rule for the candidate words according to the matching result, and segment the text of the ambiguous sections according to that rule.
3. The method for interaction based on classified retrieval according to claim 1, characterized in that the retrieval interaction in step C includes: dragging a single segmented word to retrieve; combining multiple segmented words in the text box and retrieving them together; and retrieving with a multi-select combination.
4. The method for interaction based on classified retrieval according to claim 2, characterized in that the text matching method in step d is as follows:
1) Split the tested text into individual characters to obtain the split character string;
2) Match each character of the split string against the key characters in an inverted character library; the inverted character library is formed by decomposing each input keyword character by character and recording the position of each key character within its keyword;
3) According to the configured fuzziness determination rule, determine the fuzziness value used when the key characters of each successfully matched keyword were matched, obtaining the matching fuzziness of each keyword;
4) From the matching fuzziness of each keyword, determine the average fuzziness of the input keywords, and determine from the average fuzziness whether the tested text meets the filter condition.
5. The method for interaction based on classified retrieval according to claim 2, characterized in that the segmentation processing method in step e is as follows:
A) Obtain the first feature vector corresponding to each single character in the sentence to be segmented and the second feature vector corresponding to each two-character pair;
B) Determine the current third feature vector of each single character from the first and second feature vectors;
C) Segment the sentence according to a preset Chinese-character label transition matrix and the current third feature vector of each single character.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810617412.8A CN108846094A (en) 2018-06-15 2018-06-15 A method of based on index in classification interaction

Publications (1)

Publication Number Publication Date
CN108846094A true CN108846094A (en) 2018-11-20

Family

ID=64202987

Country Status (1)

Country Link
CN (1) CN108846094A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541960A (en) * 2010-12-31 2012-07-04 北大方正集团有限公司 Method and device of fuzzy retrieval
CN104750673A (en) * 2013-12-31 2015-07-01 中国移动通信集团公司 Text matching and filtering method and text matching and filtering device
CN105447187A (en) * 2015-12-15 2016-03-30 广州神马移动信息科技有限公司 Webpage search method and system
CN105989030A (en) * 2015-02-02 2016-10-05 阿里巴巴集团控股有限公司 Text retrieval method and device
CN107832302A (en) * 2017-11-22 2018-03-23 北京百度网讯科技有限公司 Participle processing method, device, mobile terminal and computer-readable recording medium
CN107918604A (en) * 2017-11-13 2018-04-17 彩讯科技股份有限公司 A kind of Chinese segmenting method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800352A (en) * 2018-12-30 2019-05-24 上海触乐信息科技有限公司 Method, system and the terminal device of information push are carried out based on clipbook
CN109800352B (en) * 2018-12-30 2022-08-12 上海触乐信息科技有限公司 Method, system and terminal device for pushing information based on clipboard
CN111310481A (en) * 2020-01-19 2020-06-19 百度在线网络技术(北京)有限公司 Speech translation method, device, computer equipment and storage medium
CN111310481B (en) * 2020-01-19 2021-05-18 百度在线网络技术(北京)有限公司 Speech translation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181120