CN110110328A - Text handling method and device - Google Patents

Text handling method and device

Info

Publication number
CN110110328A
Authority
CN
China
Prior art keywords
word
text
word frequency
short
destination document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910346113.XA
Other languages
Chinese (zh)
Other versions
CN110110328B (en)
Inventor
靳彦召
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zero Seconds Technology Co Ltd
Original Assignee
Beijing Zero Seconds Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zero Seconds Technology Co Ltd
Priority to CN201910346113.XA
Publication of CN110110328A
Application granted
Publication of CN110110328B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a text handling method and device. The method includes: obtaining a short text corpus, arranging each short text according to a preset format, and treating all the short texts as one destination document; counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document; and calculating the word weight of the word according to the word frequency and the word frequency summation. The application addresses the technical problem of poor short text processing results and makes it possible to better identify the key words in short texts. In addition, the application is suitable for natural-language text processing scenarios.

Description

Text handling method and device
Technical field
This application relates to the field of text processing, and in particular to a text handling method and device.
Background art
In natural language processing, short texts are characterized by short sentences and a small vocabulary.
The inventors found that existing processing works poorly on short texts. In particular, the key words in a short text cannot be identified.
No effective solution has yet been proposed for the problem of poor short text processing results in the related art.
Summary of the invention
The main purpose of this application is to provide a text handling method and device that solve the problem of poor short text processing results.
To achieve the above purpose, according to one aspect of the application, a text handling method is provided.
The text handling method according to the application includes: obtaining a short text corpus, arranging each short text according to a preset format, and treating all the short texts as one destination document; counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document; and calculating the word weight of the word according to the word frequency and the word frequency summation.
Further, the method is used to handle the weights of words in short texts that occur frequently but are meaningless.
Further, obtaining the short text corpus, arranging each short text according to the preset format, and treating all the short texts as one destination document includes: obtaining the short text corpus, arranging each short text in a format of one short text per line, and treating all the short texts as one destination document.
Further, counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document includes: counting the word frequency WF with which each word occurs in the destination document; and counting the word frequency summation DF of all words in the destination document. Calculating the word weight of the word according to the word frequency and the word frequency summation includes: calculating the word weight WW = ln(DF/WF).
Further, the words in short texts that occur frequently but are meaningless include one or more of the following: modal particles, auxiliary words, and pronouns.
To achieve the above purpose, according to another aspect of the application, a text processing apparatus is provided.
The text processing apparatus according to the application includes: an acquisition module, for obtaining a short text corpus, arranging each short text according to a preset format, and treating all the short texts as one destination document; a statistical module, for counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document; and a computing module, for calculating the word weight of the word according to the word frequency and the word frequency summation.
Further, the apparatus is used to handle the weights of words in short texts that occur frequently but are meaningless.
Further, the acquisition module is used for obtaining the short text corpus, arranging each short text in a format of one short text per line, and treating all the short texts as one destination document.
Further, the statistical module is used for: counting the word frequency WF with which each word occurs in the destination document; and counting the word frequency summation DF of all words in the destination document. Calculating the word weight of the word according to the word frequency and the word frequency summation includes: calculating the word weight WW = ln(DF/WF).
Further, the words in short texts that occur frequently but are meaningless include one or more of the following: modal particles, auxiliary words, and pronouns.
In the text handling method and device of the embodiments of this application, a short text corpus is obtained, each short text is arranged according to a preset format, and all the short texts are treated as one destination document. By counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document, the word weight of the word is calculated from the word frequency and the word frequency summation. This achieves the technical effect of better identifying the key words in short texts, and thereby solves the technical problem of poor short text processing results.
Brief description of the drawings
The accompanying drawings, which form part of this application, are provided to give a further understanding of the application, so that its other features, objects, and advantages become more apparent. The drawings of the illustrative embodiments and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of the text handling method according to one embodiment of the application;
Fig. 2 is a schematic flowchart of the text handling method according to another embodiment of the application;
Fig. 3 is a schematic structural diagram of the text processing apparatus according to an embodiment of the application.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application without creative effort shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of this application are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can be implemented. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units that are explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product, or device.
In this application, the orientation or positional relationships indicated by terms such as "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "transverse", and "longitudinal" are based on the orientation or positional relationships shown in the drawings. These terms are used primarily to better describe the application and its embodiments, and are not intended to require that the indicated device, element, or component have a particular orientation or be constructed and operated in a particular orientation.
Moreover, besides indicating orientation or positional relationships, some of the above terms may also be used to express other meanings; for example, the term "upper" may in some cases also express a dependency or connection relationship. A person of ordinary skill in the art can understand the specific meaning of these terms in this application according to the specific situation.
In addition, the terms "installed", "arranged", "provided with", "connected", "linked", and "sleeved" should be understood broadly. For example, a connection may be a fixed connection, a detachable connection, or a one-piece construction; it may be a mechanical connection or an electrical connection; it may be a direct connection, an indirect connection through an intermediary, or an internal connection between two devices, elements, or components. A person of ordinary skill in the art can understand the specific meaning of the above terms in this application according to the specific situation.
It should be noted that, provided there is no conflict, the embodiments of this application and the features in the embodiments may be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
As shown in Fig. 1, the method includes the following steps S102 to S106:
Step S102: obtain a short text corpus, arrange each short text according to a preset format, and treat all the short texts as one destination document.
The short text corpus is obtained as the text input. The short text corpus may be collected in advance.
Arranging according to a preset format means that each short text in the short text corpus is arranged according to a set format. At the same time, all the short texts are treated as one destination document.
It should be noted that, when all the short texts are treated as one destination document, the short texts are not processed individually; instead, all the short texts are processed as a single text.
Step S104: count the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document.
Both the word frequency of each word and the word frequency summation of all words are counted within the destination document.
It should be noted that the embodiments of this application do not specifically limit the method used to count the word frequency of each word in the destination document, as long as the word frequency counting requirement is met.
It should also be noted that the embodiments of this application do not specifically limit the method used to count the word frequency summation of all words in the destination document, as long as the requirement for counting the word frequency summation is met.
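Because the counting method is left open, step S104 can be illustrated in many ways. The following is a minimal sketch in Python, not the patent's implementation. The function name `count_word_frequencies` is hypothetical, and whitespace splitting is an assumption that stands in for real word segmentation.

```python
from collections import Counter
from typing import Tuple


def count_word_frequencies(destination_document: str) -> Tuple[Counter, int]:
    """Return (WF, DF): per-word counts and the summation of all counts."""
    # Assumption: the words are already separated by whitespace. A real
    # system would run a word segmenter on the destination document here.
    words = destination_document.split()
    wf = Counter(words)
    df = sum(wf.values())
    return wf, df


# Three short texts, one per line, treated as a single destination document.
document = "the cat sat\nthe dog ran\nthe end"
wf, df = count_word_frequencies(document)
print(wf["the"], wf["cat"], df)  # 3 1 8
```

Note that DF here is a count of word occurrences, not a count of documents, which matches how the text defines the word frequency summation.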
Step S106: calculate the word weight of the word according to the word frequency and the word frequency summation.
The word weight of each word is calculated from its word frequency and the word frequency summation. The resulting word weight is used as the weight of the keyword in the short text.
It can be seen from the above description that the application achieves the following technical effects:
In the embodiments of this application, a short text corpus is obtained, each short text is arranged according to a preset format, and all the short texts are treated as one destination document. By counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document, the word weight of the word is calculated from the word frequency and the word frequency summation. This achieves the technical effect of better identifying the key words in short texts, and thereby solves the technical problem of poor short text processing results.
According to an embodiment of the application, preferably, the method is used to handle the weights of words in short texts that occur frequently but are meaningless. The embodiments of this application do not use the concept of a document count. Instead, taking the natural logarithm of the ratio of the word frequency summation to the word frequency effectively addresses the problem that some high-frequency but meaningless words receive relatively high weights.
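For comparison, the weight described here can be written next to a common textbook form of inverse document frequency. The symbols N (total number of documents), n_w (number of documents containing word w), and the sum index v are notation introduced for this sketch only; WF, DF, and WW are the patent's own symbols.

```latex
\mathrm{IDF}(w) = \ln\frac{N}{n_w},
\qquad
\mathrm{WW}(w) = \ln\frac{DF}{WF(w)},
\qquad
DF = \sum_{v} WF(v)
```

Since 1 \le WF(w) \le DF for every word that occurs, it follows that 0 \le WW(w) \le \ln DF, and WW(w) decreases as WF(w) grows. This is why a word that occurs very often in the destination document ends up with a low weight.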
According to an embodiment of the application, preferably, obtaining the short text corpus, arranging each short text according to the preset format, and treating all the short texts as one destination document includes: obtaining the short text corpus, arranging each short text in a format of one short text per line, and treating all the short texts as one destination document. Specifically, the obtained short text corpus is merged into one document with one short text on each line. Word segmentation is then performed.
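The one-short-text-per-line arrangement can be sketched as follows. This is an illustration under stated assumptions: the helper names `build_destination_document` and `segment` are hypothetical, and `segment` simply splits on whitespace, whereas Chinese text would need a dedicated word segmenter that the patent does not name.

```python
def build_destination_document(short_texts):
    """Arrange one short text per line and return them as a single document."""
    # Collapse internal line breaks and surrounding whitespace so that each
    # short text occupies exactly one line; empty entries are dropped.
    lines = [" ".join(text.split()) for text in short_texts]
    return "\n".join(line for line in lines if line)


def segment(destination_document):
    """Placeholder word segmentation: split on whitespace (an assumption)."""
    return destination_document.split()


corpus = ["where is my order", "  cancel my\norder ", ""]
document = build_destination_document(corpus)
print(document)
# where is my order
# cancel my order
words = segment(document)
print(len(words))  # 7
```

Segmenting after the merge means that line boundaries play no role in the later counts, which is consistent with the statement that the short texts are not processed individually.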
According to an embodiment of the application, preferably, counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document includes:
Step S202: count the word frequency WF with which each word occurs in the destination document;
Step S204: count the word frequency summation DF of all words in the destination document;
Step S206: calculating the word weight of the word according to the word frequency and the word frequency summation includes: calculating the word weight WW = ln(DF/WF).
Specifically, the word frequency WF of each word and the word frequency summation DF of all words in the document are counted, and the word weight is WW = ln(DF/WF). That is, the word weight is calculated by taking the natural logarithm of the ratio of the word frequency summation to the word frequency.
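Step S206 can be sketched directly from the formula WW = ln(DF/WF). The function name `word_weights` and the sample counts below are hypothetical and serve only to illustrate the calculation.

```python
import math


def word_weights(wf, df):
    """Map each word to WW = ln(DF / WF)."""
    return {word: math.log(df / count) for word, count in wf.items()}


# Hypothetical counts for an eight-word destination document.
wf = {"the": 3, "cat": 1, "sat": 1, "dog": 1, "ran": 1, "end": 1}
df = sum(wf.values())  # 8

weights = word_weights(wf, df)
# List the words from lowest to highest weight.
for word, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{word}: {weight:.3f}")
```

The most frequent word, "the", is printed first with the lowest weight, ln(8/3), while every word that occurs once receives ln(8).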
According to an embodiment of the application, preferably, the words in short texts that occur frequently but are meaningless include one or more of the following: modal particles, auxiliary words, and pronouns.
Specifically, because short texts have short sentences and a small vocabulary, a word may occur only once in a given sentence, so traditional word statistics designed for long texts have difficulty finding which character or word is a key word. This application starts from the idea of TFIDF and adapts the algorithm and the reasoning behind it into a word weight processing method suitable for short texts. In the embodiments of this application, all the collected short texts are regarded as a single document, which sets aside the concept of document count in existing TFIDF and omits the TF calculation, so the amount of computation is smaller. In TFIDF, IDF stands for inverse document frequency, a result calculated from the total number of documents and the number of documents in which a given word appears. This method has no concept of document count and instead takes the natural logarithm of the ratio of the word frequency summation to the word frequency. For example, when only a single word occurs in the document, its word frequency WF equals the word frequency summation DF of all words, so WW = ln(DF/WF) = ln 1 = 0.
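The single-word case above, and the effect on a frequently repeated word, can be checked numerically. The sketch below is illustrative only: `word_weight` is a hypothetical helper, and the tokens "w" and "p" are placeholders (the latter standing in for a high-frequency particle), not words taken from the patent.

```python
import math
from collections import Counter


def word_weight(words, target):
    """Return WW = ln(DF / WF) for `target`, given a list of segmented words."""
    wf = Counter(words)
    if wf[target] == 0:
        raise ValueError(f"{target!r} does not occur in the document")
    df = sum(wf.values())
    return math.log(df / wf[target])


# Case from the text: the document contains a single word, so WF == DF.
single = ["w"]
print(word_weight(single, "w"))  # 0.0

# A placeholder particle "p" fills half of a ten-word document.
mixed = ["p", "refund", "p", "order", "p", "late", "p", "parcel", "p", "lost"]
print(round(word_weight(mixed, "p"), 3))       # 0.693
print(round(word_weight(mixed, "refund"), 3))  # 2.303
```

In this toy document the repeated placeholder receives ln 2, while each word that occurs once receives ln 10, so the frequent word ranks lowest.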
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
According to an embodiment of the application, a text processing apparatus for implementing the above method is also provided. As shown in Fig. 3, the apparatus includes: an acquisition module 10, for obtaining a short text corpus, arranging each short text according to a preset format, and treating all the short texts as one destination document; a statistical module 20, for counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document; and a computing module 30, for calculating the word weight of the word according to the word frequency and the word frequency summation.
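The three-module structure can be sketched as three small classes chained by a wrapper. This is one possible software arrangement under stated assumptions, not the apparatus itself: all class and method names are hypothetical, and whitespace splitting again stands in for word segmentation.

```python
import math
from collections import Counter


class AcquisitionModule:
    """Module 10: arranges the short texts as one destination document."""

    def acquire(self, short_texts):
        # One short text per line; blank entries are skipped.
        lines = (" ".join(text.split()) for text in short_texts)
        return "\n".join(line for line in lines if line)


class StatisticsModule:
    """Module 20: counts WF per word and the summation DF."""

    def count(self, destination_document):
        # Assumption: whitespace splitting stands in for word segmentation.
        wf = Counter(destination_document.split())
        return wf, sum(wf.values())


class ComputingModule:
    """Module 30: computes WW = ln(DF / WF) for each word."""

    def weigh(self, wf, df):
        return {word: math.log(df / count) for word, count in wf.items()}


class TextProcessingApparatus:
    """Chains the three modules in the order acquire, count, weigh."""

    def __init__(self):
        self.acquisition = AcquisitionModule()
        self.statistics = StatisticsModule()
        self.computing = ComputingModule()

    def process(self, short_texts):
        document = self.acquisition.acquire(short_texts)
        wf, df = self.statistics.count(document)
        return self.computing.weigh(wf, df)


apparatus = TextProcessingApparatus()
weights = apparatus.process(["ok thanks", "ok bye"])
print(weights)
```

Keeping the modules separate mirrors the description, in which the counting method of the statistical module is left open and could be swapped without touching the other two.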
The acquisition module 10 of the embodiment obtains the short text corpus as the text input. The short text corpus may be collected in advance.
Arranging according to a preset format means that each short text in the short text corpus is arranged according to a set format. At the same time, all the short texts are treated as one destination document.
It should be noted that, when all the short texts are treated as one destination document, the short texts are not processed individually; instead, all the short texts are processed as a single text.
The statistical module 20 of the embodiment counts, within the destination document, the word frequency with which each word occurs and the word frequency summation of all words in the destination document.
It should be noted that the embodiments of this application do not specifically limit the method used to count the word frequency of each word in the destination document, as long as the word frequency counting requirement is met.
It should also be noted that the embodiments of this application do not specifically limit the method used to count the word frequency summation of all words in the destination document, as long as the requirement for counting the word frequency summation is met.
The computing module 30 of the embodiment calculates the word weight of each word from its word frequency and the word frequency summation. The resulting word weight is used as the weight of the keyword in the short text.
According to an embodiment of the application, preferably, the text processing apparatus is used to handle the weights of words in short texts that occur frequently but are meaningless. The embodiments of this application do not use the concept of a document count. Instead, taking the natural logarithm of the ratio of the word frequency summation to the word frequency effectively addresses the problem that some high-frequency but meaningless words receive relatively high weights.
According to an embodiment of the application, preferably, the acquisition module 10 is used for obtaining the short text corpus, arranging each short text in a format of one short text per line, and treating all the short texts as one destination document. Specifically, the obtained short text corpus is merged into one document with one short text on each line. Word segmentation is then performed.
According to an embodiment of the application, preferably, the statistical module is used for:
counting the word frequency WF with which each word occurs in the destination document;
counting the word frequency summation DF of all words in the destination document.
Calculating the word weight of the word according to the word frequency and the word frequency summation includes:
calculating the word weight WW = ln(DF/WF).
Specifically, the word frequency WF of each word and the word frequency summation DF of all words in the document are counted, and the word weight is WW = ln(DF/WF). That is, the word weight is calculated by taking the natural logarithm of the ratio of the word frequency summation to the word frequency.
According to an embodiment of the application, preferably, the words in short texts that occur frequently but are meaningless, which the text processing apparatus is used to handle, include one or more of the following: modal particles, auxiliary words, and pronouns.
Specifically, because short texts have short sentences and a small vocabulary, a word may occur only once in a given sentence, so traditional word statistics designed for long texts have difficulty finding which character or word is a key word. This application starts from the idea of TFIDF and adapts the algorithm and the reasoning behind it into a word weight processing method suitable for short texts. In the embodiments of this application, all the collected short texts are regarded as a single document, which sets aside the concept of document count in existing TFIDF and omits the TF calculation, so the amount of computation is smaller. In TFIDF, IDF stands for inverse document frequency, a result calculated from the total number of documents and the number of documents in which a given word appears. This method has no concept of document count and instead takes the natural logarithm of the ratio of the word frequency summation to the word frequency. For example, when only a single word occurs in the document, its word frequency WF equals the word frequency summation DF of all words, so WW = ln(DF/WF) = ln 1 = 0.
Obviously, those skilled in the art should understand that the modules or steps of the application described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The application is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the application and are not intended to limit it. For those skilled in the art, various modifications and changes to the application are possible. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall be included within its scope of protection.

Claims (10)

1. A text handling method, characterized by comprising:
obtaining a short text corpus, arranging each short text according to a preset format, and treating all the short texts as one destination document;
counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document;
calculating the word weight of the word according to the word frequency and the word frequency summation.
2. The text handling method according to claim 1, characterized in that the method is used to handle the weights of words in short texts that occur frequently but are meaningless.
3. The text handling method according to claim 1, characterized in that obtaining the short text corpus, arranging each short text according to the preset format, and treating all the short texts as one destination document comprises:
obtaining the short text corpus, arranging each short text in a format of one short text per line, and treating all the short texts as one destination document.
4. The text handling method according to claim 1, characterized in that counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document comprises:
counting the word frequency WF with which each word occurs in the destination document;
counting the word frequency summation DF of all words in the destination document;
and calculating the word weight of the word according to the word frequency and the word frequency summation comprises:
calculating the word weight WW = ln(DF/WF).
5. The text handling method according to claim 1, characterized in that the words in short texts that occur frequently but are meaningless include one or more of the following: modal particles, auxiliary words, and pronouns.
6. A text processing apparatus, characterized by comprising:
an acquisition module, for obtaining a short text corpus, arranging each short text according to a preset format, and treating all the short texts as one destination document;
a statistical module, for counting the word frequency with which each word occurs in the destination document and the word frequency summation of all words in the destination document;
a computing module, for calculating the word weight of the word according to the word frequency and the word frequency summation.
7. The text processing apparatus according to claim 6, characterized in that the apparatus is used to handle the weights of words in short texts that occur frequently but are meaningless.
8. The text processing apparatus according to claim 6, characterized in that the acquisition module is used for obtaining the short text corpus, arranging each short text in a format of one short text per line, and treating all the short texts as one destination document.
9. The text processing apparatus according to claim 6, characterized in that the statistical module is used for:
counting the word frequency WF with which each word occurs in the destination document;
counting the word frequency summation DF of all words in the destination document;
and calculating the word weight of the word according to the word frequency and the word frequency summation comprises:
calculating the word weight WW = ln(DF/WF).
10. The text processing apparatus according to claim 6, characterized in that the words in short texts that occur frequently but are meaningless include one or more of the following: modal particles, auxiliary words, and pronouns.
CN201910346113.XA 2019-04-26 2019-04-26 Text processing method and device Active CN110110328B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910346113.XA CN110110328B (en) 2019-04-26 2019-04-26 Text processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910346113.XA CN110110328B (en) 2019-04-26 2019-04-26 Text processing method and device

Publications (2)

Publication Number Publication Date
CN110110328A (en) 2019-08-09
CN110110328B (en) 2023-09-01

Family

ID=67487015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910346113.XA Active CN110110328B (en) 2019-04-26 2019-04-26 Text processing method and device

Country Status (1)

Country Link
CN (1) CN110110328B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101251841A (en) * 2007-05-17 2008-08-27 华东师范大学 Method for establishing and searching feature matrix of Web document based on semantics
CN104750844A (en) * 2015-04-09 2015-07-01 中南大学 Method and device for generating text characteristic vectors based on TF-IGM, method and device for classifying texts
CN106503153A (en) * 2016-10-21 2017-03-15 江苏理工学院 A kind of computer version taxonomic hierarchies, system and its file classification method
CN106919554A (en) * 2016-10-27 2017-07-04 阿里巴巴集团控股有限公司 The recognition methods of invalid word and device in document
CN106570112A (en) * 2016-11-01 2017-04-19 四川用联信息技术有限公司 Improved ant colony algorithm-based text clustering realization method
CN108491429A (en) * 2018-02-09 2018-09-04 湖北工业大学 A kind of feature selection approach based on document frequency and word frequency statistics between class in class
CN108536868A (en) * 2018-04-24 2018-09-14 北京慧闻科技发展有限公司 The data processing method of short text data and application on social networks
CN109492110A (en) * 2018-11-28 2019-03-19 南京中孚信息技术有限公司 Document Classification Method and device

Also Published As

Publication number Publication date
CN110110328B (en) 2023-09-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant