CN101079025B - File correlation computing system and method - Google Patents


Publication number
CN101079025B
Authority
CN
China
Prior art keywords
sememe
vocabulary
module
document
semantic vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2006100360943A
Other languages
Chinese (zh)
Other versions
CN101079025A (en)
Inventor
丁江伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN2006100360943A
Publication of CN101079025A
Application granted
Publication of CN101079025B
Active
Anticipated expiration

Abstract

The invention discloses a document relevance computing system comprising a document preprocessing module and a word segmentation module connected in sequence. The input of the document preprocessing module is at least one document to be analyzed, and the output of the word segmentation module is a corresponding first word list. The system further comprises a sememe processing module and a document relevance computing module. The sememe processing module converts the words of the first word list into sememes and calculates the weight of each sememe, obtaining at least one topic semantic vector corresponding to the at least one document. The document relevance computing module is connected to the topic semantic vector computing module and computes the relevance of at least two topic semantic vectors. The invention also discloses a document relevance computing method. The invention can eliminate the effects of vocabulary sparsity and word ambiguity and thereby improve the accuracy of document relevance computation.

Description

A document relevance computing system and method
Technical field
The present invention relates to network communication technology and, more particularly, to a document relevance computing system and method.
Background art
Document relevance is a decimal between 0 and 1 that characterizes the degree of semantic relatedness between two documents. For example, the relevance of two identical documents is 1, while the relevance of a document on programming techniques and a document on politics and society is far smaller than 1 and approaches 0. Document relevance computation has many applications, such as document classification and clustering and the retrieval of related articles.
At present, document relevance computation is based on topic word extraction: the topic words of the documents to be compared are first extracted, and the relevance of the documents is then derived from the relevance between those topic words.
There are two main existing topic extraction methods. The first is title-based topic extraction: a document parser parses the document, finds its title, and takes the title as the topic of the document. This method is obviously too simple to be used for computing document relevance.
The second is topic extraction based on word frequency. With the development of statistical natural language processing, identifying the topic of a document by its high-frequency keywords has become widely used, particularly for extracting the topics of web pages. Specifically, the tags are first stripped from the web page source, the article content is then segmented into words and the word frequencies are counted, and finally the keywords are sorted by frequency and the top N high-frequency words are given as the topic of the article. However, because ideographic languages are highly developed, synonymy (one meaning, many words) and polysemy (one word, many meanings) are pervasive, and the use of rhetorical devices makes vocabulary sparsity an objective fact. For short web articles in particular, the overall performance of this algorithm is not very good, and the resulting document relevance computation is therefore unsatisfactory.
Summary of the invention
The object of the present invention is to address the defects of the prior art by providing a document relevance computing system and method which, being based on sememe-set semantic analysis, can eliminate the negative influence of both polysemous words and vocabulary sparsity on relevance.
The technical solution of the present invention is a document relevance computing system comprising a document preprocessing module and a word segmentation module connected in sequence, where the input of the document preprocessing module is at least one document to be analyzed and the output of the word segmentation module is a first word list corresponding to the at least one document. The word segmentation module also tags the segmented words with their parts of speech. The system further comprises: a segmentation post-processing module connected between the word segmentation module and the sememe processing module, which removes stop words and function words according to the parts of speech of the words in the first word list to obtain a second word list; a sememe processing module, which annotates the words of the second word list with sememes to form a third word list, determines either the weights of the multiple sememes corresponding to each polysemous word in the third word list or a unique sememe for each polysemous word so as to obtain a first sememe table, calculates a weight for every sememe in the first sememe table, and obtains a topic semantic vector sorted by weight; and a document relevance computing module, connected to the sememe processing module, which computes the relevance of at least two topic semantic vectors.
The document preprocessing module converts input documents of different formats into a standard format and extracts the document body text. The word segmentation module segments the output of the document preprocessing module to obtain the first word list. The sememe processing module comprises: a sememe annotation module, which uses a sememe dictionary to annotate the words of the second word list with sememes, forming the third word list; a word sense disambiguation module, which determines the weights of the multiple sememes corresponding to each polysemous word in the third word list, or determines a unique sememe for each polysemous word, to obtain the first sememe table; and a topic semantic vector computing module, which calculates a weight for every sememe in the first sememe table and obtains the topic semantic vector sorted by weight.
As an improvement of the present invention, the system further comprises a topic semantic vector store whose input is connected to the sememe processing module and whose output is connected to the document relevance computing module, and which stores the topic semantic vectors output by the sememe processing module. The document relevance computing module computes the relevance of at least two topic semantic vectors; the topic semantic vectors are obtained from the sememe processing module, from the topic semantic vector store, or from both.
The present invention also provides a document relevance computing method comprising the following steps: (a) the document preprocessing module converts input documents of different formats into a standard format and extracts the document body text; (b) the word segmentation module segments the output of the document preprocessing module and tags the segmented words with their parts of speech, obtaining a first word list, and the segmentation post-processing module removes the stop words and function words from the first word list, obtaining a second word list; (c) the sememe processing module annotates the words of the second word list with sememes to form a third word list, processes the words of the third word list to determine either the weights of the multiple sememes corresponding to each polysemous word or a unique sememe for each polysemous word so as to obtain a first sememe table, calculates a weight for every sememe in the first sememe table, and obtains a topic semantic vector sorted by weight; (d) the document relevance computing module performs a computation on the topic semantic vectors of at least two documents to be analyzed, obtaining the relevance of the at least two documents.
In step (d), the topic semantic vectors of the at least two documents are obtained from the sememe processing module, from a topic semantic vector store connected to the document relevance computing module, or from both the sememe processing module and the topic semantic vector store.
Further, step (a) also comprises: the document preprocessing module obtains the category information and title information of the corresponding document.
In step (c), the topic semantic vector is obtained as follows: (c1) the sememe annotation module uses a sememe dictionary to annotate the words of the second word list with sememes, forming the third word list; (c2) the word sense disambiguation module processes the sememe-annotated words of the third word list and determines the weights of the multiple sememes corresponding to each polysemous word, or determines a unique sememe for each polysemous word, obtaining the first sememe table; (c3) the topic semantic vector computing module calculates a weight for every sememe in the first sememe table and obtains the topic semantic vector sorted by weight.
The beneficial effects of the present invention are: 1. Semantic analysis based on a sememe set avoids the problem of vocabulary sparsity, so that relevance analysis works well for short articles and the accuracy of document relevance computation is improved. 2. Word sense disambiguation eliminates the negative influence of polysemous words on relevance computation and improves its accuracy. 3. The pre-classification, title information, and display attributes of the document are fully taken into account, so the topic of the document can be extracted accurately, which improves the accuracy of document relevance computation.
Description of drawings
Fig. 1 is a structural diagram of the document relevance computing system of the present invention.
Fig. 2 is a flowchart of the document relevance computing method of the present invention.
Embodiment
The present invention is further described below through preferred embodiments with reference to the accompanying drawings.
As shown in Fig. 1, the document relevance computing system of the present invention comprises a document preprocessing module 1, a word segmentation module 2, a segmentation post-processing module 3, a sememe processing module, and a document relevance computing module 8, connected in sequence. The sememe processing module comprises a sememe annotation module 4, a word sense disambiguation module 5, and a topic semantic vector computing module 6, connected in sequence. If required, the system may also include a topic semantic vector store 7 whose input is connected to the topic semantic vector computing module 6 and whose output is connected to the document relevance computing module 8.
The document preprocessing module 1 converts input documents of different formats into a standard format and extracts the document body text. The documents of different formats may include web pages, Word documents, text documents, PDF files, and the like. The standard format may be a text document. In a specific implementation, if the document title and category information can be extracted from the converted standard format, the document preprocessing module may also be able to extract the title and category information of the converted standard document, which improves the accuracy of topic extraction and thus the accuracy of the relevance computation. If all documents processed by the system are web pages, the standard format is defined as the web page format, and the document preprocessing module then needs to be able to extract the web page title and category information. The document preprocessing module 1 is connected to the word segmentation module 2.
The word segmentation module 2 segments the output of the document preprocessing module 1. In this embodiment, the word segmentation module 2 uses a dictionary to cut the converted body text, title, and category of the web page into words. For example, segmenting the sentence "I am a student" yields four words: "I", "am", "a", and "student". Existing segmentation algorithms fall into three broad classes: methods based on string matching, methods based on understanding, and methods based on statistics. This embodiment uses segmentation based on string matching. Also called mechanical segmentation, this method matches the Chinese character string to be analyzed against the entries of a "sufficiently large" machine dictionary according to some strategy; if a string is found in the dictionary, the match succeeds (a word is identified).
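The string-matching approach described above can be sketched as forward maximum matching, one common matching strategy. The patent does not say which strategy the embodiment uses, so the strategy, the function name `forward_max_match`, the toy dictionary, and the `max_len` parameter are all assumptions made for illustration. The Chinese sentence is a presumed reconstruction of the "I am a student" example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text by greedily taking the longest dictionary entry from the left."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary (hypothetical); a real system would load a large machine dictionary.
toy_dictionary = {"我", "是", "一个", "学生"}
segments = forward_max_match("我是一个学生", toy_dictionary)
print(segments)
```

Greedy matching is simple and fast, but it depends entirely on dictionary coverage, which is why the text stresses a "sufficiently large" dictionary.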
In the present invention, the word segmentation module 2 also tags the segmented words with their parts of speech, so that the segmentation post-processing module 3 can remove stop words, function words, and the like from the word list according to part of speech.
The functions of the segmentation post-processing module 3 include, but are not limited to, removing stop words and function words from the output of the word segmentation module 2, thereby discarding information irrelevant to the topic.
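A minimal sketch of this post-processing step, assuming the first word list is a sequence of (word, part-of-speech) pairs. The tag names, the stop-word set, and the function name `post_process` are hypothetical; the patent does not specify a tag set or a stop-word list.

```python
# Hypothetical part-of-speech tags treated as function words.
FUNCTION_WORD_TAGS = {"det", "prep", "conj", "aux"}
# Hypothetical stop-word list.
STOP_WORDS = {"am", "is", "are"}


def post_process(first_word_list):
    """Return the second word list: pairs that are neither stop words nor function words."""
    return [
        (word, pos)
        for word, pos in first_word_list
        if word not in STOP_WORDS and pos not in FUNCTION_WORD_TAGS
    ]


first_word_list = [("I", "pron"), ("am", "verb"), ("a", "det"), ("student", "noun")]
second_word_list = post_process(first_word_list)
print(second_word_list)
```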
The sememe annotation module 4 uses a sememe dictionary to annotate the segmented words with sememes. It is connected to the segmentation post-processing module 3, the word sense disambiguation module 5, and the sememe dictionary.
Because the sememe dictionary gives multiple sememes for a polysemous word, the word sense disambiguation module 5 is needed to determine, from the context, the likely weight of each sememe corresponding to that word. A simpler method may also be used: a single sememe is chosen from the context as the meaning of the polysemous word. This embodiment uses the second method. The choice may be computed with methods such as a Bayesian algorithm, a decision tree, or information entropy.
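To illustrate the second method (choosing one sememe per polysemous word from its context), here is a deliberately simple sketch. It uses a context-overlap heuristic, not the Bayesian, decision-tree, or entropy methods the text mentions. The `related` table, the sememe names, and the function name `disambiguate` are invented for illustration.

```python
def disambiguate(candidates, context_sememes, related):
    """Pick the candidate sememe whose related sememes overlap most with the context."""
    def score(sememe):
        return len(related.get(sememe, set()) & context_sememes)

    # max() returns the first candidate on ties, so candidate order acts as a prior.
    return max(candidates, key=score)


# Hypothetical data: two candidate sememes for the polysemous word "bank".
related = {
    "institution": {"money", "finance"},
    "land": {"water", "river"},
}
context = {"money", "loan"}
chosen = disambiguate(["institution", "land"], context, related)
print(chosen)
```

A statistical classifier trained on labeled contexts would replace `score` in a fuller implementation; the surrounding interface could stay the same.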
In extracting the document topic, the present invention does not use words as the unit of computation; instead it uses a sememe dictionary to convert words into sememe representations, which is a semantic analysis technique based on a sememe set. A sememe (semantic primitive) is the most basic element of meaning in a language. It can be understood as one of a set of meaning symbols with which all other words can be defined. A major difficulty facing natural language processing is vocabulary sparsity, so converting keywords into sememe representations avoids vocabulary sparsity to a large extent. A sememe set is a small-scale set of words or sememe indices that characterizes all natural concepts; each element of the set denotes a concept uniquely and without duplication.
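The effect on sparsity can be seen in a small sketch: two documents that share no surface words can still share sememes. The dictionary entries and the function name `to_sememes` are hypothetical and are not taken from any real sememe dictionary.

```python
# Hypothetical sememe dictionary entries, invented for illustration only.
SEMEME_DICT = {
    "doctor": ["human", "medical"],
    "physician": ["human", "medical"],
    "hospital": ["place", "medical"],
}


def to_sememes(words, sememe_dict):
    """Replace each word by its sememes; unknown words are kept unchanged."""
    sememes = []
    for word in words:
        sememes.extend(sememe_dict.get(word, [word]))
    return sememes


words_a = ["doctor"]
words_b = ["physician", "hospital"]
word_overlap = set(words_a) & set(words_b)
sememe_overlap = set(to_sememes(words_a, SEMEME_DICT)) & set(to_sememes(words_b, SEMEME_DICT))
print(word_overlap, sememe_overlap)
```

At the word level the two lists have nothing in common, while at the sememe level they overlap, which is the property the relevance computation relies on.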
The topic semantic vector computing module 6 applies principles of statistical linguistics to all sememes output by the word sense disambiguation module 5. The computation assigns different weights to different sememes and yields the topic semantic vector sorted by weight. If the document preprocessing module 1 has obtained the title and category information of the document, the topic semantic vector computing module 6 assigns different weights to the category information, the title information, and the body text during the computation.
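One way to give category, title, and body different weights is to scale each sememe occurrence by a per-field factor before further weighting. The patent does not give the factors, so the values in `FIELD_WEIGHTS` and the function name `weighted_counts` are assumptions.

```python
from collections import Counter

# Hypothetical field weights; the patent only says the fields are weighted differently.
FIELD_WEIGHTS = {"category": 3.0, "title": 2.0, "body": 1.0}


def weighted_counts(fields):
    """Sum sememe occurrences, scaling each occurrence by the weight of its field."""
    counts = Counter()
    for field, sememes in fields.items():
        for sememe in sememes:
            counts[sememe] += FIELD_WEIGHTS[field]
    return counts


counts = weighted_counts({"title": ["computer"], "body": ["computer", "program"]})
print(counts)
```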
In this embodiment, the Tf-Idf algorithm is used to calculate the weight of every sememe. Other algorithms, such as cross entropy, may also be used. The Tf-Idf algorithm uses inverted index technology and is mainly applied in full-text search. It ensures that mid-frequency sememes receive high weights and that noise words are excluded.
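A minimal sketch of Tf-Idf weighting applied to sememes instead of words. The smoothed IDF formula, the `top_n` cutoff, and the function name `topic_semantic_vector` are assumptions; the patent names the algorithm but not a specific variant.

```python
import math
from collections import Counter


def topic_semantic_vector(doc_sememes, corpus, top_n=3):
    """Return the top_n (sememe, weight) pairs of one document, sorted by Tf-Idf weight."""
    tf = Counter(doc_sememes)
    n_docs = len(corpus)
    weights = {}
    for sememe, count in tf.items():
        # Document frequency: number of corpus documents containing the sememe.
        df = sum(1 for doc in corpus if sememe in doc)
        # Smoothed IDF (an assumed variant) keeps the weight positive and finite.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[sememe] = (count / len(doc_sememes)) * idf
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy corpus of sememe lists (hypothetical).
corpus = [
    ["computer", "program", "computer"],
    ["politics", "society", "program"],
]
vector = topic_semantic_vector(corpus[0], corpus)
print(vector)
```

In this toy corpus "computer" outranks "program" because it is both more frequent in the document and rarer across the corpus.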
The topic semantic vector store 7 stores the topic semantic vectors output by the topic semantic vector computing module 6.
The document relevance computing module 8 performs a computation on the topic semantic vectors of at least two documents to be analyzed and obtains the relevance of the at least two documents. The topic semantic vectors may all be obtained from the sememe processing module, in which case the preceding modules process the at least two documents separately and simultaneously. The topic semantic vectors may also all be obtained from the topic semantic vector store 7: the module looks up, according to its settings, the topic semantic vectors corresponding to the documents to be analyzed in the store and then performs the computation. Alternatively, one topic semantic vector may be obtained from the sememe processing module and the other from the topic semantic vector store 7. For example, if a management module finds that one of the two documents to be analyzed has already been analyzed and that its topic semantic vector is stored in the topic semantic vector store 7, then only the other document is analyzed this time, and the topic semantic vector of the already analyzed document is obtained directly from the store.
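The lookup-or-analyze behavior described above can be sketched with a small in-memory store. The class name `TopicVectorStore`, the method `get_or_compute`, and the document identifier are hypothetical; the patent does not describe how the store is implemented.

```python
class TopicVectorStore:
    """In-memory stand-in for the topic semantic vector store."""

    def __init__(self):
        self._vectors = {}

    def get_or_compute(self, doc_id, analyze):
        # Reuse the stored vector if this document was analyzed before.
        if doc_id not in self._vectors:
            self._vectors[doc_id] = analyze(doc_id)
        return self._vectors[doc_id]


calls = []


def analyze(doc_id):
    """Placeholder for the full pipeline; records each call for demonstration."""
    calls.append(doc_id)
    return {"computer": 0.9}


store = TopicVectorStore()
first = store.get_or_compute("doc-1", analyze)
second = store.get_or_compute("doc-1", analyze)
print(calls)
```

The second request is served from the store, so the pipeline runs only once per document.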
The relevance of the documents can be obtained by calculating the cosine of the angle between the two topic semantic vectors.
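A minimal sketch of the cosine computation, assuming each topic semantic vector is represented as a mapping from sememe to weight (a representation chosen here for illustration; the function name `cosine` and the sample vectors are also hypothetical). With non-negative weights the result lies between 0 and 1.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two sparse vectors given as sememe -> weight dicts."""
    dot = sum(weight * v.get(sememe, 0.0) for sememe, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # An empty vector shares nothing with any document.
        return 0.0
    return dot / (norm_u * norm_v)


programming = {"computer": 0.9, "program": 0.3}
politics = {"politics": 0.8, "society": 0.5}
same = cosine(programming, programming)
different = cosine(programming, politics)
print(same, different)
```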
As shown in Fig. 2, the document relevance computing method of the present invention comprises the following steps:
S1: the document preprocessing module 1 converts input documents of different formats into standard-format documents, extracts their body text and, where possible, also obtains their title and category information.
S2: the word segmentation module 2 segments the body text of the document (possibly together with its category and title) and tags the segmented words with their parts of speech, forming the first word list.
S3: the segmentation post-processing module 3 removes the stop words, function words, and the like from the first word list, forming the second word list.
S4: the sememe annotation module 4 annotates the words of the second word list with sememes according to the correspondence between the word dictionary and the sememe dictionary, forming the third word list.
S5: the word sense disambiguation module 5 processes the polysemous words in the third word list and, based on contextual information, determines a unique sememe for each polysemous word, obtaining the first sememe table.
S6: the topic semantic vector computing module 6 calculates a weight for every sememe in the first sememe table using an algorithm such as Tf-Idf (a feature-term weighting factor) from the vector space model, obtaining the topic semantic vector sorted by weight.
S7: the document relevance computing module 8 performs a computation on the topic semantic vectors corresponding to the documents to be analyzed, obtains the relevance between the documents, and normalizes it to a value between 0 and 1.
The above are merely preferred embodiments of the present invention and do not limit it; for a person skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of its claims.

Claims (8)

1. A document relevance computing system, comprising a document preprocessing module and a word segmentation module connected in sequence, the input of the document preprocessing module being at least one document to be analyzed, and the output of the word segmentation module being a first word list corresponding to the at least one document; the word segmentation module also tagging the segmented words with their parts of speech; characterized in that the system further comprises:
a segmentation post-processing module connected between the word segmentation module and a sememe processing module, the segmentation post-processing module being configured to remove stop words and function words according to the parts of speech of the words in the first word list, obtaining a second word list;
the sememe processing module, configured to annotate the words of the second word list with sememes to form a third word list, to determine the weights of the multiple sememes corresponding to each polysemous word in the third word list or to determine a unique sememe for each polysemous word so as to obtain a first sememe table, and to calculate a weight for every sememe in the first sememe table, obtaining a topic semantic vector sorted by weight;
a document relevance computing module, connected to the sememe processing module and configured to compute the relevance of at least two topic semantic vectors.
2. The document relevance computing system according to claim 1, characterized in that it further comprises a topic semantic vector store whose input is connected to the sememe processing module and whose output is connected to the document relevance computing module, the store being configured to store the topic semantic vectors output by the sememe processing module;
the document relevance computing module is configured to compute the relevance of at least two topic semantic vectors; the topic semantic vectors are obtained from the sememe processing module, or from the topic semantic vector store, or from both the sememe processing module and the topic semantic vector store.
3. The document relevance computing system according to claim 1, characterized in that:
the document preprocessing module is configured to convert input documents of different formats into a standard format and to extract the document body text;
the word segmentation module is configured to segment the output of the document preprocessing module, obtaining the first word list.
4. The document relevance computing system according to claim 3, characterized in that the sememe processing module comprises:
a sememe annotation module, configured to use a sememe dictionary to annotate the words of the second word list with sememes, forming the third word list;
a word sense disambiguation module, configured to determine the weights of the multiple sememes corresponding to each polysemous word in the third word list, or to determine a unique sememe for each polysemous word, obtaining the first sememe table;
a topic semantic vector computing module, configured to calculate a weight for every sememe in the first sememe table, obtaining the topic semantic vector sorted by weight.
5. A document relevance computing method, characterized in that it comprises the following steps:
(a) converting, by a document preprocessing module, input documents of different formats into a standard format, and extracting the document body text;
(b) segmenting, by a word segmentation module, the output of the document preprocessing module and tagging the segmented words with their parts of speech, obtaining a first word list; and removing, by a segmentation post-processing module, the stop words and function words from the first word list, obtaining a second word list;
(c) annotating, by a sememe processing module, the words of the second word list with sememes to form a third word list, processing the words of the third word list to determine the weights of the multiple sememes corresponding to each polysemous word or to determine a unique sememe for each polysemous word so as to obtain a first sememe table, and calculating a weight for every sememe in the first sememe table, obtaining a topic semantic vector sorted by weight;
(d) performing, by a document relevance computing module, a computation on the topic semantic vectors of at least two documents to be analyzed, obtaining the relevance of the at least two documents.
6. The document relevance computing method according to claim 5, characterized in that, in step (d), the topic semantic vectors of the at least two documents are obtained from the sememe processing module, or from a topic semantic vector store connected to the document relevance computing module, or from both the sememe processing module and the topic semantic vector store.
7. The document relevance computing method according to claim 5, characterized in that step (a) further comprises: obtaining, by the document preprocessing module, the category information and title information of the corresponding document.
8. The document relevance computing method according to claim 5, characterized in that, in step (c), the topic semantic vector is obtained by:
(c1) using, by a sememe annotation module, a sememe dictionary to annotate the words of the second word list with sememes, forming the third word list;
(c2) processing, by a word sense disambiguation module, the sememe-annotated words of the third word list to determine the weights of the multiple sememes corresponding to each polysemous word, or to determine a unique sememe for each polysemous word, obtaining the first sememe table;
(c3) calculating, by a topic semantic vector computing module, a weight for every sememe in the first sememe table, obtaining the topic semantic vector sorted by weight.
CN2006100360943A 2006-06-19 2006-06-19 File correlation computing system and method Active CN101079025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2006100360943A CN101079025B (en) 2006-06-19 2006-06-19 File correlation computing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2006100360943A CN101079025B (en) 2006-06-19 2006-06-19 File correlation computing system and method

Publications (2)

Publication Number Publication Date
CN101079025A CN101079025A (en) 2007-11-28
CN101079025B true CN101079025B (en) 2010-06-16

Family

ID=38906505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006100360943A Active CN101079025B (en) 2006-06-19 2006-06-19 File correlation computing system and method

Country Status (1)

Country Link
CN (1) CN101079025B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100114887A1 (en) * 2008-10-17 2010-05-06 Google Inc. Textual Disambiguation Using Social Connections
CN102073552B (en) * 2009-11-19 2013-01-16 北大方正集团有限公司 Digital resource packet structure verifying method and system
CN101833582A (en) * 2010-05-04 2010-09-15 吴毓杰 Mining method and system for correlation of vocabulary entities based on template
CN107122980B (en) * 2011-01-25 2021-08-27 阿里巴巴集团控股有限公司 Method and device for identifying categories to which commodities belong
CN102867048B (en) * 2012-09-08 2015-02-18 苏州大学 Document storing method based on semantic compression
CN103106192B (en) * 2013-02-02 2016-02-03 深圳先进技术研究院 Literary work writer identification method and device
CN103092828B (en) * 2013-02-06 2015-08-12 杭州电子科技大学 Based on the text similarity measure of semantic analysis and semantic relation network
CN103246640B (en) * 2013-04-23 2016-08-03 北京酷云互动科技有限公司 A kind of method and device detecting repeated text
CN103678287B (en) * 2013-11-30 2016-12-07 语联网(武汉)信息技术有限公司 A kind of method that keyword is unified
CN105528349B (en) 2014-09-29 2019-02-01 华为技术有限公司 The method and apparatus that question sentence parses in knowledge base
CN105760363B (en) * 2016-02-17 2019-12-13 腾讯科技(深圳)有限公司 Word sense disambiguation method and device for text file
CN105930435B (en) * 2016-04-19 2019-02-12 北京深度时代科技有限公司 A kind of object identifying method based on portrait model
CN106372122B (en) * 2016-08-23 2018-04-10 温州大学瓯江学院 A kind of Document Classification Method and system based on Wiki semantic matches
CN107832306A (en) * 2017-11-28 2018-03-23 武汉大学 A kind of similar entities method for digging based on Doc2vec

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5873056A (en) * 1993-10-12 1999-02-16 The Syracuse University Natural language processing system for semantic vector representation which accounts for lexical ambiguity
WO2001042984A1 (en) * 1999-12-08 2001-06-14 Roitblat Herbert L Process and system for retrieval of documents using context-relevant semantic profiles
CN1403957A (en) * 2001-09-06 2003-03-19 联想(北京)有限公司 Theme word correction method of text similarity calculation based on vector space model
CN1741012A (en) * 2004-08-23 2006-03-01 富士施乐株式会社 Text search apparatus and method

Also Published As

Publication number Publication date
CN101079025A (en) 2007-11-28

Similar Documents

Publication Publication Date Title
CN101079025B (en) File correlation computing system and method
CN101079024B (en) Special word list dynamic generation system and method
CN112732934B (en) Power grid equipment word segmentation dictionary and fault case library construction method
CN109145260B (en) Automatic text information extraction method
CN109829159A (en) Integrated automatic morphological analysis method and system for archaic Chinese text
CN109684642B (en) Abstract extraction method combining page parsing rule and NLP text vectorization
WO2017080090A1 (en) Extraction and comparison method for text of webpage
CN101079031A (en) Web page subject extraction system and method
CN103049435A (en) Text fine granularity sentiment analysis method and text fine granularity sentiment analysis device
CN107168956B (en) Chinese chapter structure analysis method and system based on pipeline
CN111061882A (en) Knowledge graph construction method
CN113221559B (en) Method and system for extracting Chinese key phrase in scientific and technological innovation field by utilizing semantic features
CN106651696A (en) Approximate question push method and system
CN111897917B (en) Rail transit industry term extraction method based on multi-modal natural language features
CN112417854A (en) Chinese document abstraction type abstract method
CN113312922B (en) Improved chapter-level triple information extraction method
CN114495143A (en) Text object identification method and device, electronic equipment and storage medium
CN111858933A (en) Character-based hierarchical text emotion analysis method and system
Shanmugalingam et al. Language identification at word level in Sinhala-English code-mixed social media text
CN105574066A (en) Web page text extraction and comparison method and system thereof
CN114239579A (en) Electric power searchable document extraction method and device based on regular expression and CRF model
Ye et al. Syntactic word embedding based on dependency syntax and polysemous analysis
CN114064901B (en) Book comment text classification method based on knowledge graph word meaning disambiguation
CN115618883A (en) Business semantic recognition method and device
CN115759037A (en) Intelligent auditing frame and auditing method for building construction scheme

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131021

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20131021

Address after: 518057 Tencent Building 16, Hi-Tech Park, Nanshan District, Shenzhen, Guangdong

Patentee after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: 518057 Fiyta High-Tech Building 5-10, High-Tech South Road, High-Tech Park, Shenzhen, Guangdong Province

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.