CN106095758B - A literary works guessing method based on a word vector model - Google Patents

Info

Publication number
CN106095758B
Authority
CN
China
Prior art keywords
literary works
guess
word
corpus
works
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610439566.3A
Other languages
Chinese (zh)
Other versions
CN106095758A (en
Inventor
王庆林
李原
刘禹
阮海鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201610439566.3A
Publication of CN106095758A
Application granted
Publication of CN106095758B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a literary works guessing method based on a word vector model and belongs to the technical field of information processing. The method comprises two stages: construction of a literary works knowledge base, and guessing of literary works knowledge. In the construction stage, a small-scale corpus of a specific literary work is collected and the feature words of the work are mined from it; a word vector neural network is trained on the small-scale corpus to obtain a word vector association model; and, based on this model, the related words with a high degree of association to each feature word are calculated, thereby building the literary works guessing knowledge base. In the guessing stage, the system randomly selects a feature word as the guessing target, extracts the related words of that feature word from the knowledge base and shows them to the contestant one after another, and the contestant reasons out an answer. The invention uses word vector model analysis to discover the association relationships among the feature words of a specific literary work, and tests the reader's familiarity with the work in the form of a guessing game, which strengthens the interaction between the work and the reader and raises interest in reading.

Description

A literary works guessing method based on a word vector model
Technical field
The present invention relates to a literary works guessing method based on a word vector model, and in particular to a literary works guessing method that uses a word vector model to automatically mine the deep, complex information relationships in the text of a literary work. It belongs to the field of information processing technology.
Background
A specific literary work refers to a literary work, or a collection of works, with a particular story background and plot. Such works are usually long, and the relationships among their characters and things are complex. On the one hand, reading such a work takes a great deal of time and energy, and at today's fast pace of life people find it hard to set aside enough time to read a whole work from beginning to end; a fast, engaging and interactive way of absorbing literary knowledge is therefore needed. On the other hand, after reading a specific literary work, a reader has some understanding of it, but the depth of that understanding varies. Readers can only evaluate their familiarity with the work qualitatively, not quantitatively, so a method is needed that can quantitatively measure a reader's familiarity with the knowledge related to a specific literary work.
Knowledge guessing is a way of reflecting a contestant's familiarity with knowledge in the open domain or in a restricted domain. The contestant reasons out an answer from the information in a passage or in several words. The less prompt information there is, or the lower its degree of association with the answer, the harder the question is and the more knowledge the contestant needs.
Applying knowledge guessing to literary works not only tests the reader's familiarity with a work through a variety of answering modes, but also lets the reader quickly grasp the topological relationships among the main characters, things and other entities in the work, and raises the reader's interest in reading.
At present, guessing knowledge bases are built mainly by hand, which requires the cooperation of a large amount of domain expert knowledge. In building a guessing knowledge base for literary works, a literary work on a specific subject can be regarded as a domain. To build the knowledge base for this domain, an expert must understand the work deeply and be very clear about the relationships among its main characters and things before a high-quality guessing knowledge base can be built. Manual construction has the following drawbacks: the construction process is very slow, since every question and answer must be written by a domain expert and a guessing knowledge base usually contains many questions, which makes manual construction difficult; it relies too heavily on domain expert knowledge, so that a high-quality guessing knowledge base cannot be built if the expert is not familiar enough with the work; and it ports poorly across works on different themes, since a construction method suited to works on one theme performs poorly on works on another.
To address these problems of manual construction, the present invention uses natural language processing tools and methods to build a guessing knowledge base for a specific literary work automatically, scientifically, quickly and efficiently, and the method is highly portable. Once the knowledge base is built, contestants can use it to answer questions in a variety of guessing modes, and thereby quickly absorb knowledge related to the work or qualitatively assess their own familiarity with that knowledge.
Summary of the invention
The purpose of the present invention is to solve two problems: how to build a guessing knowledge base for a specific literary work automatically, scientifically, quickly and efficiently, so that a reader's familiarity with the work can be evaluated quantitatively; and how to let a reader quickly learn the knowledge related to the work without having read the original text. To this end, a literary works guessing method based on a word vector model is proposed. For a given specific literary work, the method mines the deep, complex information relationships in its text and builds a knowledge base, then extracts related words from the knowledge base and shows them to the contestant for guessing.
The idea of the invention is, for a given specific literary work, to automatically mine the deep, complex information relationships among words from a small-scale corpus related to the work, to build a knowledge base according to certain association rules, and to present it visually to contestants for answering. In this way a contestant's familiarity with the work can be tested quickly and scientifically, the appeal of the work can be brought out, and interactivity is increased.
The purpose of the present invention is achieved through the following technical solution:
A literary works guessing method based on a word vector model is divided into two stages, literary works knowledge base construction and literary works knowledge guessing. The literary works knowledge base construction stage includes the following steps:
Step 1, collect the text corpora related to a specific literary work, including but not limited to the original work, encyclopedia entries related to the work and related research literature, and build a small-scale corpus of the specific literary work;
Step 2, preprocess the natural language text of the small-scale corpus thus built and remove irrelevant text noise;
Step 3, perform named entity recognition on the denoised small-scale corpus using a natural language processing tool or method, and add the resulting named entities to the feature word list as the feature words peculiar to the literary work;
Step 4, add all feature words in the feature word list to the segmentation dictionary of the word segmentation tool, segment the small-scale corpus of the specific literary work using the segmentation dictionary to obtain the segmented corpus, and add all words of the segmented corpus to the vocabulary without duplicates;
Step 5, using a word vector analysis tool with the segmented corpus as input, obtain the word vector model of the small-scale corpus of the literary work, calculate the group of related words most relevant to each feature word, and build the literary works guessing knowledge base;
The literary works knowledge guessing stage includes the following steps:
Step 6, on entering the literary works knowledge guessing stage, the system randomly selects a feature word as the guessing target and extracts from the literary works guessing knowledge base the top N related words with the highest degree of association to that feature word;
Step 7, divide the N related words retained in step 6 into M groups, each containing no fewer than 2 related words, and assign each group a difficulty level according to its degree of association;
Step 8, the system randomly draws one related word from each of the M groups and shows them to the contestant one after another, in order of degree of association from low to high;
Step 9, the contestant reasons out an answer from the related words shown; if the system judges the answer correct, it records the display time and moves on to the next question; if the answer is still wrong or missing when the related words disappear, the question is recorded as failed and the system moves on to the next question;
Step 10, after the contestant has answered a certain number of questions, the guessing ends; the system makes an overall evaluation based on the time spent and the accuracy achieved during answering, and gives a score that reflects the contestant's familiarity with the literary work.
In step 3, when named entity recognition is performed, named entities that are synonymous are aligned.
In step 5, the word vector analysis tool takes the segmented text as its input corpus to obtain the word vector model of the small-scale corpus of the literary work; when the group of related words most relevant to each feature word is calculated, the cosine similarity between two word vectors is used as the degree of association between the two words.
In step 9, the contestant reasons out an answer from the related words shown; the guessing may take the form of a single contestant answering or of several contestants racing to answer first.
Beneficial effects
Compared with the prior art, the present invention has the following features:
1) The literary works guessing method provided by the invention automatically mines the deep, complex information relationships in the text from the small-scale corpus of a specific literary work, so that a reader can quickly grasp the topological relationships among the main characters, things and other entities in the work. The reader can gain a fairly deep understanding of the work without reading it in full.
2) The literary works guessing method provided by the invention can build the guessing knowledge base of a specific literary work automatically, quickly and scientifically, and effectively avoids the drawbacks of manual construction: low efficiency, excessive reliance on domain expert knowledge and poor portability.
3) Once the guessing knowledge base has been built with the method provided by the invention, contestants can use it to answer questions in a variety of guessing modes. The system makes an overall evaluation based on the time spent and the accuracy achieved during answering and gives a score, which quantitatively reflects the contestant's familiarity with the literary work.
Brief description of the drawings
Fig. 1 is a flow diagram of a literary works guessing method based on a word vector model according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to a specific embodiment and the accompanying drawing.
The literary works guessing method based on a word vector model of the present invention is divided into two stages, literary works knowledge base construction and literary works knowledge guessing. Its principle is as follows:
In the knowledge base construction stage, a small-scale corpus of a specific literary work is first collected. Next, through text preprocessing, the feature words peculiar to the work, such as person names, place names, times and events, are extracted from it. A word vector neural network is then trained on the small-scale corpus to obtain a word vector association model. Finally, the related words with a high degree of association to each feature word are calculated from this model, thereby building the literary works guessing knowledge base.
In the knowledge guessing stage, the system first randomly selects a feature word as the guessing target, then extracts the related words of that feature word from the literary works guessing knowledge base and shows them to the contestant one after another. The contestant reasons from the related words received until the feature word is answered correctly.
Fig. 1 is a flow diagram of the literary works guessing method based on a word vector model provided by the invention. To illustrate the flow of the method more clearly, it is described in detail using the literary work Heroes of the Marshes (also known as Water Margin) as an example. As shown in Fig. 1, the method includes the following steps:
Step 101, collect the text corpora related to a specific literary work, including but not limited to the original work, encyclopedia entries related to the work and related research literature, and build a small-scale corpus of the specific literary work.
A specific literary work refers to a literary work, or a collection of works, with a particular story background and plot, such as "Heroes of the Marshes", "The Romance of the Three Kingdoms", the "Star Wars" series and the "Harry Potter" series. When collecting the text corpora related to a specific literary work, high-quality corpora should be chosen. A high-quality corpus is one whose content is highly relevant to the content of the original work and which introduces only a small amount of noise. The higher the quality of the collected corpus, the better the word vector model built in step 105.
In the present embodiment, building the small-scale corpus of Heroes of the Marshes first requires collecting the related text. The original work has 120 chapters in total. In addition, 427 entries covering the main characters, events and so on of Heroes of the Marshes were used as query words; a web crawler automatically fetched the Baidu Baike page of each entry and the text of the entry was extracted. This yielded a plain-text small-scale corpus of Heroes of the Marshes covering its characters, historical background and literary influence, with a total size of 6.87 MB.
Step 102, preprocess the natural language text of the small-scale corpus of the specific literary work and remove irrelevant text noise, such as symbols and English characters that carry no real meaning, serial numbers in the entries, and advertising information in the web pages. This step further improves the quality of the collected corpus. After denoising, the small-scale corpus of Heroes of the Marshes is reduced to 6.59 MB.
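The denoising in step 102 can be sketched with a few regular expressions. This is only a minimal illustration: the patent names the categories of noise but not the rules, so every pattern and the advertising markers below are assumptions, and the sketch treats all ASCII letters as noise.

```python
import re

# All patterns below are assumed for illustration; the patent does not specify them.
BRACKET_REF = re.compile(r"\[\d+(?:-\d+)?\]")     # entry serial numbers such as [1] or [2-3]
ASCII_LETTERS = re.compile(r"[A-Za-z]+")          # English characters
MEANINGLESS = re.compile(r"[★◆■□▲●※→|_~^#*]+")   # decorative symbols
AD_MARKERS = ("广告", "点击下载", "扫码关注")         # hypothetical markers of advertising lines


def clean_line(line: str) -> str:
    """Remove serial numbers, English letters and decorative symbols from one line."""
    line = BRACKET_REF.sub("", line)
    line = ASCII_LETTERS.sub("", line)
    line = MEANINGLESS.sub("", line)
    return re.sub(r"\s+", " ", line).strip()


def clean_corpus(lines):
    """Drop advertising lines and empty lines, and clean the remaining ones."""
    cleaned = []
    for line in lines:
        if any(marker in line for marker in AD_MARKERS):
            continue
        text = clean_line(line)
        if text:
            cleaned.append(text)
    return cleaned
```

In practice the rules would be tuned by inspecting the crawled pages, since over-aggressive patterns can delete meaningful text.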
Step 103, with the small-scale corpus of Heroes of the Marshes as input, perform named entity recognition on the corpus using a natural language processing tool or method. The named entities identified in the text include but are not limited to person names and place names. The resulting named entities are added to the feature word list as the feature words peculiar to the literary work.
In the present embodiment, named entity recognition is performed with the HanLP tool. In HanLP, named entity recognition is a follow-up process of word segmentation: a sentence is first segmented, and each segmented word is then checked to see whether it is a named entity. In the present embodiment, the small-scale text corpus of Heroes of the Marshes is used as input and HanLP's new word discovery function is turned on. The output is the segmentation result with part-of-speech tags, in which each word is separated from its part of speech by "/", for example "Song Jiang/nr" and "apartment for the newly-weds/ns": "nr" indicates that the word "Song Jiang" is a person name, and "ns" indicates that the word "apartment for the newly-weds" is a place name. HanLP can automatically mine named entities such as "Song Jiang", "Song Gongming", "Lu Zhishen" and "apartment for the newly-weds". In general, the automatically mined named entities contain a small number of errors, so an expert needs to filter the recognition result. In addition, different named entities may express the same meaning, so during named entity recognition the named entities that are synonymous are aligned. For example, in the present embodiment "Song Jiang" and "Song Gongming" are synonymous named entities and need to be aligned, that is, "Song Gongming" is replaced with "Song Jiang". The feature words formed by these named entities are characters or objects peculiar to Heroes of the Marshes; they are clearly referential and representative features of the work. They are added to the feature word list and serve as the guessing targets in the knowledge guessing stage.
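The post-processing of the tagged output can be sketched as follows. The sketch does not call HanLP; it only parses text that is already in the "word/tag" form described above, keeps the words tagged "nr" or "ns", and aligns aliases. The alias table, the function names and the sample sentence are assumptions made for illustration.

```python
# Tags follow the "word/tag" output format described above.
ENTITY_TAGS = {"nr": "person", "ns": "place"}

# Hypothetical alias table curated by an expert: alias -> canonical name.
# 宋公明 (Song Gongming) is aligned to 宋江 (Song Jiang), as in the embodiment.
ALIASES = {"宋公明": "宋江"}


def parse_tagged(tagged_text):
    """Split 'word/tag word/tag ...' into (word, tag) pairs."""
    pairs = []
    for token in tagged_text.split():
        word, sep, tag = token.rpartition("/")
        if sep and word:
            pairs.append((word, tag))
    return pairs


def extract_feature_words(tagged_text, aliases=ALIASES):
    """Return {canonical feature word: entity type} for person and place names."""
    features = {}
    for word, tag in parse_tagged(tagged_text):
        if tag in ENTITY_TAGS:
            canonical = aliases.get(word, word)
            features.setdefault(canonical, ENTITY_TAGS[tag])
    return features


def align_tokens(tokens, aliases=ALIASES):
    """Replace every alias in a token list with its canonical name."""
    return [aliases.get(token, token) for token in tokens]
```

Keeping the entity type next to each feature word is a design choice of this sketch; it makes the later same-type filtering of related words straightforward.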
Step 104, using a word segmentation toolkit together with the feature word list generated in step 103, add all feature words in the list to the segmentation dictionary of the word segmentation tool, segment the small-scale corpus of the specific literary work using the segmentation dictionary to obtain the segmented corpus, and add all words of the segmented corpus to the vocabulary without duplicates.
In the present embodiment, segmentation is performed with the HanLP word segmentation toolkit. All feature words in the feature word list generated in step 103 are added to the segmentation dictionary of HanLP, and HanLP's new word discovery function is turned off. The input is the original small-scale corpus and the output is the segmented text corpus. All words of the segmented corpus are then added to a data table without duplicates, which builds the Heroes of the Marshes vocabulary.
The segmentation in step 103 serves to obtain the named entities, that is, the feature words, of Heroes of the Marshes, and its result requires expert filtering. The segmentation dictionary in step 104 is the dictionary after extension and update, so its segmentation result differs from that of step 103: the feature words mined through named entity recognition improve the segmentation of the text corpus in step 104.
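The effect of adding feature words to the segmentation dictionary can be shown with a toy segmenter. The sketch below uses forward maximum matching as a stand-in; it is not HanLP's algorithm, and the base dictionary and the sample sentence are invented for the example.

```python
def segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when nothing matches."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_vocabulary(token_lists):
    """Collect every word exactly once, preserving first-seen order."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return list(vocab)


base_dictionary = {"倒拔", "垂杨柳"}   # assumed general-purpose dictionary
feature_words = {"鲁智深"}            # Lu Zhishen, taken from the feature word list

without_features = segment("鲁智深倒拔垂杨柳", base_dictionary)
with_features = segment("鲁智深倒拔垂杨柳", base_dictionary | feature_words)
```

Without the feature word, the name is broken into single characters; with it, the name survives as one token, which is what the word vector training in step 105 needs.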
Step 105, using a word vector analysis tool with suitable parameters and the segmented corpus as input, obtain the word vector model of the small-scale corpus of the literary work.
A word vector model represents words as vectors. The cosine similarity between two word vectors reflects the degree of association between the two words: the larger the value, the closer the association. By further calculating the cosine similarity between any one word in the vocabulary and every other word, the group of words with the highest degree of association to that word can be found. Of course, those skilled in the art will appreciate that, besides cosine similarity, any other measure that reflects the degree of association between different words can be used, such as Euclidean distance or Manhattan distance.
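The cosine similarity and the search for the most closely associated words can be sketched in plain Python. The function names and the tiny two-dimensional vectors in the usage note are illustrative assumptions; a real model would supply 200-dimensional vectors from the trained tool.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_related(word, vectors, n):
    """Return the n words most associated with `word` as (word, similarity) pairs,
    sorted from the highest to the lowest similarity."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

For instance, with `vectors = {"a": [1, 0], "b": [1, 0.1], "c": [0, 1], "d": [-1, 0]}`, calling `top_related("a", vectors, 2)` ranks "b" first and "c" second.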
In the present embodiment, Word2vec is selected as the word vector analysis tool. The Word2vec neural network is trained on the segmented small-scale corpus of Heroes of the Marshes to obtain a 200-dimensional word vector model, from which the word vectors of all words in the Heroes of the Marshes vocabulary can be obtained. For each feature word of Heroes of the Marshes, its degree of association with every other word in the vocabulary is then calculated and ranked, which yields the group of related words most relevant to that feature word. For example, for the feature word "Lu Zhishen", the group of words with the highest degree of association includes:
Experience shows that the related words above are closely connected with the development of Lu Zhishen's storyline in Heroes of the Marshes, and that they match the habits and patterns of thought people have when reading literary works.
Once the group of related words has been calculated for each feature word in turn, the construction of the literary works guessing knowledge base is complete.
Step 106, on entering the literary works knowledge guessing stage, the system randomly selects a feature word as the guessing target. The feature word may be a main character, a main event, a main place and so on in the literary work. The top N related words of the feature word and their degrees of association are then extracted from the literary works guessing knowledge base.
In the present embodiment, if the 108 heroes of Heroes of the Marshes are selected as the guessing targets and the feature word randomly drawn by the system is "Lu Zhishen", then the 8 related words of "Lu Zhishen" and their degrees of association are extracted from the literary works guessing knowledge base. The value of N should not exceed the number of related words of any feature word in the knowledge base.
In practice, if the guessing target is a character, related words that are also characters cannot provide good reference or hints for the target, so a method of filtering related words can be provided that filters out named entities of the same type. For example, when the guessing target is a character, the related words that are also characters are filtered out; when the guessing target is a place, the related words that are also places are filtered out. In the present embodiment, the 8 related words of "Lu Zhishen" extracted from the knowledge base include two person names, "Wu Song" and "Shi Jin". Under this filtering rule, these two related words are filtered out, and the remaining 6 related words include:
Step 107, divide the N related words retained in step 106 into M groups of N/M related words each, and assign each group a difficulty level according to its degree of association.
There are many possible rules for grouping the related words; the main basis for grouping is the degree of association between the feature word and its related words. In the present embodiment, the previous step yields 6 related words of the feature word "Lu Zhishen". Sorted by degree of association, they are divided evenly into 3 groups of 2 words each: the two related words with the highest degree of association form the level-one difficulty group, the two in the middle form the level-two difficulty group, and the two with the lowest degree of association form the level-three difficulty group.
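The same-type filtering of step 106 and the grouping of step 107 can be sketched together. The data layout (a list of (word, association) pairs plus a word-to-type mapping) and the function names are assumptions; the patent gives neither the association values nor the full list of related words, so any data passed in is the caller's.

```python
def filter_same_type(related, entity_types, target_type):
    """Drop related words whose entity type equals that of the guessing target.
    `related` is a list of (word, association) pairs; `entity_types` maps a word
    to a type such as "person" or "place" (words that are not entities are absent)."""
    return [(word, score) for word, score in related
            if entity_types.get(word) != target_type]


def group_by_difficulty(related, m):
    """Sort by association from high to low and split into m equal groups.
    Index 0 holds the level-one group (highest association)."""
    if m <= 0 or not related or len(related) % m != 0:
        raise ValueError("N must be a positive multiple of M")
    ranked = sorted(related, key=lambda pair: pair[1], reverse=True)
    size = len(ranked) // m
    return [ranked[i * size:(i + 1) * size] for i in range(m)]
```

Applied to the embodiment, 8 related words with two person names removed leave 6 words, and `group_by_difficulty(kept, 3)` returns three groups of two.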
Step 108, the system randomly draws one related word from each of the M groups and shows them to the contestant one after another, in order of degree of association from low to high.
In the present embodiment, the system randomly draws the related words "Flowery Monk", "wineshop" and "Baozhusi" from the three groups. Following the order of association from low to high, it first shows the contestant the level-three related word "Baozhusi", shows the level-two related word "wineshop" after 5 seconds, and shows the level-one related word "Flowery Monk" after 10 seconds; all related words disappear after 20 seconds.
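The random draw and the ordering of the clues can be sketched as follows. The function names are illustrative, and the reveal offsets are assumed to be measured from the start of the question; if the delays in the embodiment are meant cumulatively, different offsets can be passed in.

```python
import random


def pick_clues(groups, rng=random):
    """Draw one (word, association) pair from every difficulty group and
    order the draws from the lowest to the highest association."""
    picks = [rng.choice(group) for group in groups]
    return sorted(picks, key=lambda pair: pair[1])


def reveal_schedule(clues, offsets=(0, 5, 10)):
    """Pair each clue with the second at which it is shown (assumed offsets)."""
    if len(clues) != len(offsets):
        raise ValueError("one offset is needed per clue")
    return [(offset, word) for offset, (word, _score) in zip(offsets, clues)]
```

Passing a seeded `random.Random` instance as `rng` makes a draw reproducible, which is convenient when testing.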
Step 109, the contestant reasons out an answer from the related words shown. If the system judges the answer correct, it records the display time and moves on to the next question. If the answer is still wrong or missing when the related words disappear, the question is recorded as failed and the system moves on to the next question.
In the present embodiment, the related word "Baozhusi" is shown to the contestant. If the contestant correctly answers the feature word "Lu Zhishen" at the 3rd second, the system records an answer time of 3 seconds and moves on to the next question. If after 20 seconds the contestant has still answered wrongly or not answered, the system records the question as failed and moves on to the next question.
Furthermore, the guessing may take the form of a single contestant answering or of several contestants racing to answer first. In race mode, the time at which the first contestant answers correctly is taken as the answer time of the successful answer, and the other contestants are counted as having failed the race.
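The judging rule for both modes can be sketched as below. The `Result` record, the function names and the 20-second default limit (taken from the example above) are assumptions of this sketch, as is the choice to allow repeated attempts until the words disappear.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    correct: bool
    seconds: Optional[float]       # answer time if correct, otherwise None
    winner: Optional[str] = None   # only used in race mode


def judge_single(target, answers, time_limit=20.0):
    """`answers` is a list of (seconds, guess) attempts by one contestant."""
    for seconds, guess in sorted(answers):
        if seconds > time_limit:
            break
        if guess == target:
            return Result(True, seconds)
    return Result(False, None)


def judge_race(target, answers, time_limit=20.0):
    """`answers` is a list of (seconds, player, guess) attempts by several contestants.
    The first correct answer wins; everyone else fails the question."""
    for seconds, player, guess in sorted(answers):
        if seconds > time_limit:
            break
        if guess == target:
            return Result(True, seconds, player)
    return Result(False, None)
```

For example, `judge_single("Lu Zhishen", [(3.0, "Lu Zhishen")])` returns a correct result with an answer time of 3.0 seconds.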
Step 110, after the contestant has answered a certain number of questions, the guessing ends. The system makes an overall evaluation based on the time spent and the accuracy achieved during answering, and gives a score that reflects the contestant's familiarity with the literary work.
In the present embodiment, suppose the contestant answers 10 questions in total, each question has three groups of related words, and the longest display time of the related words is 35 seconds, so that the maximum time for answering 10 questions is 350 seconds. If the contestant answers 9 questions correctly and takes 140 seconds in total, the score is 100 × (9/10 + (350 − 140)/350) = 150 (out of a total of 200 points). The higher the score, the better the contestant's familiarity with Heroes of the Marshes. While answering, the contestant is also learning: the contestant comes to understand the associations among feature words such as the main characters and things in the work, and absorbs the knowledge related to Heroes of the Marshes more quickly and interactively. Of course, those skilled in the art will understand that other scoring rules may also be used, provided that the less time is spent and the higher the accuracy, the higher the score. Only in this way can the contestant's familiarity with the literary work be evaluated correctly and reasonably.
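The scoring rule of this example can be written as a short function. This is a minimal sketch: the function and parameter names are illustrative and not part of the patent, and other scoring rules are equally permissible as long as less time and higher accuracy give a higher score.

```python
def familiarity_score(correct, total, seconds_used, max_seconds_per_question=35.0):
    """Score = 100 * (accuracy + share of time saved), on a 0 to 200 scale.
    The 35-second default is the longest display time given in the embodiment."""
    if total <= 0:
        raise ValueError("total must be positive")
    max_time = total * max_seconds_per_question
    accuracy = correct / total
    time_saved = (max_time - seconds_used) / max_time
    return 100.0 * (accuracy + time_saved)


# Example from the embodiment: 9 of 10 correct in 140 seconds gives about 150 points.
example = familiarity_score(correct=9, total=10, seconds_used=140)
```

The two terms are weighted equally here, as in the worked example; a different weighting would change the scale but not the ordering requirement stated above.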
The specific description above further explains the purpose, technical solution and beneficial effects of the invention in detail. It should be understood that the above is only a specific embodiment of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (4)

  1. A literary works guessing method based on a word vector model, characterized in that the method comprises two stages, literary works knowledge base construction and literary works knowledge guessing, and specifically comprises the following steps:
    Step 101, collecting text corpora related to a specific literary work, including but not limited to the original work, encyclopedia entries related to the work, and related research literature, and building a small-scale corpus for the specific literary work;
    Step 102, performing natural language text preprocessing on the constructed small-scale corpus of the literary work to remove irrelevant text noise;
    Step 103, performing named entity recognition on the denoised small-scale corpus using a natural language processing tool or method, and adding the obtained named entities to a feature word list as the feature words specific to the literary work;
    Step 104, adding all feature words in the feature word list to the segmentation dictionary of a word segmentation tool, segmenting the small-scale corpus of the specific literary work using the segmentation dictionary to obtain a segmented corpus, and adding all words of the segmented corpus, without duplicates, to a vocabulary;
    Step 105, using a word vector analysis tool with the segmented corpus as input to obtain the word vector model of the small-scale corpus of the literary work, computing for each feature word the group of related words most closely related to it, and building the literary works guessing knowledge base;
    Step 106, entering the literary works knowledge guessing stage, in which the system randomly selects a feature word as the guessing target and extracts from the literary works guessing knowledge base the top N related words with the highest degree of association with that feature word;
    Step 107, dividing the N related words obtained in step 106 into M groups, each group containing no fewer than 2 related words, and assigning a difficulty level to each group according to degree of association;
    Step 108, the system randomly selecting one related word from each of the M groups and displaying the selected words to the guesser one after another, ordered by degree of association from low to high;
    Step 109, the guesser inferring an answer from the displayed related words; if the system judges the answer correct, it records the display time elapsed and proceeds to the next question; if the guesser has still answered incorrectly or has not answered by the time the related words disappear, the question is recorded as failed and the system proceeds to the next question;
    Step 110, ending the guessing after the guesser has answered a certain number of questions, the system making an overall evaluation based on the time spent and the accuracy during the guesser's answering and giving a score that reflects the guesser's familiarity with the literary work.
  2. The literary works guessing method based on a word vector model according to claim 1, characterized in that: in step 103, when named entity recognition is performed, named entities that stand in a synonymy relation are aligned.
  3. The literary works guessing method based on a word vector model according to claim 1, characterized in that: in step 105, the word vector analysis tool takes the segmented corpus as input to obtain the word vector model of the small-scale corpus of the literary work, and, when the group of related words most closely related to each feature word is computed, the cosine similarity between two word vectors is used as the degree of association between the two words.
  4. The literary works guessing method based on a word vector model according to any one of claims 1 to 3, characterized in that: in step 109, the guesser infers an answer from the displayed related words, and the guessing mode may be either a single-player answering format or a multi-player race-to-answer format.
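As an illustration of steps 105 to 108 and claim 3, the sketch below ranks related words by cosine similarity, keeps the top N, splits them into M groups of at least two words, and picks one hint per group to show from lowest to highest association. It is a minimal sketch under stated assumptions, not the patented implementation: the function names, the toy vectors in `TOY_VECTORS`, and the choice of contiguous, near-equal groups are all hypothetical. A real system would take its vectors from a trained word vector model.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (claim 3)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is unrelated to everything
    return dot / (norm_u * norm_v)


def top_related(feature_word, vectors, n):
    """Return the n (word, similarity) pairs most related to feature_word."""
    target = vectors[feature_word]
    scored = [(word, cosine_similarity(target, vec))
              for word, vec in vectors.items() if word != feature_word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def split_into_groups(related, m):
    """Split a ranked list into m contiguous groups of at least 2 words each.

    Assumption: groups are near-equal in size; groups[0] holds the most
    related words (easiest hints), groups[-1] the least related (hardest).
    """
    if m < 1 or len(related) < 2 * m:
        raise ValueError("need at least 2 related words per group")
    base, extra = divmod(len(related), m)
    groups, start = [], 0
    for i in range(m):
        size = base + (1 if i < extra else 0)
        groups.append(related[start:start + size])
        start += size
    return groups


def pick_hints(groups, rng=random):
    """Pick one word per group, ordered from lowest to highest similarity."""
    picks = [rng.choice(group) for group in groups]
    return sorted(picks, key=lambda pair: pair[1])


# Invented 2-D toy vectors, for illustration only.
TOY_VECTORS = {
    "target": [1.0, 0.0],
    "w1": [1.0, 0.1], "w2": [1.0, 0.5], "w3": [1.0, 1.0],
    "w4": [0.5, 1.0], "w5": [0.1, 1.0], "w6": [0.0, 1.0],
}

ranked = top_related("target", TOY_VECTORS, n=6)
hints = pick_hints(split_into_groups(ranked, m=3))
print([word for word, _ in hints])  # hardest hint first, easiest last
```

Showing the least related hint first means an early correct answer reflects deeper familiarity with the work, which fits a scoring rule that rewards short answer times.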

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610439566.3A CN106095758B (en) 2016-06-17 2016-06-17 A kind of literary works guess method of word-based vector model

Publications (2)

Publication Number Publication Date
CN106095758A CN106095758A (en) 2016-11-09
CN106095758B true CN106095758B (en) 2018-12-04

Family

ID=57236694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610439566.3A Expired - Fee Related CN106095758B (en) 2016-06-17 2016-06-17 A kind of literary works guess method of word-based vector model

Country Status (1)

Country Link
CN (1) CN106095758B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776562B (en) * 2016-12-20 2020-07-28 上海智臻智能网络科技股份有限公司 Keyword extraction method and extraction system
CN108694443B (en) * 2017-04-05 2021-09-17 富士通株式会社 Neural network-based language model training method and device
CN109285098A (en) * 2018-12-12 2019-01-29 广东小天才科技有限公司 A kind of study householder method and study auxiliary client, e-learning equipment
CN112953816B (en) * 2021-03-19 2022-12-30 上海掌门科技有限公司 Method, device, medium and program product for issuing guesses in friend space

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605702A (en) * 2013-11-08 2014-02-26 北京邮电大学 Word similarity based network text classification method
US8812297B2 (en) * 2010-04-09 2014-08-19 International Business Machines Corporation Method and system for interactively finding synonyms using positive and negative feedback
CN104881401A (en) * 2015-05-27 2015-09-02 大连理工大学 Patent literature clustering method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
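"An approach to sentiment analysis of short Chinese texts based on SVMs"; Lu Xing et al.; 2015 34th Chinese Control Conference (CCC); 20150914; 9115-9120 *
"Research on a Microblog-Based Knowledge Entry Recommendation Algorithm"; Tang Bin; China Master's Theses Full-text Database, Information Science and Technology; 20160315; I138-7611 *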

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20181204

Termination date: 20190617

CF01 Termination of patent right due to non-payment of annual fee