CN110532567A - Phrase extraction method, apparatus, electronic device, and storage medium - Google Patents

Phrase extraction method, apparatus, electronic device, and storage medium

Info

Publication number
CN110532567A
Authority
CN
China
Prior art keywords
phrase
word
corpus
dependence
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910831629.3A
Other languages
Chinese (zh)
Inventor
郭辰阳
钱璟
吕继根
邵英杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910831629.3A
Publication of CN110532567A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

This application discloses a phrase extraction method, apparatus, electronic device, and storage medium, and relates to the field of big data technology. A specific implementation is as follows: corpus text is segmented to obtain short sentences; candidate phrases are extracted according to the dependency relations between the words in each short sentence and their parts of speech; and if a candidate phrase meets a preset condition, it is stored in a phrase collocation corpus. Phrase collocations that conform to the part-of-speech combinations can thus be determined from the dependency relations and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text.

Description

Phrase extraction method, apparatus, electronic device, and storage medium
Technical field
This application relates to intelligent search technology in the field of big data, and in particular to a phrase extraction method, apparatus, electronic device, and storage medium.
Background
As data processing technology has developed, intelligent search has become increasingly capable. Besides searching for related content by keyword, a user can also search for phrase collocation results by entering a phrase that contains an interrogative word.
At present, a phrase collocation corpus is generally built in advance by manual annotation or by co-occurrence analysis. When the user enters a phrase containing an interrogative word, the search engine looks up all matching phrases in the phrase collocation corpus.
However, manual annotation requires a great deal of labor to annotate the corpus, and co-occurrence analysis cannot analyze collocations between words in a sentence that are farther apart than its distance window. Neither approach can therefore build a comprehensive phrase collocation corpus efficiently.
Summary of the invention
The application provides a phrase extraction method, apparatus, electronic device, and storage medium that determine phrase collocations conforming to part-of-speech combinations from the dependency relations and parts of speech of the words in a sentence. This improves the efficiency and accuracy of phrase extraction from corpus text and makes it easier to build a comprehensive phrase collocation corpus.
In a first aspect, an embodiment of the application provides a phrase extraction method, comprising:
Segmenting corpus text to obtain short sentences;
Extracting candidate phrases according to the dependency relations between the words in each short sentence and their parts of speech;
If a candidate phrase meets a preset condition, storing the candidate phrase in a phrase collocation corpus as a phrase instance.
In this embodiment, phrase collocations that conform to the part-of-speech combinations can be determined from the dependency relations and parts of speech of the words in a sentence, so that phrases can be extracted efficiently from large amounts of corpus text. This supports the construction of a comprehensive phrase collocation corpus.
In one possible design, segmenting the corpus text to obtain short sentences comprises:
Segmenting the corpus text using the punctuation marks in the corpus text as split points, to obtain short sentences.
In this embodiment, the punctuation marks in the corpus text serve as split points; that is, separators in the text such as commas, full stops, semicolons, and exclamation marks are the points at which sentences are split. This matches the way sentences are written and reduces the amount of data to process in the subsequent dependency analysis between words.
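The punctuation-based segmentation described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `split_into_short_sentences` and the exact set of split characters (including the question mark and the ASCII variants) are assumptions, since the text lists only commas, full stops, semicolons, exclamation marks, "etc.".

```python
import re

# Assumed split points: full-width and ASCII commas, full stops, semicolons,
# exclamation marks, and question marks. The source names only the first four.
SPLIT_PATTERN = re.compile(r"[，。；！？,.;!?]+")


def split_into_short_sentences(corpus_text):
    """Split corpus text at punctuation marks and drop empty pieces."""
    pieces = SPLIT_PATTERN.split(corpus_text)
    return [piece.strip() for piece in pieces if piece.strip()]
```

Treating the ASCII full stop as a split point also splits decimal numbers such as "3.5", so a production tokenizer would need a more careful rule for mixed-language text.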
In one possible design, extracting candidate phrases according to the dependency relations between the words in each short sentence and their parts of speech comprises:
Obtaining the dependency relations between the words in the short sentence and their parts of speech; the dependency relations include: subject-predicate relation, verb-object relation, and modifier relation; the parts of speech include: pronoun, noun, verb, auxiliary word, adjective, and adverb;
Pairing words that have a dependency relation according to preset part-of-speech combinations, to obtain candidate phrases; the part-of-speech combinations include: adjective modifying noun, adverb modifying adjective, and adverb modifying verb.
In this embodiment, both the dependency relations and the parts of speech of the words in a short sentence are considered when phrases are extracted, which improves the accuracy of phrase extraction.
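As a rough sketch of the pairing step, assume that a parser has already produced, for each word, its part of speech and the index of the word it depends on. The `Token` type, the spelled-out part-of-speech names, and the function `extract_candidates` are hypothetical and chosen only for illustration; the source does not prescribe a data structure.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    pos: str     # part of speech, spelled out for readability
    head: int    # index of the governing word, -1 for the sentence root
    deprel: str  # dependency label reported by the parser


# Preset part-of-speech combinations named in the text: (modifier, head).
POS_COMBINATIONS = {
    ("adjective", "noun"),
    ("adverb", "adjective"),
    ("adverb", "verb"),
}


def extract_candidates(tokens):
    """Pair each word with its governor when their parts of speech match a preset combination."""
    candidates = []
    for token in tokens:
        if token.head < 0:
            continue  # the root has no governor
        head = tokens[token.head]
        if (token.pos, head.pos) in POS_COMBINATIONS:
            candidates.append((token.text, head.text))
    return candidates


sentence = [
    Token("azure", "adjective", 1, "ATT"),
    Token("sky", "noun", 3, "SBV"),
    Token("slowly", "adverb", 3, "ADV"),
    Token("darkens", "verb", -1, "HED"),
]
candidates = extract_candidates(sentence)
```

Because the pairing follows dependency arcs rather than word adjacency, a modifier and its head are paired even when other words stand between them.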
In one possible design, obtaining the dependency relations between the words in the short sentence comprises:
Analyzing the dependency relations between the words in the short sentence through a syntactic analysis interface in natural language processing.
In this embodiment, the syntactic analysis interface in natural language processing can be called directly to analyze the dependency relations between the words in the short sentence, so that these relations are obtained conveniently and efficiently. This suits the processing of large batches of corpus text.
In one possible design, storing the candidate phrase in the phrase collocation corpus if it meets the preset condition comprises:
Judging whether the number of times the candidate phrase occurs in the corpus text is greater than a preset threshold, and if so, storing the candidate phrase in the phrase collocation corpus as a phrase instance.
In this embodiment, whether a candidate phrase meets the preset condition can be determined by judging whether the number of times it occurs in the corpus text is greater than the preset threshold. Only a candidate phrase that meets the preset condition is taken as a phrase instance. This design improves the accuracy of phrase extraction.
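The frequency check can be sketched with a simple counter. The function name `filter_by_frequency` and the representation of a candidate phrase as a tuple of words are assumptions made for illustration.

```python
from collections import Counter


def filter_by_frequency(candidates, threshold):
    """Keep candidate phrases that occur more than `threshold` times."""
    counts = Counter(candidates)
    # Strictly greater than the threshold, as the text states.
    return {phrase: n for phrase, n in counts.items() if n > threshold}


extracted = [("azure", "sky")] * 3 + [("sky", "azure")]
phrase_instances = filter_by_frequency(extracted, threshold=2)
```

A collocation produced by an occasional parsing error tends to occur rarely, so it is dropped, while a collocation that recurs across the corpus is kept.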
In one possible design, before judging whether the number of times the candidate phrase occurs in the corpus text is greater than the preset threshold, the method further comprises:
Determining by manual verification whether the phrase collocation of the candidate phrase is correct;
If the accuracy of the corresponding phrase collocations is greater than 90% when the same phrase collocation occurs N times in the corpus text, determining the preset threshold to be N.
In this embodiment, the specific value of the preset threshold is not limited and can be set according to the number of sentences in the corpus and the accuracy of the phrase collocations. The most suitable preset threshold can therefore be chosen as the decision condition for screening candidate phrases, which improves the efficiency of phrase extraction while preserving accuracy.
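One way to read the threshold rule is sketched below, under stated assumptions: a sample of candidate phrases has been checked manually, each check is recorded as a pair of occurrence count and correctness, and the threshold is the smallest count N at which the collocations occurring exactly N times are more than 90% correct. The function name `choose_threshold`, the input format, and the choice of the smallest such N are all assumptions; the source states only the 90% criterion.

```python
from collections import defaultdict


def choose_threshold(verified, min_accuracy=0.9):
    """Return the smallest occurrence count whose manually verified accuracy exceeds min_accuracy.

    `verified` is an iterable of (occurrence_count, is_correct) pairs.
    Returns None when no occurrence count reaches the required accuracy.
    """
    by_count = defaultdict(list)
    for count, is_correct in verified:
        by_count[count].append(is_correct)
    for count in sorted(by_count):
        labels = by_count[count]
        if sum(labels) / len(labels) > min_accuracy:
            return count
    return None


checked = [
    (1, True), (1, False),
    (2, True), (2, True), (2, False),
    (3, True), (3, True),
]
preset_threshold = choose_threshold(checked)
```

In this sample, collocations seen once are 50% correct, those seen twice about 67%, and those seen three times 100%, so the sketch selects 3.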
In one possible design, the method further comprises:
Receiving a retrieval phrase, entered by a user, that contains a preset interrogative word;
Looking up all phrase instances corresponding to the retrieval phrase in the phrase collocation corpus.
In this embodiment, all phrase instances corresponding to the retrieval phrase can be returned quickly to the user terminal, based on the retrieval phrase containing a preset interrogative word that the user entered.
In one possible design, after all phrase instances corresponding to the retrieval phrase are looked up in the phrase collocation corpus, the method further comprises:
Sorting the phrase instances according to how frequently each phrase instance has been retrieved;
Displaying the phrase instances according to the sorting result.
In this embodiment, the phrase instances found can be sorted so that frequently retrieved phrase instances are recommended to the user, giving the user a better reference for phrase collocation.
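The lookup and ranking steps can be sketched together. The sketch assumes that the interrogative word acts as a wildcard for one slot of a two-word phrase; the interrogative `"what"`, the function `search_phrases`, and the `retrieval_counts` mapping are hypothetical names, since the source does not specify how a retrieval phrase is matched against the corpus.

```python
QUESTION_WORD = "what"  # hypothetical preset interrogative word


def search_phrases(query, corpus, retrieval_counts):
    """Return phrase instances matching the query, most frequently retrieved first.

    `query` and the entries of `corpus` are (modifier, head) pairs; the
    interrogative word in the query matches any word in that position.
    """
    matches = [
        phrase
        for phrase in corpus
        if all(q == QUESTION_WORD or q == p for q, p in zip(query, phrase))
    ]
    # sorted() is stable, so phrases with equal counts keep their corpus order.
    return sorted(matches, key=lambda p: retrieval_counts.get(p, 0), reverse=True)


corpus = [("azure", "sky"), ("clear", "sky"), ("azure", "sea")]
retrieval_counts = {("clear", "sky"): 5, ("azure", "sky"): 2}
results = search_phrases(("what", "sky"), corpus, retrieval_counts)
```

Here the query asks which modifiers go with "sky", and the more frequently retrieved instance is listed first.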
In a second aspect, an embodiment of the application provides a phrase extraction apparatus, comprising:
A segmentation module, configured to segment corpus text to obtain short sentences;
An extraction module, configured to extract candidate phrases according to the dependency relations between the words in each short sentence and their parts of speech;
A judgment module, configured to store a candidate phrase in a phrase collocation corpus as a phrase instance when the candidate phrase meets a preset condition.
In this embodiment, phrase collocations that conform to the part-of-speech combinations can be determined from the dependency relations and parts of speech of the words in a sentence, so that phrases can be extracted efficiently from large amounts of corpus text. This supports the construction of a comprehensive phrase collocation corpus.
In one possible design, the segmentation module is specifically configured to:
Segment the corpus text using the punctuation marks in the corpus text as split points, to obtain short sentences.
In this embodiment, the punctuation marks in the corpus text serve as split points; that is, separators in the text such as commas, full stops, semicolons, and exclamation marks are the points at which sentences are split. This matches the way sentences are written and reduces the amount of data to process in the subsequent dependency analysis between words.
In one possible design, the extraction module is specifically configured to:
Obtain the dependency relations between the words in the short sentence and their parts of speech; the dependency relations include: subject-predicate relation, verb-object relation, and modifier relation; the parts of speech include: pronoun, noun, verb, auxiliary word, adjective, and adverb;
Pair words that have a dependency relation according to preset part-of-speech combinations, to obtain candidate phrases; the part-of-speech combinations include: adjective modifying noun, adverb modifying adjective, and adverb modifying verb.
In this embodiment, both the dependency relations and the parts of speech of the words in a short sentence are considered when phrases are extracted, which improves the accuracy of phrase extraction.
In one possible design, obtaining the dependency relations between the words in the short sentence comprises:
Analyzing the dependency relations between the words in the short sentence through a syntactic analysis interface in natural language processing.
In this embodiment, the syntactic analysis interface in natural language processing can be called directly to analyze the dependency relations between the words in the short sentence, so that these relations are obtained conveniently and efficiently. This suits the processing of large batches of corpus text.
In one possible design, the judgment module is specifically configured to:
Judge whether the number of times the candidate phrase occurs in the corpus text is greater than a preset threshold, and if so, store the candidate phrase in the phrase collocation corpus as a phrase instance.
In this embodiment, whether a candidate phrase meets the preset condition can be determined by judging whether the number of times it occurs in the corpus text is greater than the preset threshold. Only a candidate phrase that meets the preset condition is taken as a phrase instance. This design improves the accuracy of phrase extraction.
In one possible design, the apparatus further comprises a determining module, configured to:
Determine by manual verification whether the phrase collocation of the candidate phrase is correct;
If the accuracy of the corresponding phrase collocations is greater than 90% when the same phrase collocation occurs N times in the corpus text, determine the preset threshold to be N.
In this embodiment, the specific value of the preset threshold is not limited and can be set according to the number of sentences in the corpus and the accuracy of the phrase collocations. The most suitable preset threshold can therefore be chosen as the decision condition for screening candidate phrases, which improves the efficiency of phrase extraction while preserving accuracy.
In one possible design, the apparatus further comprises:
A receiving module, configured to receive a retrieval phrase, entered by a user, that contains a preset interrogative word;
A query module, configured to look up all phrase instances corresponding to the retrieval phrase in the phrase collocation corpus.
In this embodiment, all phrase instances corresponding to the retrieval phrase can be returned quickly to the user terminal, based on the retrieval phrase containing a preset interrogative word that the user entered.
In one possible design, the apparatus further comprises a display module, configured to:
Sort the phrase instances according to how frequently each phrase instance has been retrieved;
Display the phrase instances according to the sorting result.
In this embodiment, the phrase instances found can be sorted so that frequently retrieved phrase instances are recommended to the user, giving the user a better reference for phrase collocation.
In a third aspect, the application provides an electronic device, comprising: a processor and a memory; the memory stores instructions executable by the processor; wherein the processor is configured to perform, by executing the executable instructions, the phrase extraction method of any one of the designs of the first aspect.
In a fourth aspect, the application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the phrase extraction method of any one of the designs of the first aspect.
In a fifth aspect, an embodiment of the application provides a program product comprising a computer program stored in a readable storage medium. At least one processor of a server can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the server performs the phrase extraction method of any one of the designs of the first aspect.
In a sixth aspect, an embodiment of the application further provides a phrase extraction method, comprising:
Segmenting corpus text to obtain short sentences;
Extracting candidate phrases according to the dependency relations between the words in each short sentence and their parts of speech.
In this embodiment, phrase collocations that conform to the part-of-speech combinations can be determined from the dependency relations and parts of speech of the words in a sentence, so that phrases can be extracted efficiently from large amounts of corpus text.
An embodiment of the above application has the following advantages or beneficial effects: corpus text is segmented to obtain short sentences; candidate phrases are extracted according to the dependency relations between the words in each short sentence and their parts of speech; and if a candidate phrase meets a preset condition, it is stored in a phrase collocation corpus. These technical means overcome the technical problem of low efficiency in existing phrase extraction and achieve the technical effect of improving the efficiency and accuracy of phrase extraction from corpus text.
Other effects of the above optional designs are described below in connection with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the solution and do not limit the application. In the drawings:
Fig. 1 is a scenario diagram in which phrase extraction according to an embodiment of the application can be implemented;
Fig. 2 is a schematic diagram of a first embodiment of the application;
Fig. 3 is a schematic diagram of the principle of syntactic analysis in the application;
Fig. 4 is a schematic diagram of a second embodiment of the application;
Fig. 5 is a schematic diagram of a third embodiment of the application;
Fig. 6 is a schematic diagram of a fourth embodiment of the application;
Fig. 7 is a schematic diagram of a fifth embodiment of the application;
Fig. 8 is a block diagram of an electronic device for implementing the phrase extraction method of an embodiment of the application.
Detailed description
Exemplary embodiments of the application are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the application. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
The terms "first", "second", "third", "fourth", and the like (if present) in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can, for example, be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variations of them are intended to cover non-exclusive inclusion: a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The technical solution of the application is described in detail below with specific embodiments. The specific embodiments below may be combined with one another, and concepts or processes that are the same or similar may not be repeated in some embodiments.
As data processing technology has developed, intelligent search has become increasingly capable. Besides searching for related content by keyword, a user can also search for phrase collocation results by entering a phrase that contains an interrogative word. At present, a phrase collocation corpus is generally built in advance by manual annotation or by co-occurrence analysis, and when the user enters a phrase containing an interrogative word, the search engine looks up all matching phrases in the phrase collocation corpus. However, manual annotation requires a great deal of labor to annotate the corpus, and co-occurrence analysis cannot analyze collocations between words in a sentence that are farther apart than its distance window. Neither approach can therefore build a comprehensive phrase collocation corpus efficiently.
To address these technical problems, the application provides a phrase extraction method, apparatus, electronic device, and storage medium that determine phrase collocations conforming to part-of-speech combinations from the dependency relations and parts of speech of the words in a sentence. This improves the efficiency and accuracy of phrase extraction from corpus text and makes it easier to build a comprehensive phrase collocation corpus.
Fig. 1 is a scenario diagram in which phrase extraction according to an embodiment of the application can be implemented. As shown in Fig. 1, the phrase extraction apparatus may include a segmentation module, an extraction module, and a judgment module. Phrases that appear in different corpus texts lean toward different specialist domains. To build a phrase collocation corpus for a specialist domain, data from that domain can therefore be selected as the source of corpus text, for example technical books, periodicals, and papers in the field. Alternatively, to keep the corpus as diverse as possible while keeping its wording professional, general-purpose data can be selected as the source of corpus text, for example People's Daily data. The segmentation module first segments the corpus text into short sentences, following the natural sentence breaks of Chinese and using the punctuation marks in the corpus text as split points; that is, separators in the text such as commas, full stops, semicolons, and exclamation marks are the points at which sentences are split. This matches the way sentences are written and reduces the amount of data to process in the subsequent dependency analysis between words. The extraction module then obtains the dependency relations between the words in each short sentence and their parts of speech (a classification of words based mainly on grammatical features, with lexical meaning also taken into account), and pairs words that have a dependency relation according to preset part-of-speech combinations to obtain candidate phrases. In a specific implementation, the dependency relations between the words in a short sentence can be analyzed through a syntactic analysis interface in natural language processing, so that they are obtained conveniently and efficiently; this suits the processing of large batches of corpus text. Because the accuracy of the syntactic analysis interface is not 100%, the analysis of a single sentence may be wrong. The judgment module therefore has to confirm each candidate phrase, and stores it in the phrase collocation corpus as a phrase instance when it meets the preset condition. In a specific implementation, whether a phrase is usable can be determined by whether the number of occurrences of the same collocation exceeds a threshold: if the number is greater than the preset threshold, the candidate phrase is stored in the phrase collocation corpus as a phrase instance. Note that this embodiment does not limit the specific value of the preset threshold; those skilled in the art can set it reasonably according to the actual situation. Nor does this embodiment limit how the preset threshold is determined: it may be set manually, or an algorithm may set different thresholds for different types. Optionally, before judging whether the number of times a candidate phrase occurs in the corpus text is greater than the preset threshold, the method further includes: determining by manual verification whether the phrase collocation of the candidate phrase is correct; and, if the accuracy of the corresponding phrase collocations is greater than 90% when the same phrase collocation occurs N times in the corpus text, determining the preset threshold to be N.
With the above method, phrase collocations that conform to the part-of-speech combinations can be determined from the dependency relations and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text and makes it easier to build a comprehensive phrase collocation corpus.
Fig. 2 is a schematic diagram of a first embodiment of the application. As shown in Fig. 2, the method in this embodiment may include:
S101: segment corpus text to obtain short sentences.
In this embodiment, the corpus text can be segmented into short sentences following the natural sentence breaks of Chinese, using the punctuation marks in the corpus text as split points; that is, separators in the text such as commas, full stops, semicolons, and exclamation marks are the points at which sentences are split. This matches the way sentences are written and reduces the amount of data to process in the subsequent dependency analysis between words.
Note that this embodiment does not limit the source of the corpus text. Phrases that appear in different corpus texts lean toward different specialist domains. To build a phrase collocation corpus for a specialist domain, data from that domain can be selected as the source of corpus text, for example technical books, periodicals, and papers in the field. Alternatively, to keep the corpus as diverse as possible while keeping its wording professional, general-purpose data can be selected as the source of corpus text, for example People's Daily data.
S102: extract candidate phrases according to the dependency relations between the words in each short sentence and their parts of speech.
In this embodiment, the dependency relations between the words in a short sentence and their parts of speech can be obtained, and words that have a dependency relation are then paired according to preset part-of-speech combinations to obtain candidate phrases. Here, part of speech is the result of classifying words mainly by grammatical features, with lexical meaning also taken into account, and includes: pronoun, noun, verb, auxiliary word, adjective, adverb, numeral, preposition, etc. Dependency relations are relations of governing and being governed, of depending and being depended on, between sentence constituents, and include: subject-predicate relation, verb-object relation, modifier relation, adverbial-head relation, attributive-head relation, etc. A part-of-speech combination refers to a dependency between sentence constituents in the target phrase collocation corpus, and includes: adjective modifying noun, adverb modifying adjective, adverb modifying verb, etc. The dependency relations and parts of speech of words can be obtained by manual annotation or by an algorithm. In this embodiment, both the dependency relations and the parts of speech of the words in a short sentence are considered when phrases are extracted, which improves the accuracy of phrase extraction.
Optionally, the dependency relations between the words in a short sentence can be analyzed through a syntactic analysis interface in natural language processing, so that they are obtained conveniently and efficiently. This suits the processing of large batches of corpus text.
Specifically, the syntactic analysis interface in natural language processing is called to perform syntactic analysis on the result of the cutting described above, obtaining the part of speech and dependency relation of each component in the sentence. Fig. 3 is a schematic illustration of syntactic analysis according to the application. As shown in Fig. 3, for the short sentence "white clouds drift in the clear and bright sky." (whose root is denoted ROOT), the syntactic analysis interface can obtain the part of speech of each word: "white clouds" is a proper noun (denoted nr), the preposition is denoted p, "clear and bright" is an adjective (denoted a), the particle of the "de" structure is an auxiliary word (denoted u), "sky" is a noun (denoted n), "in" is an adverb (denoted f), "drift" is a verb (denoted v), and "." is a punctuation mark (denoted w). The syntactic analysis interface can also obtain the dependency relations between words: "white clouds" and "drift" form a subject-predicate relation (denoted SBV); "drift" forms an adverbial-head relation (denoted ADV); "in the sky" forms an attributive-head relation (denoted ATT); "clear and bright" forms a "de" structure (denoted DE); and "sky" forms a "de" structure (denoted DE). Through the syntactic analysis interface it can further be obtained that "white clouds drift" is the core of the sentence (denoted HED). Candidate phrases are then determined according to the common collocation forms of the target phrases. Common modifier-head phrases such as "azure sky" (adjective modifying a noun), "extremely outstanding" (adverb modifying an adjective), and "stammer out" (adverb modifying a verb) can be determined from the DE relation in the figure combined with the parts of speech; verb-object phrases (a verb collocated with a noun) can be determined from the dependency relation together with the parts of speech of the preceding and following words. In this embodiment, the part of speech and dependency relation of each component in the sentence can be obtained accurately from the syntactic analysis result, and phrase examples are then obtained according to the common part-of-speech collocations of the target phrases. This greatly reduces the labor cost of annotation and also solves the problem that simple co-occurrence analysis fails when the collocated words are too far apart.
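The rule-based matching described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Token` structure, the `RULES` table, the tags `d` (adverb), `VOB` (verb-object) and `POB`, and the simplified English parse are all assumptions made for illustration. In particular, the sketch attaches the adjective directly to the noun with an ATT arc, whereas Fig. 3 routes this relation through a "de" structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    pos: str      # part-of-speech tag, e.g. "a", "n", "v"
    head: int     # index of the governing token, -1 for the sentence core
    deprel: str   # label of the dependency arc to the head


# Hypothetical rule table: (arc label, dependent POS, head POS) -> phrase type.
RULES = {
    ("ATT", "a", "n"): "adjective+noun",
    ("ADV", "d", "a"): "adverb+adjective",
    ("ADV", "d", "v"): "adverb+verb",
    ("VOB", "n", "v"): "verb+object",
}


def extract_candidates(tokens):
    """Return (phrase type, first word, second word) for every arc matching a rule."""
    candidates = []
    for index, token in enumerate(tokens):
        if token.head < 0:
            continue  # the sentence core has no governing word
        head = tokens[token.head]
        kind = RULES.get((token.deprel, token.pos, head.pos))
        if kind is None:
            continue
        # Emit the two words in their surface order.
        first, second = (token, head) if index < token.head else (head, token)
        candidates.append((kind, first.word, second.word))
    return candidates


# Hand-written, simplified parse of "clouds drift in clear sky" (an assumption).
tokens = [
    Token("clouds", "n", 1, "SBV"),
    Token("drift", "v", -1, "HED"),
    Token("in", "p", 1, "ADV"),
    Token("clear", "a", 4, "ATT"),
    Token("sky", "n", 2, "POB"),
]
print(extract_candidates(tokens))
```

Because the match is made on dependency arcs rather than on adjacency, two words can be paired however far apart they stand in the sentence, which is the advantage over simple co-occurrence analysis noted above.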
S103. If the candidate phrase meets a preset condition, store the candidate phrase in the phrase collocation corpus as a phrase example.
In this embodiment, it may be judged whether the number of times the candidate phrase occurs in the corpus text is greater than a preset threshold; if so, the candidate phrase is stored in the phrase collocation corpus as a phrase example.
Specifically, the accuracy of the syntactic analysis interface is not 100%, so the analysis of a single sentence may be wrong; whether a phrase is usable therefore needs to be decided by whether the number of occurrences of the same collocation exceeds a threshold. It should be noted that this embodiment does not limit the specific value of the preset threshold; those skilled in the art may set it reasonably according to actual conditions. Nor does this embodiment limit how the preset threshold is determined: it may be set manually, or different thresholds may be set for different types by an algorithm.
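The threshold check can be sketched in a few lines of Python. The function name `select_phrase_examples` and the sample data are hypothetical; the only behavior taken from the text is that a candidate is kept when its occurrence count is strictly greater than the preset threshold.

```python
from collections import Counter


def select_phrase_examples(candidates, threshold):
    """Keep candidates whose occurrence count is strictly greater than threshold."""
    counts = Counter(candidates)
    return {phrase: n for phrase, n in counts.items() if n > threshold}


# Candidate phrases as they might come out of the extraction step (made-up data).
candidates = ["clear sky", "clear sky", "clear sky",
              "drift sky", "blue sky", "blue sky"]
examples = select_phrase_examples(candidates, threshold=1)
print(examples)
```

Here "drift sky", which occurs only once and could be a parsing error, is filtered out, while the repeated collocations are kept.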
Optionally, before judging whether the number of times the candidate phrase occurs in the corpus text is greater than the preset threshold, the method further includes: determining by manual verification whether the phrase collocation of the candidate phrase is correct; and if, when the same phrase collocation occurs N times in the corpus text, the accuracy of the corresponding phrase collocations is greater than 90%, determining that the preset threshold is N.
It should be noted that this embodiment does not limit the specific value of the preset threshold; it can be set according to the number of sentences in the corpus and the accuracy of the phrase collocations.
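One possible reading of this threshold selection is sketched below in Python. The text does not say how the manually verified samples are organized, so the input format (pairs of occurrence count and a correct/incorrect label), the function name `choose_threshold`, and the choice of the smallest qualifying N are all assumptions.

```python
from collections import defaultdict


def choose_threshold(verified_samples, min_accuracy=0.9):
    """Return the smallest occurrence count N whose manually verified
    collocations are correct more than min_accuracy of the time, or None."""
    labels_by_count = defaultdict(list)
    for occurrences, is_correct in verified_samples:
        labels_by_count[occurrences].append(is_correct)
    for n in sorted(labels_by_count):
        labels = labels_by_count[n]
        if sum(labels) / len(labels) > min_accuracy:
            return n
    return None


# (occurrence count, manual verdict) pairs; made-up data for illustration.
samples = [(1, True), (1, False),
           (2, True), (2, True), (2, False),
           (3, True)]
print(choose_threshold(samples))
```

With these samples, collocations seen once are correct 50% of the time and those seen twice about 67% of the time, so neither count qualifies; only a count of 3 exceeds the 90% bar.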
In this embodiment, cutting processing is performed on the corpus text to obtain short sentences; candidate phrases are extracted according to the dependency relations and parts of speech of the words in each short sentence; and if a candidate phrase meets the preset condition, it is stored in the phrase collocation corpus. Phrase collocations that match the part-of-speech combinations can thus be determined from the dependency relations and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text and facilitates building a comprehensive phrase collocation corpus.
Fig. 4 is a schematic diagram according to the second embodiment of the application. As shown in Fig. 4, the method in this embodiment may include:
S201. Perform cutting processing on the corpus text to obtain short sentences.
S202. Extract candidate phrases according to the dependency relations and parts of speech of the words in each short sentence.
S203. If the candidate phrase meets a preset condition, store the candidate phrase in the phrase collocation corpus as a phrase example.
For the specific implementation process and principles of steps S201 to S203 in this embodiment, refer to the related description of the method shown in Fig. 2; details are not repeated here.
S204. Receive a retrieval phrase, input by a user, that contains a preset interrogative word.
In this embodiment, a retrieval phrase input by the user that contains a preset interrogative word may also be received, to meet fill-in-the-blank word search needs. Examples of such retrieval phrases are "beautiful what", "what sky", and "what is cleared up".
S205. Search the phrase collocation corpus for all phrase examples corresponding to the retrieval phrase.
In this embodiment, the corresponding phrase examples can be looked up in the phrase collocation corpus according to the keyword of the retrieval phrase. For example, when the retrieval phrase is "beautiful what", the phrase collocation corpus is searched and phrase examples are returned such as beautiful (scenery), beautiful (place), beautiful (nature), beautiful (campus), beautiful (grassland), beautiful (bow), beautiful (landscape painting), beautiful (landscape), beautiful (spring), and beautiful (soul).
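The lookup can be sketched as a wildcard match in which the interrogative word stands for the blank to be filled. This Python example is only an illustration: the corpus layout (tuples of words), the function name `search`, and the use of "what" as the single preset interrogative word are assumptions, and a real system would likely use an index rather than a linear scan.

```python
def search(corpus, query, wildcard="what"):
    """Return all phrase examples that match the query, treating the
    interrogative word as a placeholder for any single word."""
    pattern = query.split()
    return [
        phrase for phrase in corpus
        if len(phrase) == len(pattern)
        and all(q == wildcard or q == w for q, w in zip(pattern, phrase))
    ]


# A tiny phrase collocation corpus (made-up data for illustration).
corpus = [
    ("beautiful", "scenery"),
    ("beautiful", "campus"),
    ("clear", "sky"),
    ("blue", "sky"),
]
print(search(corpus, "beautiful what"))
print(search(corpus, "what sky"))
```

The same function serves both query shapes, since the blank may precede or follow the keyword.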
Optionally, after searching the phrase collocation corpus for all phrase examples corresponding to the retrieval phrase, the method further includes: sorting the phrase examples by the frequency with which they are retrieved; and displaying the phrase examples according to the sorting result.
Specifically, the frequency with which a phrase example is retrieved and selected reflects its accuracy and how commonly it is used. The phrase examples can therefore be sorted from high to low by the frequency with which they are retrieved and selected, placing high-frequency phrase examples first so that the user can select them preferentially.
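The ordering step can be sketched with Python's built-in sort. The function name `rank_examples` and the `hit_counts` mapping (phrase example to number of times it was retrieved and selected) are hypothetical; how such counts are recorded is not described in the text.

```python
def rank_examples(examples, hit_counts):
    """Sort phrase examples from most to least frequently retrieved.
    Examples with no recorded hits count as zero; ties keep their input order."""
    return sorted(examples, key=lambda e: hit_counts.get(e, 0), reverse=True)


# Made-up retrieval statistics for illustration.
examples = ["beautiful scenery", "beautiful campus", "beautiful grassland"]
hit_counts = {"beautiful campus": 5, "beautiful scenery": 2}
print(rank_examples(examples, hit_counts))
```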
In this embodiment, cutting processing is performed on the corpus text to obtain short sentences; candidate phrases are extracted according to the dependency relations and parts of speech of the words in each short sentence; and if a candidate phrase meets the preset condition, it is stored in the phrase collocation corpus. Phrase collocations that match the part-of-speech combinations can thus be determined from the dependency relations and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text and facilitates building a comprehensive phrase collocation corpus.
In addition, this embodiment can receive a retrieval phrase, input by a user, that contains a preset interrogative word, and then search the phrase collocation corpus for all phrase examples corresponding to the retrieval phrase, to meet fill-in-the-blank word search needs.
Fig. 5 is a schematic diagram according to the third embodiment of the application. As shown in Fig. 5, the method in this embodiment may include:
S301. Perform cutting processing on the corpus text to obtain short sentences.
S302. Extract candidate phrases according to the dependency relations and parts of speech of the words in each short sentence.
For the specific implementation process and principles of steps S301 and S302 in this embodiment, refer to the related description of the method shown in Fig. 2; details are not repeated here.
In this embodiment, cutting processing is performed on the corpus text to obtain short sentences, and candidate phrases are extracted according to the dependency relations and parts of speech of the words in each short sentence. Phrase collocations that match the part-of-speech combinations can thus be determined from the dependency relations and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text.
Using the method of the embodiment shown in Fig. 5, phrases can be extracted quickly. For example, if the input corpus text is "white clouds drift in the clear and bright sky", the extracted phrase is "the clear and bright sky". When the user inputs "what sky" as the retrieval phrase, the search engine can search the phrase collocation corpus for all three-word phrases and feed a phrase such as "the clear and bright sky" back to the user terminal.
Fig. 6 is a schematic diagram according to the fourth embodiment of the application. As shown in Fig. 6, the device in this embodiment may include:
a cutting module 31, configured to perform cutting processing on the corpus text to obtain short sentences;
an extraction module 32, configured to extract candidate phrases according to the dependency relations and parts of speech of the words in each short sentence; and
a judgment module 33, configured to store the candidate phrase in the phrase collocation corpus as a phrase example when the candidate phrase meets a preset condition.
In this embodiment, phrase collocations that match the part-of-speech combinations can be determined from the dependency relations and parts of speech of the words in a sentence, so that phrase extraction can be performed efficiently on large amounts of corpus text, supporting the construction of a comprehensive phrase collocation corpus.
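How the three modules might fit together is sketched below in Python. This is an illustrative composition only: the function names, the arc format returned by the parser (dependent word, dependent POS, head word, head POS), the `ALLOWED` part-of-speech pairs, and the `toy_parse` stand-in are all assumptions. A real system would replace `toy_parse` with a call to an actual syntactic analysis interface.

```python
import re
from collections import Counter

# Hypothetical part-of-speech combinations: (dependent POS, head POS).
ALLOWED = {("a", "n"), ("d", "a"), ("d", "v")}


def cutting_module(text):
    """Module 31: split corpus text into short sentences at punctuation."""
    return [s.strip() for s in re.split(r"[,.;!?]+", text) if s.strip()]


def extraction_module(arcs):
    """Module 32: keep dependency-linked word pairs with an allowed POS pair."""
    return [f"{dep} {head}" for dep, dep_pos, head, head_pos in arcs
            if (dep_pos, head_pos) in ALLOWED]


def judgment_module(counts, threshold):
    """Module 33: keep candidates that occur more often than the threshold."""
    return {phrase for phrase, n in counts.items() if n > threshold}


def build_corpus(text, parse, threshold):
    counts = Counter()
    for sentence in cutting_module(text):
        counts.update(extraction_module(parse(sentence)))
    return judgment_module(counts, threshold)


def toy_parse(sentence):
    """Stand-in for a syntactic analysis interface (not a real parser)."""
    arcs = []
    if "clear sky" in sentence:
        arcs.append(("clear", "a", "sky", "n"))
    if "drift slowly" in sentence:
        arcs.append(("slowly", "d", "drift", "v"))
    return arcs


text = "Clouds drift slowly in the clear sky. Birds cross the clear sky!"
print(build_corpus(text, toy_parse, threshold=1))
```

Passing the parser in as a parameter keeps the module logic independent of any particular parsing service.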
In one possible design, the cutting module 31 is specifically configured to:
perform cutting processing on the corpus text, using the punctuation marks in the corpus text as cut-off points, to obtain short sentences.
In this embodiment, the punctuation marks in the corpus text can be used as cut-off points; that is, separators such as commas, periods, semicolons, and exclamation marks in the text serve as the cut-off points for splitting sentences. This matches the way sentences are written and reduces the amount of data to be processed in the subsequent dependency analysis between words.
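Punctuation-based cutting can be sketched with a regular expression. The set of separators below (ASCII and full-width commas, periods, semicolons, exclamation marks, and question marks) and the function name `split_short_sentences` are assumptions; the text lists the punctuation only by example.

```python
import re

# ASCII and full-width separators treated as cut-off points (an assumed set).
_CUT_POINTS = re.compile(r"[,.;!?，。；！？]+")


def split_short_sentences(text):
    """Split corpus text into short sentences at punctuation marks,
    dropping empty pieces and surrounding whitespace."""
    return [piece.strip() for piece in _CUT_POINTS.split(text) if piece.strip()]


text = "White clouds drift in the clear sky. The wind is light, and leaves move!"
print(split_short_sentences(text))
```

A pattern this simple also splits inside decimal numbers and abbreviations, so a production version would need extra handling for such cases.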
In one possible design, the extraction module 32 is specifically configured to:
obtain the dependency relations and parts of speech of the words in the short sentence, where the dependency relations include the subject-predicate relation, the verb-object relation, and the modification relation, and the parts of speech include pronoun, noun, verb, auxiliary word, adjective, and adverb; and
collocate the words that have a dependency relation according to preset part-of-speech combinations to obtain candidate phrases, where the part-of-speech combinations include adjective modifying noun, adverb modifying adjective, and adverb modifying verb.
In this embodiment, both the dependency relations and the parts of speech of the words in the short sentence are considered when extracting phrases, which improves the accuracy of phrase extraction.
In one possible design, obtaining the dependency relations between the words in the short sentence includes:
analyzing the dependency relations between the words in the short sentence through a syntactic analysis interface in natural language processing.
In this embodiment, the syntactic analysis interface in natural language processing can be called directly to analyze the dependency relations between the words in the short sentence. The dependency relations between words can thus be obtained conveniently and efficiently, which suits the processing of large batches of corpus text and gives high processing efficiency.
In one possible design, the judgment module 33 is specifically configured to:
judge whether the number of times the candidate phrase occurs in the corpus text is greater than a preset threshold, and if so, store the candidate phrase in the phrase collocation corpus as a phrase example.
In this embodiment, when selecting from the corpus text, whether a candidate phrase meets the preset condition can be determined by judging whether the number of times it occurs in the corpus text is greater than the preset threshold; the candidate phrase is taken as a phrase example only when the preset condition is met. This design improves the accuracy of phrase extraction.
The phrase extraction device of this embodiment can execute the technical solutions of the methods shown in Fig. 2 and Fig. 5. For the specific implementation process and technical principles, refer to the related descriptions of the methods shown in Fig. 2 and Fig. 5; details are not repeated here.
In this embodiment, cutting processing is performed on the corpus text to obtain short sentences; candidate phrases are extracted according to the dependency relations and parts of speech of the words in each short sentence; and if a candidate phrase meets the preset condition, it is stored in the phrase collocation corpus. Phrase collocations that match the part-of-speech combinations can thus be determined from the dependency relations and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text and facilitates building a comprehensive phrase collocation corpus.
Fig. 7 is a schematic diagram according to the fifth embodiment of the application. As shown in Fig. 7, on the basis of the device shown in Fig. 6, the device in this embodiment may further include:
a determining module 34, configured to:
determine by manual verification whether the phrase collocation of the candidate phrase is correct; and
if, when the same phrase collocation occurs N times in the corpus text, the accuracy of the corresponding phrase collocations is greater than 90%, determine that the preset threshold is N.
In this embodiment, the specific value of the preset threshold is not limited and can be set according to the number of sentences in the corpus and the accuracy of the phrase collocations. The most suitable preset threshold can therefore be chosen as the decision condition for screening candidate phrases, improving phrase extraction efficiency while maintaining accuracy.
In one possible design, the device further includes:
a receiving module 35, configured to receive a retrieval phrase, input by a user, that contains a preset interrogative word; and
an enquiry module 36, configured to search the phrase collocation corpus for all phrase examples corresponding to the retrieval phrase.
In this embodiment, all phrase examples corresponding to the retrieval phrase can be fed back quickly to the user terminal according to the retrieval phrase, input by the user, that contains a preset interrogative word.
In one possible design, the device further includes a display module 37, configured to:
sort the phrase examples by the frequency with which they are retrieved; and
display the phrase examples according to the sorting result.
In this embodiment, the phrase examples found can be sorted so that frequently retrieved phrase examples are recommended to the user, giving the user a better reference for phrase collocation.
The phrase extraction device of this embodiment can execute the technical solutions of the methods shown in Fig. 2, Fig. 4, and Fig. 5. For the specific implementation process and technical principles, refer to the related descriptions of the methods shown in Fig. 2, Fig. 4, and Fig. 5; details are not repeated here.
In this embodiment, cutting processing is performed on the corpus text to obtain short sentences; candidate phrases are extracted according to the dependency relations and parts of speech of the words in each short sentence; and if a candidate phrase meets the preset condition, it is stored in the phrase collocation corpus. Phrase collocations that match the part-of-speech combinations can thus be determined from the dependency relations and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text and facilitates building a comprehensive phrase collocation corpus.
In addition, this embodiment can receive a retrieval phrase, input by a user, that contains a preset interrogative word, and then search the phrase collocation corpus for all phrase examples corresponding to the retrieval phrase, to meet fill-in-the-blank word search needs.
According to embodiments of the application, the application also provides an electronic device and a readable storage medium.
Fig. 8 is a block diagram of an electronic device for implementing the phrase extraction method of the embodiments of the application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the application described and/or claimed herein.
As shown in Fig. 8, the electronic device includes one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor can process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). Fig. 8 takes one processor 501 as an example.
The memory 502 is the non-transitory computer-readable storage medium provided by the application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the phrase extraction method provided by the application. The non-transitory computer-readable storage medium of the application stores computer instructions for causing a computer to execute the phrase extraction method provided by the application.
As a non-transitory computer-readable storage medium, the memory 502 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the phrase extraction method in the embodiments of the application. By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, that is, implements the phrase extraction method of the above method embodiments.
The memory 502 may include a program storage area and a data storage area. The program storage area can store an operating system and an application program required by at least one function; the data storage area can store data created through use of the electronic device for the phrase extraction method, and the like. In addition, the memory 502 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 optionally includes memories located remotely from the processor 501, and these remote memories may be connected through a network to the electronic device for the phrase extraction method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the phrase extraction method may further include an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other ways; Fig. 8 takes connection by a bus as an example.
The input device 503 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the phrase extraction method. Examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, and joystick. The output device 504 may include a display device, auxiliary lighting devices (for example, LEDs), haptic feedback devices (for example, vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen.
Various embodiments of the systems and techniques described herein can be implemented in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These embodiments may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
These computer programs (also called programs, software, software applications, or code) include machine instructions for a programmable processor and can be implemented using high-level procedural and/or object-oriented programming languages and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memories, and programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer that has a display device for displaying information to the user (for example, a CRT (cathode-ray tube) or LCD (liquid crystal display) monitor) and a keyboard and pointing device (for example, a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user may be received in any form (including acoustic, speech, or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or web browser through which the user can interact with embodiments of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and usually interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.
It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this application may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in this application can be achieved; no limitation is imposed here.
The specific embodiments above do not limit the scope of protection of the application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the application shall fall within its scope of protection.

Claims (12)

1. A phrase extraction method, characterized by comprising:
performing cutting processing on corpus text to obtain short sentences;
extracting candidate phrases according to the dependency relations and parts of speech of the words in the short sentence; and
if the candidate phrase meets a preset condition, storing the candidate phrase in a phrase collocation corpus as a phrase example.
2. The method according to claim 1, characterized in that performing cutting processing on the corpus text to obtain short sentences comprises:
performing cutting processing on the corpus text, using the punctuation marks in the corpus text as cut-off points, to obtain short sentences.
3. The method according to claim 1, characterized in that extracting candidate phrases according to the dependency relations and parts of speech of the words in the short sentence comprises:
obtaining the dependency relations and parts of speech of the words in the short sentence, wherein the dependency relations include the subject-predicate relation, the verb-object relation, and the modification relation, and the parts of speech include pronoun, noun, verb, auxiliary word, adjective, and adverb; and
collocating the words that have a dependency relation according to preset part-of-speech combinations to obtain candidate phrases, wherein the part-of-speech combinations include adjective modifying noun, adverb modifying adjective, and adverb modifying verb.
4. The method according to claim 3, characterized in that obtaining the dependency relations between the words in the short sentence comprises:
analyzing the dependency relations between the words in the short sentence through a syntactic analysis interface in natural language processing.
5. The method according to claim 1, characterized in that storing the candidate phrase in the phrase collocation corpus if the candidate phrase meets a preset condition comprises:
judging whether the number of times the candidate phrase occurs in the corpus text is greater than a preset threshold, and if so, storing the candidate phrase in the phrase collocation corpus as a phrase example.
6. The method according to claim 5, characterized in that, before judging whether the number of times the candidate phrase occurs in the corpus text is greater than the preset threshold, the method further comprises:
determining by manual verification whether the phrase collocation of the candidate phrase is correct; and
if, when the same phrase collocation occurs N times in the corpus text, the accuracy of the corresponding phrase collocations is greater than 90%, determining that the preset threshold is N.
7. The method according to any one of claims 1-6, characterized by further comprising:
receiving a retrieval phrase, input by a user, that contains a preset interrogative word; and
searching the phrase collocation corpus for all phrase examples corresponding to the retrieval phrase.
8. The method according to claim 7, characterized in that, after searching the phrase collocation corpus for all phrase examples corresponding to the retrieval phrase, the method further comprises:
sorting the phrase examples by the frequency with which they are retrieved; and
displaying the phrase examples according to the sorting result.
9. A phrase extraction device, characterized by comprising:
a cutting module, configured to perform cutting processing on corpus text to obtain short sentences;
an extraction module, configured to extract candidate phrases according to the dependency relations and parts of speech of the words in the short sentence; and
a judgment module, configured to store the candidate phrase in a phrase collocation corpus as a phrase example when the candidate phrase meets a preset condition.
10. An electronic device, characterized by comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method of any one of claims 1-8.
11. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause a computer to perform the method of any one of claims 1-8.
12. A phrase extraction method, characterized by comprising:
performing cutting processing on corpus text to obtain short sentences; and
extracting candidate phrases according to the dependency relations and parts of speech of the words in the short sentence.
CN201910831629.3A 2019-09-04 2019-09-04 Extracting method, device, electronic equipment and the storage medium of phrase Pending CN110532567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910831629.3A CN110532567A (en) 2019-09-04 2019-09-04 Extracting method, device, electronic equipment and the storage medium of phrase

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910831629.3A CN110532567A (en) 2019-09-04 2019-09-04 Extracting method, device, electronic equipment and the storage medium of phrase

Publications (1)

Publication Number Publication Date
CN110532567A true CN110532567A (en) 2019-12-03

Family

ID=68666722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910831629.3A Pending CN110532567A (en) 2019-09-04 2019-09-04 Extracting method, device, electronic equipment and the storage medium of phrase

Country Status (1)

Country Link
CN (1) CN110532567A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050156A (en) * 2013-03-15 2014-09-17 富士通株式会社 Device, method and electronic equipment for extracting maximum noun phrase
CN107463548A (en) * 2016-06-02 2017-12-12 阿里巴巴集团控股有限公司 Short phrase picking method and device
CN107463554A (en) * 2016-06-02 2017-12-12 阿里巴巴集团控股有限公司 Short phrase picking method and device
CN106777275A (en) * 2016-12-29 2017-05-31 北京理工大学 Entity attribute and property value extracting method based on many granularity semantic chunks
CN108846037A (en) * 2018-05-29 2018-11-20 天津字节跳动科技有限公司 The method and apparatus of prompting search word
CN109522418A (en) * 2018-11-08 2019-03-26 杭州费尔斯通科技有限公司 A kind of automanual knowledge mapping construction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ling Yong (凌勇): "Modern English Language Research and Teaching" (《现代英语语言研究与教学》), Beijing: China Commerce and Trade Press (中国商务出版社), p. 154 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126052A (en) * 2019-12-26 2020-05-08 中科鼎富(北京)科技发展有限公司 Function point generation method and device, electronic equipment and computer readable storage medium
CN111126052B (en) * 2019-12-26 2023-11-03 鼎富智能科技有限公司 Function point generation method, device, electronic equipment and computer readable storage medium
CN111680492A (en) * 2020-06-10 2020-09-18 创新奇智(青岛)科技有限公司 New word mining method and device and electronic equipment
CN111783450A (en) * 2020-06-29 2020-10-16 中国平安人寿保险股份有限公司 Phrase extraction method and device in corpus text, storage medium and electronic equipment
CN112016298A (en) * 2020-08-28 2020-12-01 中移(杭州)信息技术有限公司 Method for extracting product characteristic information, electronic device and storage medium
CN112183089A (en) * 2020-09-25 2021-01-05 中国建设银行股份有限公司 Corpus analysis method and device, electronic equipment and storage medium
CN113177410A (en) * 2021-05-07 2021-07-27 多点(深圳)数字科技有限公司 Text word segmentation method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110532567A (en) Extracting method, device, electronic equipment and the storage medium of phrase
Pasha et al. MADAMIRA: A fast, comprehensive tool for morphological analysis and disambiguation of Arabic.
CN110427627A (en) Task processing method and device based on semantic expressiveness model
CN113220836A (en) Training method and device of sequence labeling model, electronic equipment and storage medium
CN114595686B (en) Knowledge extraction method, and training method and device of knowledge extraction model
US20160078865A1 (en) Information Processing Method And Electronic Device
US20210406467A1 (en) Method and apparatus for generating triple sample, electronic device and computer storage medium
CN111310440A (en) Text error correction method, device and system
CN112784589B (en) Training sample generation method and device and electronic equipment
JP2021170394A (en) Labeling method for role, labeling device for role, electronic apparatus and storage medium
CN112269862B (en) Text role labeling method, device, electronic equipment and storage medium
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN110275938B (en) Knowledge extraction method and system based on unstructured document
CN111858880A (en) Method and device for obtaining query result, electronic equipment and readable storage medium
CN112989789B (en) Test method and device of text auditing model, computer equipment and storage medium
CN111128130B (en) Voice data processing method and device and electronic device
CN112466277A (en) Rhythm model training method and device, electronic equipment and storage medium
CN101937459A (en) Tibetan character sequencing device and method based on universal syllable structure
WO2023016163A1 (en) Method for training text recognition model, method for recognizing text, and apparatus
CN110348013A (en) Writing assistance method, equipment and readable storage medium based on artificial intelligence
US20210382918A1 (en) Method and apparatus for labeling data
CN115510247A (en) Method, device, equipment and storage medium for constructing electric carbon policy knowledge graph
CN114443802A (en) Interface document processing method and device, electronic equipment and storage medium
CN112541346A (en) Abstract generation method and device, electronic equipment and readable storage medium
CN110516030A (en) Method, apparatus, equipment and computer-readable storage medium for determining intent words

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination