CN110532567A - Phrase extraction method, device, electronic equipment and storage medium - Google Patents
- Publication number
- CN110532567A, CN201910831629.3A
- Authority
- CN
- China
- Prior art keywords
- phrase
- word
- corpus
- dependence
- speech
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
This application discloses a phrase extraction method, device, electronic equipment and storage medium, relating to the field of big data technology. The specific implementation is: segment corpus text to obtain short sentences; extract candidate phrases according to the dependency relations and parts of speech of the words in each short sentence; if a candidate phrase meets a preset condition, store the candidate phrase in a phrase collocation corpus. Phrase collocations that conform to the part-of-speech combinations can thus be determined from the dependency relations and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text.
Description
Technical field
This application relates to intelligent search technology in the field of big data, and in particular to a phrase extraction method, device, electronic equipment and storage medium.
Background
With the development of data processing technology, intelligent search has become increasingly powerful. Besides searching for related content by keyword, a user can also search for phrase collocation results by entering a phrase that contains an interrogative word.
At present, a phrase collocation corpus is generally built in advance by manual annotation or by co-occurrence analysis. When a user enters a phrase containing an interrogative word, the search engine retrieves all matching phrases from the phrase collocation corpus.
However, manual annotation requires a great deal of labor to annotate the corpus, and co-occurrence analysis cannot analyze collocations between words in a sentence that lie farther apart than its spacing distance. Neither approach can therefore efficiently build a comprehensive phrase collocation corpus.
Summary of the invention
This application provides a phrase extraction method, device, electronic equipment and storage medium that can determine phrase collocations conforming to part-of-speech combinations from the dependency relations and parts of speech of the words in a sentence. This improves the efficiency and accuracy of phrase extraction from corpus text and makes it easier to build a comprehensive phrase collocation corpus.
In a first aspect, an embodiment of this application provides a phrase extraction method, comprising:
segmenting corpus text to obtain short sentences;
extracting candidate phrases according to the dependency relations and parts of speech of the words in the short sentences;
if a candidate phrase meets a preset condition, storing the candidate phrase in a phrase collocation corpus as a phrase instance.
In this embodiment, phrase collocations that conform to the part-of-speech combinations can be determined from the dependency relations and parts of speech of the words in a sentence, so that phrase extraction can be performed efficiently on large volumes of corpus text, which supports building a comprehensive phrase collocation corpus.
In one possible design, segmenting the corpus text to obtain short sentences comprises:
segmenting the corpus text at its punctuation marks, which serve as cut points, to obtain short sentences.
In this embodiment, the punctuation marks in the corpus text can be used as cut points; that is, separators such as commas, full stops, semicolons and exclamation marks in the text serve as the points at which sentences are cut. This approach matches how sentences are naturally written and reduces the amount of data processed in the subsequent dependency analysis between words.
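The punctuation-based segmentation described above could be sketched as follows. This is a minimal illustration only; the function name `split_into_short_sentences` and the exact delimiter set are assumptions, not part of the application. The ASCII full stop is deliberately left out so that decimal numbers are not split.

```python
import re
from typing import List

# Assumed delimiter set: Chinese and ASCII commas, semicolons, exclamation
# marks and question marks, plus the Chinese full stop.
_DELIMITERS = re.compile(r"[,，。;；!！?？]")


def split_into_short_sentences(text: str) -> List[str]:
    """Cut corpus text at punctuation marks and drop empty pieces."""
    pieces = _DELIMITERS.split(text)
    return [piece.strip() for piece in pieces if piece.strip()]
```

Each returned short sentence can then be passed to the dependency analysis on its own, which keeps the parser input small.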
In one possible design, extracting candidate phrases according to the dependency relations and parts of speech of the words in the short sentences comprises:
obtaining the dependency relations and parts of speech of the words in the short sentence, where the dependency relations include subject-predicate, verb-object and modifier relations, and the parts of speech include pronoun, noun, verb, auxiliary word, adjective and adverb;
pairing words that have a dependency relation according to preset part-of-speech combinations to obtain candidate phrases, where the part-of-speech combinations include adjective modifying noun, adverb modifying adjective, and adverb modifying verb.
In this embodiment, both the dependency relations and the parts of speech of the words in the short sentence are considered when extracting phrases, which improves the accuracy of phrase extraction.
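One way to picture the pairing step is the sketch below. It assumes the parser output has already been converted into a list of tokens, each holding its word, part-of-speech tag, and the index of the word it depends on. The `Token` class, the tag letters (`a`, `n`, `d`, `v`) and the function name are hypothetical choices for illustration, not an interface defined by the application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    word: str
    pos: str   # part-of-speech tag (assumed: a=adjective, n=noun, d=adverb, v=verb)
    head: int  # index of the governing word, -1 for the sentence root
    rel: str   # dependency label; kept for reference, not used in the filter


# Preset part-of-speech combinations as (modifier tag, head tag) pairs:
# adjective modifying noun, adverb modifying adjective, adverb modifying verb.
ALLOWED_COMBINATIONS = {("a", "n"), ("d", "a"), ("d", "v")}


def extract_candidates(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Pair each word with its governor when their tags form a preset combination."""
    candidates = []
    for token in tokens:
        if token.head < 0:
            continue  # the root has no governor to pair with
        governor = tokens[token.head]
        if (token.pos, governor.pos) in ALLOWED_COMBINATIONS:
            candidates.append((token.word, governor.word))
    return candidates
```

Because the pairing follows dependency arcs rather than word positions, a modifier and its head are paired no matter how many words separate them in the short sentence.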
In one possible design, obtaining the dependency relations between the words in the short sentence comprises:
analyzing the dependency relations between the words in the short sentence through a syntactic analysis interface for natural language processing.
In this embodiment, a syntactic analysis interface for natural language processing can be called directly to analyze the dependency relations between the words in the short sentence. The dependency relations are thus obtained conveniently and efficiently, which suits large-batch corpus text processing.
In one possible design, storing the candidate phrase in the phrase collocation corpus if it meets the preset condition comprises:
judging whether the number of times the candidate phrase occurs in the corpus text is greater than a preset threshold and, if so, storing the candidate phrase in the phrase collocation corpus as a phrase instance.
In this embodiment, for the chosen corpus text, whether a candidate phrase meets the preset condition can be determined by judging whether the number of times it occurs in the corpus text is greater than the preset threshold. Only when the preset condition is met is the candidate phrase taken as a phrase instance. This design improves the accuracy of phrase extraction.
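The occurrence-count check can be sketched with a simple counter. The function name and the representation of candidate phrases as strings are assumptions made for this illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def select_phrase_instances(candidates: Iterable[str], threshold: int) -> Dict[str, int]:
    """Keep candidate phrases that occur more than `threshold` times."""
    counts = Counter(candidates)
    # Strictly greater than the threshold, as in the preset condition above.
    return {phrase: n for phrase, n in counts.items() if n > threshold}
```

For example, with a threshold of 1 a candidate seen only once is discarded, which filters out one-off pairs that may stem from a parsing error.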
In one possible design, before judging whether the number of times the candidate phrase occurs in the corpus text is greater than the preset threshold, the method further comprises:
determining by manual verification whether the phrase collocations of the candidate phrases are correct;
if, when the same phrase collocation occurs N times in the corpus text, the accuracy of the corresponding phrase collocations is greater than 90%, determining that the preset threshold is N.
In this embodiment, the specific value of the preset threshold is not limited; it can be set according to the number of sentences in the corpus and the accuracy of the phrase collocations. The most suitable preset threshold can therefore be chosen as the decision condition for screening candidate phrases, improving extraction efficiency while maintaining accuracy.
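The text does not spell out how N is searched for, so the following is only one possible reading: take the smallest occurrence count N such that the manually verified candidates occurring at least N times are more than 90% correct. The function name, the dictionaries and the "at least N" interpretation are all assumptions of this sketch.

```python
from typing import Dict, Optional


def choose_threshold(
    counts: Dict[str, int],
    is_correct: Dict[str, bool],
    min_accuracy: float = 0.9,
) -> Optional[int]:
    """Return the smallest N whose verified candidates exceed `min_accuracy`.

    `counts` maps each candidate phrase to its occurrence count and
    `is_correct` holds the manual verification result for checked phrases.
    """
    for n in sorted(set(counts.values())):
        checked = [p for p, c in counts.items() if c >= n and p in is_correct]
        if not checked:
            continue
        accuracy = sum(is_correct[p] for p in checked) / len(checked)
        if accuracy > min_accuracy:
            return n
    return None  # no occurrence count reaches the required accuracy
```

A larger corpus generally allows a higher N, which is consistent with setting the threshold according to the number of sentences and the collocation accuracy.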
In one possible design, the method further comprises:
receiving a retrieval phrase, entered by a user, that contains a preset interrogative word;
looking up all phrase instances corresponding to the retrieval phrase in the phrase collocation corpus.
In this embodiment, all phrase instances corresponding to the retrieval phrase can be quickly fed back to the user terminal according to the retrieval phrase containing a preset interrogative word entered by the user.
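As a rough sketch of the lookup, the interrogative word can be treated as a wildcard slot in the retrieval phrase. The storage format (a list of modifier-head pairs), the interrogative set and the function name are hypothetical; the application does not specify them.

```python
from typing import List, Tuple

# Hypothetical preset interrogative words that act as wildcards in a query.
INTERROGATIVES = {"what", "how"}


def lookup(query: Tuple[str, str], corpus: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every stored (modifier, head) pair that matches the query.

    A query slot holding an interrogative word matches any word.
    """
    q_modifier, q_head = query
    results = []
    for modifier, head in corpus:
        modifier_ok = q_modifier in INTERROGATIVES or q_modifier == modifier
        head_ok = q_head in INTERROGATIVES or q_head == head
        if modifier_ok and head_ok:
            results.append((modifier, head))
    return results
```

Under these assumptions, a query such as `("what", "sky")` returns every stored phrase instance whose head word is "sky".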
In one possible design, after looking up all phrase instances corresponding to the retrieval phrase in the phrase collocation corpus, the method further comprises:
sorting the phrase instances according to how frequently they are retrieved;
displaying the phrase instances according to the sorting result.
In this embodiment, the phrase instances found can be sorted so that frequently retrieved phrase instances are recommended to the user, giving the user a better reference for phrase collocation.
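The ranking step amounts to a sort on a retrieval counter. In this sketch the counter is a plain dictionary and the function name is an assumption; instances that have never been retrieved are given a count of zero.

```python
from typing import Dict, List


def rank_by_retrieval_frequency(
    instances: List[str], retrieval_counts: Dict[str, int]
) -> List[str]:
    """Order phrase instances from most to least frequently retrieved."""
    # sorted() is stable, so instances with equal counts keep their input order.
    return sorted(instances, key=lambda p: retrieval_counts.get(p, 0), reverse=True)
```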
In a second aspect, an embodiment of this application provides a phrase extraction device, comprising:
a segmentation module, configured to segment corpus text to obtain short sentences;
an extraction module, configured to extract candidate phrases according to the dependency relations and parts of speech of the words in the short sentences;
a judgment module, configured to store a candidate phrase in a phrase collocation corpus as a phrase instance when the candidate phrase meets a preset condition.
In this embodiment, phrase collocations that conform to the part-of-speech combinations can be determined from the dependency relations and parts of speech of the words in a sentence, so that phrase extraction can be performed efficiently on large volumes of corpus text, which supports building a comprehensive phrase collocation corpus.
In one possible design, the segmentation module is specifically configured to:
segment the corpus text at its punctuation marks, which serve as cut points, to obtain short sentences.
In this embodiment, the punctuation marks in the corpus text can be used as cut points; that is, separators such as commas, full stops, semicolons and exclamation marks in the text serve as the points at which sentences are cut. This approach matches how sentences are naturally written and reduces the amount of data processed in the subsequent dependency analysis between words.
In one possible design, the extraction module is specifically configured to:
obtain the dependency relations and parts of speech of the words in the short sentence, where the dependency relations include subject-predicate, verb-object and modifier relations, and the parts of speech include pronoun, noun, verb, auxiliary word, adjective and adverb;
pair words that have a dependency relation according to preset part-of-speech combinations to obtain candidate phrases, where the part-of-speech combinations include adjective modifying noun, adverb modifying adjective, and adverb modifying verb.
In this embodiment, both the dependency relations and the parts of speech of the words in the short sentence are considered when extracting phrases, which improves the accuracy of phrase extraction.
In one possible design, obtaining the dependency relations between the words in the short sentence comprises:
analyzing the dependency relations between the words in the short sentence through a syntactic analysis interface for natural language processing.
In this embodiment, a syntactic analysis interface for natural language processing can be called directly to analyze the dependency relations between the words in the short sentence. The dependency relations are thus obtained conveniently and efficiently, which suits large-batch corpus text processing.
In one possible design, the judgment module is specifically configured to:
judge whether the number of times the candidate phrase occurs in the corpus text is greater than a preset threshold and, if so, store the candidate phrase in the phrase collocation corpus as a phrase instance.
In this embodiment, for the chosen corpus text, whether a candidate phrase meets the preset condition can be determined by judging whether the number of times it occurs in the corpus text is greater than the preset threshold. Only when the preset condition is met is the candidate phrase taken as a phrase instance. This design improves the accuracy of phrase extraction.
In one possible design, the device further comprises a determining module, configured to:
determine by manual verification whether the phrase collocations of the candidate phrases are correct;
if, when the same phrase collocation occurs N times in the corpus text, the accuracy of the corresponding phrase collocations is greater than 90%, determine that the preset threshold is N.
In this embodiment, the specific value of the preset threshold is not limited; it can be set according to the number of sentences in the corpus and the accuracy of the phrase collocations. The most suitable preset threshold can therefore be chosen as the decision condition for screening candidate phrases, improving extraction efficiency while maintaining accuracy.
In one possible design, the device further comprises:
a receiving module, configured to receive a retrieval phrase, entered by a user, that contains a preset interrogative word;
a query module, configured to look up all phrase instances corresponding to the retrieval phrase in the phrase collocation corpus.
In this embodiment, all phrase instances corresponding to the retrieval phrase can be quickly fed back to the user terminal according to the retrieval phrase containing a preset interrogative word entered by the user.
In one possible design, the device further comprises a display module, configured to:
sort the phrase instances according to how frequently they are retrieved;
display the phrase instances according to the sorting result.
In this embodiment, the phrase instances found can be sorted so that frequently retrieved phrase instances are recommended to the user, giving the user a better reference for phrase collocation.
In a third aspect, this application provides electronic equipment comprising a processor and a memory, the memory storing instructions executable by the processor, wherein the processor is configured to perform, by executing the executable instructions, the phrase extraction method of any implementation of the first aspect.
In a fourth aspect, this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the phrase extraction method of any implementation of the first aspect.
In a fifth aspect, an embodiment of this application provides a program product comprising a computer program stored in a readable storage medium. At least one processor of a server can read the computer program from the readable storage medium, and execution of the computer program by the at least one processor causes the server to perform the phrase extraction method of any implementation of the first aspect.
In a sixth aspect, an embodiment of this application further provides a phrase extraction method, comprising:
segmenting corpus text to obtain short sentences;
extracting candidate phrases according to the dependency relations and parts of speech of the words in the short sentences.
In this embodiment, phrase collocations that conform to the part-of-speech combinations can be determined from the dependency relations and parts of speech of the words in a sentence, so that phrase extraction can be performed efficiently on large volumes of corpus text.
One embodiment of the above application has the following advantages or beneficial effects: by segmenting corpus text to obtain short sentences, extracting candidate phrases according to the dependency relations and parts of speech of the words in the short sentences, and storing a candidate phrase in the phrase collocation corpus if it meets a preset condition, the technical problem of low efficiency in existing phrase extraction is overcome, achieving the technical effect of improving the efficiency and accuracy of phrase extraction from corpus text.
Other effects of the above optional implementations are described below in connection with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the solution and do not limit this application. In the drawings:
Fig. 1 is a scenario diagram in which the phrase extraction of the embodiments of this application can be implemented;
Fig. 2 is a schematic diagram according to a first embodiment of this application;
Fig. 3 is a schematic diagram of the principle of syntactic analysis according to this application;
Fig. 4 is a schematic diagram according to a second embodiment of this application;
Fig. 5 is a schematic diagram according to a third embodiment of this application;
Fig. 6 is a schematic diagram according to a fourth embodiment of this application;
Fig. 7 is a schematic diagram according to a fifth embodiment of this application;
Fig. 8 is a block diagram of electronic equipment for implementing the phrase extraction method of the embodiments of this application.
Detailed description
Exemplary embodiments of this application are described below with reference to the drawings, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of this application. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
The terms "first", "second", "third", "fourth" and the like (if present) in the description, claims and drawings of this application are used to distinguish similar objects and not necessarily to describe a particular order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments described here can, for example, be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
The technical solution of this application is described in detail below through specific embodiments. The following specific embodiments can be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments.
With the development of data processing technology, intelligent search has become increasingly powerful. Besides searching for related content by keyword, a user can also search for phrase collocation results by entering a phrase that contains an interrogative word. At present, a phrase collocation corpus is generally built in advance by manual annotation or by co-occurrence analysis; when a user enters a phrase containing an interrogative word, the search engine retrieves all matching phrases from the phrase collocation corpus. However, manual annotation requires a great deal of labor to annotate the corpus, and co-occurrence analysis cannot analyze collocations between words in a sentence that lie farther apart than its spacing distance. Neither approach can therefore efficiently build a comprehensive phrase collocation corpus.
In view of the above technical problems, this application provides a phrase extraction method, device, electronic equipment and storage medium that can determine phrase collocations conforming to part-of-speech combinations from the dependency relations and parts of speech of the words in a sentence. This improves the efficiency and accuracy of phrase extraction from corpus text and makes it easier to build a comprehensive phrase collocation corpus.
Fig. 1 is a scenario diagram in which the phrase extraction of the embodiments of this application can be implemented. As shown in Fig. 1, the phrase extraction device may include a segmentation module, an extraction module and a judgment module. Phrases in different corpus texts lean toward different specialties. To build a phrase collocation corpus for a professional field, data from that field can therefore be chosen as the source of corpus text, for example technical books, periodicals and papers in the field. Alternatively, to keep the corpus as diverse as possible and its wording professional, general-purpose data such as People's Daily data can be chosen as the source of corpus text.
For the corpus text, the segmentation module first follows the natural sentence breaks of Chinese and uses the punctuation marks in the corpus text as cut points; that is, separators such as commas, full stops, semicolons and exclamation marks in the text serve as the points at which sentences are cut, and segmenting the corpus text there yields short sentences. This approach matches how sentences are naturally written and reduces the amount of data processed in the subsequent dependency analysis between words.
The extraction module then obtains the dependency relations and parts of speech of the words in each short sentence, where parts of speech are assigned mainly on the basis of grammatical features while also taking lexical meaning into account, and pairs words that have a dependency relation according to preset part-of-speech combinations to obtain candidate phrases. In a specific implementation, the dependency relations between the words in the short sentence can be analyzed through a syntactic analysis interface for natural language processing, so that the dependency relations are obtained conveniently and efficiently, which suits large-batch corpus text processing.
Because the accuracy of the syntactic analysis interface is not 100%, the analysis of an individual sentence may contain errors. The judgment module therefore confirms the candidate phrases and, when a candidate phrase meets the preset condition, stores it in the phrase collocation corpus as a phrase instance. In a specific implementation, whether a phrase is usable can be judged by whether the number of occurrences of the same collocation exceeds a threshold: if it is greater than the preset threshold, the candidate phrase is stored in the phrase collocation corpus as a phrase instance. It should be noted that this embodiment does not limit the specific value of the preset threshold; those skilled in the art can set it reasonably according to the actual situation. Nor does this embodiment limit how the preset threshold is determined: it can be set manually, or different thresholds can be set for different types by an algorithm. Optionally, before judging whether the number of times the candidate phrase occurs in the corpus text is greater than the preset threshold, the method further includes: determining by manual verification whether the phrase collocations of the candidate phrases are correct; if, when the same phrase collocation occurs N times in the corpus text, the accuracy of the corresponding phrase collocations is greater than 90%, determining that the preset threshold is N.
With the above method, phrase collocations that conform to the part-of-speech combinations can be determined from the dependency relations and parts of speech of the words in a sentence. This improves the efficiency and accuracy of phrase extraction from corpus text and makes it easier to build a comprehensive phrase collocation corpus.
Fig. 2 is a schematic diagram according to the first embodiment of this application. As shown in Fig. 2, the method in this embodiment may include:
S101: segment corpus text to obtain short sentences.
In this embodiment, the corpus text can be segmented according to the natural sentence breaks of Chinese, using the punctuation marks in the corpus text as cut points; that is, separators such as commas, full stops, semicolons and exclamation marks in the text serve as the points at which sentences are cut, and segmenting the corpus text there yields short sentences. This approach matches how sentences are naturally written and reduces the amount of data processed in the subsequent dependency analysis between words.
It should be noted that this embodiment does not limit the source of the corpus text. Phrases in different corpus texts lean toward different specialties. To build a phrase collocation corpus for a professional field, data from that field can be chosen as the source of corpus text, for example technical books, periodicals and papers in the field. Alternatively, to keep the corpus as diverse as possible and its wording professional, general-purpose data such as People's Daily data can be chosen as the source of corpus text.
S102: extract candidate phrases according to the dependency relations and parts of speech of the words in the short sentences.
In this embodiment, the dependency relations and parts of speech of the words in a short sentence can be obtained, and words that have a dependency relation are then paired according to preset part-of-speech combinations to obtain candidate phrases. Here, part of speech is the result of classifying words mainly on the basis of grammatical features while also taking lexical meaning into account, and includes pronoun, noun, verb, auxiliary word, adjective, adverb, numeral, preposition and so on. A dependency relation is the relation between sentence constituents of governing and being governed, depending and being depended on, and includes subject-predicate, verb-object, modifier, adverbial-head and attribute-head relations and so on. A part-of-speech combination refers to the dependency relation between sentence constituents in the target phrase collocation corpus, and includes adjective modifying noun, adverb modifying adjective, adverb modifying verb and so on. The dependency relations and parts of speech of the words can be obtained by manual annotation or by an algorithm. In this embodiment, both the dependency relations and the parts of speech of the words in the short sentence are considered when extracting phrases, which improves the accuracy of phrase extraction.
Optionally, the dependency relations between the words in the short sentence can be analyzed through a syntactic analysis interface for natural language processing, so that the dependency relations are obtained conveniently and efficiently, which suits large-batch corpus text processing.
Specifically, the syntactic analysis interface for natural language processing is called to perform syntactic analysis on the segmentation result of the previous step, obtaining the part of speech and dependency relation of each constituent in the sentence. Fig. 3 is a schematic diagram of the principle of syntactic analysis according to this application. As shown in Fig. 3, for the short sentence "white clouds drift in the clear and bright sky." (whose root is denoted ROOT), the syntactic analysis interface yields the part of speech of each word: "white clouds" is a proper noun (denoted nr), the preposition is denoted p, "clear and bright" is an adjective (denoted a), the auxiliary word is denoted u, "sky" is a noun (denoted n), "in" is an adverb (denoted f), "drift" is a verb (denoted v), and "." is a punctuation mark (denoted w). The syntactic analysis interface also yields the dependency relations between words: "white clouds drift" forms a subject-predicate relation (denoted SBV), the prepositional element and "drift" form an adverbial-head relation (denoted ADV), "in the sky" forms an attribute-head relation (denoted ATT), and "clear and bright" and "sky" each form part of the auxiliary-word structure (denoted DE). Through the syntactic analysis interface, "white clouds drift" is identified as the core of the sentence (denoted HED).
Then, according to the common collocation forms of the target phrases, for example common modifier-head phrases such as "azure sky" (adjective modifying noun), "extremely outstanding" (adverb modifying adjective) and "stammering out" (adverb modifying verb), candidates can be determined from the DE relations in the figure combined with the parts of speech; verb-object phrases (verb collocating with noun) can be determined from the dependency relation and the parts of speech of the words before and after. In this embodiment, the part of speech and dependency relation of each constituent in a sentence can be obtained accurately from the syntactic analysis result, and phrase instances are then obtained according to the common part-of-speech collocations of the target phrases. This greatly saves annotation labor and also solves the problem in simple co-occurrence analysis that a collocation is missed when its words are too far apart.
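The distance limitation of co-occurrence analysis mentioned above can be illustrated with a small sketch. The English word list, the window sizes and the single dependency arc are made-up illustration data, and `window_pairs` is a hypothetical helper, not a method defined by the application.

```python
from typing import List, Set, Tuple


def window_pairs(words: List[str], window: int) -> Set[Tuple[str, str]]:
    """Return ordered word pairs whose positions differ by at most `window`."""
    pairs = set()
    for i, left in enumerate(words):
        for right in words[i + 1 : i + 1 + window]:
            pairs.add((left, right))
    return pairs


# Illustrative sentence: "clear" modifies "sky" but sits three positions before it.
words = ["clouds", "drift", "in", "the", "clear", "and", "boundless", "sky"]

# Hypothetical modifier-head arc that a dependency parser might return.
dependency_arcs = {("clear", "sky")}

# Pairs found through the dependency arc but missed by a window of size 2.
missed_by_window = dependency_arcs - window_pairs(words, 2)
```

With a window of 2 the pair ("clear", "sky") never co-occurs, whereas the dependency arc links the two words directly regardless of the words in between.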
S103, if the candidate phrase meets a preset condition, the candidate phrase is stored in the phrase collocation corpus as a phrase example.
In the present embodiment, it can be determined that whether the number that candidate phrase occurs in corpus text is greater than preset threshold, if
Greater than preset threshold, then using candidate phrase as in phrase example deposit phrase collocation corpus.
Specifically, the accuracy rate of syntactic analysis interface is not 100%, so there may be the feelings of mistake for the analysis of simple sentence
Condition needs to determine whether the phrase can be used according to whether identical collocation frequency of occurrence is more than threshold value.It should be noted that this reality
The specific value that example does not limit preset threshold is applied, the big of threshold value can be rationally arranged in those skilled in the art according to the actual situation
It is small.Meanwhile the present embodiment does not limit the method for determination of preset threshold yet, can also pass through algorithm pair using being manually set
Different threshold values is arranged in different type.
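A minimal sketch of the frequency check, assuming candidate phrases have already been extracted from every short sentence and collected into one list. The function name `filter_by_frequency` and the sample data are hypothetical.

```python
from collections import Counter


def filter_by_frequency(candidates, threshold):
    """Keep candidate phrases that occur more than `threshold` times."""
    counts = Counter(candidates)
    # "Greater than" follows the wording of the embodiment (strict comparison).
    return {phrase: n for phrase, n in counts.items() if n > threshold}


# Hypothetical candidates gathered from a corpus; one of them is a parsing error.
candidates = ["clear sky", "clear sky", "clear sky", "sky drift"]
kept = filter_by_frequency(candidates, threshold=2)
```

Here a one-off collocation produced by a parsing mistake is dropped, while the collocation seen three times is kept as a phrase example.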
Optionally, before judging whether the number of times the candidate phrase occurs in the corpus text is greater than the preset threshold, the method further includes: determining by manual verification whether the phrase collocation of the candidate phrase is correct; if the accuracy of the corresponding phrase collocations is greater than 90% when the same phrase collocation occurs N times in the corpus text, the preset threshold is determined to be N.
It should be noted that this embodiment does not limit the specific value of the preset threshold; it can be set according to the number of sentences in the corpus and the accuracy of the phrase collocations.
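One possible reading of this threshold selection is sketched below. It assumes that manual verification yields (frequency, is_correct) pairs for sampled collocations, and that accuracy is measured over all samples occurring at least N times; both the data layout and the "at least N" grouping are assumptions, and the function name `choose_threshold` is hypothetical.

```python
def choose_threshold(samples, target=0.9):
    """Return the smallest N whose verified accuracy exceeds `target`, else None.

    samples: list of (frequency, is_correct) pairs from manual verification.
    """
    for n in sorted({freq for freq, _ in samples}):
        # Assumption: judge accuracy over collocations seen at least n times.
        judged = [ok for freq, ok in samples if freq >= n]
        if sum(judged) / len(judged) > target:
            return n
    return None


# Hypothetical verification results.
samples = [(1, False), (1, True), (2, True), (2, False),
           (3, True), (3, True), (4, True)]
threshold = choose_threshold(samples)
```

With these sample results the accuracy is about 71% at N = 1, 80% at N = 2, and 100% at N = 3, so N = 3 is the first value above the 90% target.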
In this embodiment, the corpus text is cut into short sentences; candidate phrases are extracted according to the dependencies between the words in each short sentence and their parts of speech; and a candidate phrase that meets the preset condition is stored in the phrase collocation corpus. Phrase collocations that conform to a part-of-speech combination can thus be determined from the dependencies and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text and makes it easier to build a comprehensive phrase collocation corpus.
Fig. 4 is a schematic diagram according to the second embodiment of the present application. As shown in Fig. 4, the method in this embodiment may include:
S201, cutting processing is performed on the corpus text to obtain short sentences.
S202, candidate phrases are extracted according to the dependencies between the words in the short sentences and their parts of speech.
S203, if a candidate phrase meets a preset condition, the candidate phrase is stored in the phrase collocation corpus as a phrase example.
For the specific implementation process and principles of steps S201 to S203 in this embodiment, see the related description of the method shown in Fig. 2; details are not repeated here.
S204, a retrieval phrase that is input by the user and contains a preset question word is received.
In this embodiment, a retrieval phrase that is input by the user and contains a preset question word can also be received, to meet fill-in-the-blank search needs. Examples of such retrieval phrases are "beautiful what", "what sky", and "what is cleared up".
S205, all phrase examples corresponding to the retrieval phrase are looked up in the phrase collocation corpus.
In this embodiment, the corresponding phrase examples can be looked up in the phrase collocation corpus according to the keyword of the retrieval phrase. For example, when the retrieval phrase is "beautiful what", the phrase collocation corpus is searched and phrase examples are returned such as beautiful (scenery), beautiful (place), beautiful (nature), beautiful (campus), beautiful (grassland), beautiful (bow), beautiful (landscape painting), beautiful (landscape), beautiful (spring), and beautiful (soul).
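The fill-in-the-blank lookup can be sketched as below. The sketch assumes the phrase collocation corpus is held as (modifier, head) pairs, that queries consist of exactly one keyword and one question word separated by a space, and that the preset question word is "what"; the storage layout and the names `QUESTION_WORD` and `search` are hypothetical.

```python
QUESTION_WORD = "what"  # assumed preset question word


def search(corpus, query):
    """Return every (modifier, head) pair whose known slot equals the keyword."""
    parts = query.split()
    if len(parts) != 2 or parts.count(QUESTION_WORD) != 1:
        raise ValueError("query must contain one keyword and one question word")
    slot = parts.index(QUESTION_WORD)  # position the user wants filled in
    keyword = parts[1 - slot]          # the other position is the keyword
    return [pair for pair in corpus if pair[1 - slot] == keyword]


# Hypothetical corpus content using English glosses.
corpus = [("beautiful", "scenery"), ("beautiful", "campus"),
          ("clear", "sky"), ("blue", "sky")]
by_modifier = search(corpus, "beautiful what")
by_head = search(corpus, "what sky")
```

The position of the question word decides which side of the collocation is treated as the blank, so the same function serves both "beautiful what" and "what sky".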
Optionally, after all phrase examples corresponding to the retrieval phrase are looked up in the phrase collocation corpus, the method further includes: sorting the phrase examples according to the frequency with which they are retrieved; and displaying the phrase examples according to the sorting result.
Specifically, the frequency with which a phrase example is retrieved and selected reflects its accuracy and how commonly it is used. The phrase examples can therefore be sorted from high to low by the frequency with which they are retrieved and selected, so that high-frequency phrase examples come first and the user can select them preferentially.
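A minimal sketch of this ordering, assuming retrieval counts are kept in a dictionary keyed by phrase example. The names `rank_examples` and `retrieval_counts` and the counts themselves are hypothetical.

```python
def rank_examples(examples, retrieval_counts):
    """Sort phrase examples from most to least frequently retrieved."""
    # Examples that were never retrieved count as 0 and keep their input order.
    return sorted(examples, key=lambda e: retrieval_counts.get(e, 0), reverse=True)


examples = ["beautiful campus", "beautiful scenery", "beautiful soul"]
retrieval_counts = {"beautiful scenery": 12, "beautiful campus": 5}
ranked = rank_examples(examples, retrieval_counts)
```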
In this embodiment, the corpus text is cut into short sentences; candidate phrases are extracted according to the dependencies between the words in each short sentence and their parts of speech; and a candidate phrase that meets the preset condition is stored in the phrase collocation corpus. Phrase collocations that conform to a part-of-speech combination can thus be determined from the dependencies and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text and makes it easier to build a comprehensive phrase collocation corpus.
In addition, this embodiment can also receive a retrieval phrase that is input by the user and contains a preset question word, and then look up all phrase examples corresponding to the retrieval phrase in the phrase collocation corpus, to meet fill-in-the-blank search needs.
Fig. 5 is a schematic diagram according to the third embodiment of the present application. As shown in Fig. 5, the method in this embodiment may include:
S301, cutting processing is performed on the corpus text to obtain short sentences.
S302, candidate phrases are extracted according to the dependencies between the words in the short sentences and their parts of speech.
For the specific implementation process and principles of steps S301 and S302 in this embodiment, see the related description of the method shown in Fig. 2; details are not repeated here.
In this embodiment, the corpus text is cut into short sentences, and candidate phrases are extracted according to the dependencies between the words in each short sentence and their parts of speech. Phrase collocations that conform to a part-of-speech combination can thus be determined from the dependencies and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text.
With the method of the embodiment shown in Fig. 5, phrases can be extracted rapidly. For example, if the input corpus text is "white clouds drift in the clear and bright sky", the extracted phrase is "the clear and bright sky". When the user enters "what sky" as the retrieval phrase, the search engine can look up all matching three-word phrases in the phrase collocation corpus and feed phrases such as "the clear and bright sky" back to the user terminal.
Fig. 6 is a schematic diagram according to the fourth embodiment of the present application. As shown in Fig. 6, the apparatus in this embodiment may include:
a cutting module 31, configured to perform cutting processing on the corpus text to obtain short sentences;
an extraction module 32, configured to extract candidate phrases according to the dependencies between the words in the short sentences and their parts of speech;
a judgment module 33, configured to store a candidate phrase in the phrase collocation corpus as a phrase example when the candidate phrase meets a preset condition.
In this embodiment, phrase collocations that conform to a part-of-speech combination can be determined from the dependencies between the words in a sentence and their parts of speech, so that phrase extraction can be performed efficiently on a large amount of corpus text, which supports the construction of a comprehensive phrase collocation corpus.
In one possible design, the cutting module 31 is specifically configured to:
perform cutting processing on the corpus text, using the punctuation marks in the corpus text as cut points, to obtain short sentences.
In this embodiment, the punctuation marks in the corpus text can be used as cut points; that is, separators in the text such as commas, periods, semicolons, and exclamation marks serve as the cut points for splitting sentences. This approach better matches how sentences are written and reduces the amount of data to be processed in the subsequent analysis of dependencies between words.
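Cutting at punctuation marks can be sketched with a regular expression. The set of cut points below (ASCII and full-width comma, period, semicolon, exclamation mark, plus the question mark) is an assumption based on the "etc." in the description, and the function name is hypothetical. A production version would need extra care with cases such as decimal numbers.

```python
import re

# Assumed cut points: half-width and full-width sentence separators.
CUT_POINTS = r"[,.;!?，。；！？]"


def cut_into_short_sentences(text):
    """Split corpus text at punctuation marks and drop empty pieces."""
    return [part.strip() for part in re.split(CUT_POINTS, text) if part.strip()]


short_sentences = cut_into_short_sentences(
    "White clouds drift in the sky, birds sing; spring is here!"
)
```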
In one possible design, the extraction module 32 is specifically configured to:
obtain the dependencies between the words in the short sentence and their parts of speech, where the dependencies include the subject-predicate relation, the verb-object relation, and the modification relation, and the parts of speech include pronoun, noun, verb, auxiliary word, adjective, and adverb;
collocate the words that have a dependency according to preset part-of-speech combinations to obtain candidate phrases, where the part-of-speech combinations include adjective modifying noun, adverb modifying adjective, and adverb modifying verb.
In this embodiment, both the dependencies and the parts of speech of the words in the short sentence are considered when phrases are extracted, which improves the accuracy of phrase extraction.
In one possible design, obtaining the dependencies between the words in the phrase includes:
analyzing the dependencies between the words in the short sentence through a syntactic analysis interface in natural language processing.
In this embodiment, the syntactic analysis interface in natural language processing can be called directly to analyze the dependencies between the words in the short sentence, so that the dependencies between words are obtained conveniently and efficiently. This is suitable for processing large batches of corpus text with high processing efficiency.
In one possible design, the judgment module 33 is specifically configured to:
judge whether the number of times the candidate phrase occurs in the corpus text is greater than a preset threshold, and if so, store the candidate phrase in the phrase collocation corpus as a phrase example.
In this embodiment, when the corpus text is processed, whether a candidate phrase meets the preset condition can be determined by judging whether the number of times it occurs in the corpus text is greater than the preset threshold; only when the preset condition is met is the candidate phrase taken as a phrase example. This design improves the accuracy of phrase extraction.
The phrase extraction apparatus of this embodiment can execute the technical solutions of the methods shown in Fig. 2 and Fig. 5. For its specific implementation process and technical principles, see the related descriptions of the methods shown in Fig. 2 and Fig. 5; details are not repeated here.
In this embodiment, the corpus text is cut into short sentences; candidate phrases are extracted according to the dependencies between the words in each short sentence and their parts of speech; and a candidate phrase that meets the preset condition is stored in the phrase collocation corpus. Phrase collocations that conform to a part-of-speech combination can thus be determined from the dependencies and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text and makes it easier to build a comprehensive phrase collocation corpus.
Fig. 7 is a schematic diagram according to the fifth embodiment of the present application. As shown in Fig. 7, on the basis of the apparatus shown in Fig. 6, the apparatus in this embodiment may further include:
a determining module 34, configured to:
determine by manual verification whether the phrase collocation of the candidate phrase is correct;
if the accuracy of the corresponding phrase collocations is greater than 90% when the same phrase collocation occurs N times in the corpus text, determine the preset threshold to be N.
This embodiment does not limit the specific value of the preset threshold; it can be set according to the number of sentences in the corpus and the accuracy of the phrase collocations. The most suitable preset threshold can therefore be chosen as the decision condition for screening candidate phrases, which improves the efficiency of phrase extraction while maintaining accuracy.
In one possible design, the apparatus further includes:
a receiving module 35, configured to receive a retrieval phrase that is input by the user and contains a preset question word;
a query module 36, configured to look up all phrase examples corresponding to the retrieval phrase in the phrase collocation corpus.
In this embodiment, all phrase examples corresponding to the retrieval phrase can be fed back to the user terminal rapidly, according to the retrieval phrase that is input by the user and contains a preset question word.
In one possible design, the apparatus further includes a display module 37, configured to:
sort the phrase examples according to the frequency with which they are retrieved;
display the phrase examples according to the sorting result.
In this embodiment, the phrase examples found can be sorted so that phrase examples with a high retrieval frequency are recommended to the user, giving the user a better reference for phrase collocation.
The phrase extraction apparatus of this embodiment can execute the technical solutions of the methods shown in Fig. 2, Fig. 4, and Fig. 5. For its specific implementation process and technical principles, see the related descriptions of the methods shown in Fig. 2, Fig. 4, and Fig. 5; details are not repeated here.
In this embodiment, the corpus text is cut into short sentences; candidate phrases are extracted according to the dependencies between the words in each short sentence and their parts of speech; and a candidate phrase that meets the preset condition is stored in the phrase collocation corpus. Phrase collocations that conform to a part-of-speech combination can thus be determined from the dependencies and parts of speech of the words in a sentence, which improves the efficiency and accuracy of phrase extraction from corpus text and makes it easier to build a comprehensive phrase collocation corpus.
In addition, this embodiment can also receive a retrieval phrase that is input by the user and contains a preset question word, and then look up all phrase examples corresponding to the retrieval phrase in the phrase collocation corpus, to meet fill-in-the-blank search needs.
According to embodiments of the present application, the present application further provides an electronic device and a readable storage medium.
Fig. 8 is a block diagram of an electronic device for implementing the phrase extraction method of the embodiments of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in Fig. 8, the electronic device includes one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor can process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories if desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). Fig. 8 takes one processor 501 as an example.
The memory 502 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the phrase extraction method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions, and the computer instructions are used to cause a computer to execute the phrase extraction method provided by the present application.
As a non-transitory computer-readable storage medium, the memory 502 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the phrase extraction method in the embodiments of the present application. By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, that is, implements the phrase extraction method in the above method embodiments.
The memory 502 may include a program storage area and a data storage area. The program storage area can store an operating system and an application program required by at least one function; the data storage area can store data created through the use of the electronic device for the phrase extraction method, and the like. In addition, the memory 502 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 502 optionally includes memories located remotely from the processor 501, and these remote memories can be connected over a network to the electronic device for the phrase extraction method. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The electronic device for the phrase extraction method may further include an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 can be connected by a bus or in other ways; connection by a bus is taken as the example in Fig. 8.
The input device 503 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the phrase extraction method. Examples of input devices are a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 504 may include a display device, an auxiliary lighting device (for example, an LED), a haptic feedback device (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen.
Various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
These computing programs (also called programs, software, software applications, or code) include machine instructions for a programmable processor and can be implemented using high-level procedural and/or object-oriented programming languages and/or assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, a magnetic disk, an optical disk, a memory, or a programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer that has a display device for displaying information to the user (for example, a CRT (cathode-ray tube) or LCD (liquid crystal display) monitor) and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or haptic feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with an implementation of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above specific embodiments do not limit the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.
Claims (12)
1. A phrase extraction method, characterized by comprising:
performing cutting processing on corpus text to obtain short sentences;
extracting candidate phrases according to the dependencies between the words in the short sentences and their parts of speech;
if a candidate phrase meets a preset condition, storing the candidate phrase in a phrase collocation corpus as a phrase example.
2. The method according to claim 1, characterized in that performing cutting processing on the corpus text to obtain short sentences comprises:
performing cutting processing on the corpus text, using the punctuation marks in the corpus text as cut points, to obtain short sentences.
3. The method according to claim 1, characterized in that extracting candidate phrases according to the dependencies between the words in the short sentences and their parts of speech comprises:
obtaining the dependencies between the words in the short sentence and their parts of speech, wherein the dependencies include the subject-predicate relation, the verb-object relation, and the modification relation, and the parts of speech include pronoun, noun, verb, auxiliary word, adjective, and adverb;
collocating the words that have a dependency according to preset part-of-speech combinations to obtain candidate phrases, wherein the part-of-speech combinations include adjective modifying noun, adverb modifying adjective, and adverb modifying verb.
4. The method according to claim 3, characterized in that obtaining the dependencies between the words in the phrase comprises:
analyzing the dependencies between the words in the short sentence through a syntactic analysis interface in natural language processing.
5. The method according to claim 1, characterized in that, if the candidate phrase meets the preset condition, storing the candidate phrase in the phrase collocation corpus comprises:
judging whether the number of times the candidate phrase occurs in the corpus text is greater than a preset threshold, and if so, storing the candidate phrase in the phrase collocation corpus as a phrase example.
6. The method according to claim 5, characterized in that, before judging whether the number of times the candidate phrase occurs in the corpus text is greater than the preset threshold, the method further comprises:
determining by manual verification whether the phrase collocation of the candidate phrase is correct;
if the accuracy of the corresponding phrase collocations is greater than 90% when the same phrase collocation occurs N times in the corpus text, determining the preset threshold to be N.
7. The method according to any one of claims 1 to 6, characterized by further comprising:
receiving a retrieval phrase that is input by a user and contains a preset question word;
looking up all phrase examples corresponding to the retrieval phrase in the phrase collocation corpus.
8. The method according to claim 7, characterized in that, after looking up all phrase examples corresponding to the retrieval phrase in the phrase collocation corpus, the method further comprises:
sorting the phrase examples according to the frequency with which the phrase examples are retrieved;
displaying the phrase examples according to the sorting result.
9. A phrase extraction apparatus, characterized by comprising:
a cutting module, configured to perform cutting processing on corpus text to obtain short sentences;
an extraction module, configured to extract candidate phrases according to the dependencies between the words in the short sentences and their parts of speech;
a judgment module, configured to store a candidate phrase in a phrase collocation corpus as a phrase example when the candidate phrase meets a preset condition.
10. An electronic device, characterized by comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method according to any one of claims 1 to 8.
11. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause the computer to perform the method according to any one of claims 1 to 8.
12. A phrase extraction method, characterized by comprising:
performing cutting processing on corpus text to obtain short sentences;
extracting candidate phrases according to the dependencies between the words in the short sentences and their parts of speech.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910831629.3A CN110532567A (en) | 2019-09-04 | 2019-09-04 | Extracting method, device, electronic equipment and the storage medium of phrase |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910831629.3A CN110532567A (en) | 2019-09-04 | 2019-09-04 | Extracting method, device, electronic equipment and the storage medium of phrase |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110532567A true CN110532567A (en) | 2019-12-03 |
Family
ID=68666722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910831629.3A Pending CN110532567A (en) | 2019-09-04 | 2019-09-04 | Extracting method, device, electronic equipment and the storage medium of phrase |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110532567A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104050156A (en) * | 2013-03-15 | 2014-09-17 | 富士通株式会社 | Device, method and electronic equipment for extracting maximum noun phrase |
CN107463548A (en) * | 2016-06-02 | 2017-12-12 | 阿里巴巴集团控股有限公司 | Short phrase picking method and device |
CN107463554A (en) * | 2016-06-02 | 2017-12-12 | 阿里巴巴集团控股有限公司 | Short phrase picking method and device |
CN106777275A (en) * | 2016-12-29 | 2017-05-31 | 北京理工大学 | Entity attribute and property value extracting method based on many granularity semantic chunks |
CN108846037A (en) * | 2018-05-29 | 2018-11-20 | 天津字节跳动科技有限公司 | The method and apparatus of prompting search word |
CN109522418A (en) * | 2018-11-08 | 2019-03-26 | 杭州费尔斯通科技有限公司 | A kind of automanual knowledge mapping construction method |
Non-Patent Citations (1)
Title |
---|
凌勇 (Ling Yong): "《现代英语语言研究与教学》" (Modern English Language Research and Teaching), Beijing: 中国商务出版社 (China Commerce and Trade Press), page 154 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111126052A (en) * | 2019-12-26 | 2020-05-08 | 中科鼎富(北京)科技发展有限公司 | Function point generation method and device, electronic equipment and computer readable storage medium |
CN111126052B (en) * | 2019-12-26 | 2023-11-03 | 鼎富智能科技有限公司 | Function point generation method, device, electronic equipment and computer readable storage medium |
CN111680492A (en) * | 2020-06-10 | 2020-09-18 | 创新奇智(青岛)科技有限公司 | New word mining method and device and electronic equipment |
CN111783450A (en) * | 2020-06-29 | 2020-10-16 | 中国平安人寿保险股份有限公司 | Phrase extraction method and device in corpus text, storage medium and electronic equipment |
CN112016298A (en) * | 2020-08-28 | 2020-12-01 | 中移(杭州)信息技术有限公司 | Method for extracting product characteristic information, electronic device and storage medium |
CN112183089A (en) * | 2020-09-25 | 2021-01-05 | 中国建设银行股份有限公司 | Corpus analysis method and device, electronic equipment and storage medium |
CN113177410A (en) * | 2021-05-07 | 2021-07-27 | 多点(深圳)数字科技有限公司 | Text word segmentation method and device, storage medium and electronic equipment |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110532567A (en) | Extracting method, device, electronic equipment and the storage medium of phrase | |
Pasha et al. | MADAMIRA: A fast, comprehensive tool for morphological analysis and disambiguation of Arabic. | |
CN110427627A (en) | Task processing method and device based on semantic expressiveness model | |
CN113220836A (en) | Training method and device of sequence labeling model, electronic equipment and storage medium | |
CN114595686B (en) | Knowledge extraction method, and training method and device of knowledge extraction model | |
US20160078865A1 (en) | Information Processing Method And Electronic Device | |
US20210406467A1 (en) | Method and apparatus for generating triple sample, electronic device and computer storage medium | |
CN111310440A (en) | Text error correction method, device and system | |
CN112784589B (en) | Training sample generation method and device and electronic equipment | |
JP2021170394A (en) | Labeling method for role, labeling device for role, electronic apparatus and storage medium | |
CN112269862B (en) | Text role labeling method, device, electronic equipment and storage medium | |
CN114579104A (en) | Data analysis scene generation method, device, equipment and storage medium | |
CN110275938B (en) | Knowledge extraction method and system based on unstructured document | |
CN111858880A (en) | Method and device for obtaining query result, electronic equipment and readable storage medium | |
CN112989789B (en) | Test method and device of text auditing model, computer equipment and storage medium | |
CN111128130B (en) | Voice data processing method and device and electronic device | |
CN112466277A (en) | Rhythm model training method and device, electronic equipment and storage medium | |
CN101937459A (en) | Tibetan character sequencing device and method based on universal syllable structure | |
WO2023016163A1 (en) | Method for training text recognition model, method for recognizing text, and apparatus | |
CN110348013A (en) | Writing assistance method, equipment and readable storage medium based on artificial intelligence | |
US20210382918A1 (en) | Method and apparatus for labeling data | |
CN115510247A (en) | Method, device, equipment and storage medium for constructing electric carbon policy knowledge graph | |
CN114443802A (en) | Interface document processing method and device, electronic equipment and storage medium | |
CN112541346A (en) | Abstract generation method and device, electronic equipment and readable storage medium | |
CN110516030A (en) | Intent word determination method, apparatus, equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||