CN110210030A - Method and device for sentence analysis
- Publication number
- CN110210030A (CN201910467986.6A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- analyzed
- vocabulary
- basic
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses a method and device for sentence analysis, relating to the field of language processing technology. Its main purpose is to solve the problem that, in existing sentence analysis, the result of semantic analysis is easily disturbed by non-key words, which degrades the accuracy of the analysis, and thereby to improve the accuracy of sentence analysis results. The main technical solution is: obtain basic words from a sentence to be analyzed; perform feature extraction on the basic words and on the sentence to be analyzed, respectively, to obtain the word feature of each basic word and the sentence feature of the sentence to be analyzed; determine the non-key words in the sentence to be analyzed according to the word feature of each basic word and the sentence feature of the sentence to be analyzed; and remove the non-key words from the sentence to be analyzed, so that the sentence with the non-key words removed can be analyzed. The invention is used in the process of analyzing sentences.
Description
Technical field
The present invention relates to the field of language processing technology, and in particular to a method and device for sentence analysis.
Background
With the continuous progress of technology, communication between humans and machines has gradually entered people's lives, and for this reason the processing and parsing of user language has received increasing attention. For example, in scenarios such as big-data search or chat reply, after a user inputs a sentence, the machine needs to perform semantic analysis on the sentence and, according to the resulting semantics, return the corresponding search results or produce a chat reply.
At present, existing sentence analysis is usually based on the key words of a sentence. In practical applications, however, a sentence input by a user contains not only key words but also non-key words, and keyword-based semantic analysis is highly susceptible to interference from these non-key words, which reduces the accuracy of sentence analysis.
Summary of the invention
In view of the above problems, the present invention proposes a method and device for sentence analysis. Its main purpose is to solve the problem that, in existing sentence analysis, the result of semantic analysis is easily disturbed by non-key words, which degrades accuracy, and thereby to improve the accuracy of sentence analysis results.
To achieve the above purpose, the present invention mainly provides the following technical solutions:
In one aspect, the present invention provides a method of sentence analysis, which specifically includes:
obtaining basic words from a sentence to be analyzed;
performing feature extraction on the basic words and on the sentence to be analyzed, respectively, to obtain the word feature of each basic word and the sentence feature of the sentence to be analyzed;
determining the non-key words in the sentence to be analyzed according to the word feature of each basic word and the sentence feature of the sentence to be analyzed;
removing the non-key words from the sentence to be analyzed, so that the sentence to be analyzed with the non-key words removed can be analyzed.
Preferably, obtaining basic words from the sentence to be analyzed includes:
performing a word segmentation operation on the sentence to be analyzed to obtain the basic words.
Preferably, performing feature extraction on the basic words and on the sentence to be analyzed, respectively, to obtain the word feature of each basic word and the sentence feature of the sentence to be analyzed includes:
determining the word vector of each basic word; and
determining the sentence vector of the sentence to be analyzed.
Preferably, determining the non-key words in the sentence to be analyzed according to the word feature of each basic word and the sentence feature of the sentence to be analyzed includes:
computing the word vector of each basic word against the sentence vector of the sentence to be analyzed to determine the importance of each basic word to the sentence to be analyzed, and determining the non-key words in the sentence to be analyzed according to the importance of each basic word.
Preferably, computing the word vector of each basic word against the sentence vector of the sentence to be analyzed to determine the importance of each basic word, and determining the non-key words according to that importance, includes:
performing a dot product operation on the word vector of each basic word and the sentence vector of the sentence to be analyzed to obtain a weight value for each basic word;
determining, according to the weight value, the importance of each basic word to the sentence to be analyzed, the weight value being positively correlated with the importance;
determining, according to the importance, each basic word whose importance is lower than a preset threshold to be a non-key word.
In another aspect, the present invention provides a device for sentence analysis, which specifically includes:
an acquiring unit, configured to obtain basic words from a sentence to be analyzed;
a processing unit, configured to perform feature extraction on the basic words and on the sentence to be analyzed, respectively, to obtain the word feature of each basic word and the sentence feature of the sentence to be analyzed;
a determination unit, configured to determine the non-key words in the sentence to be analyzed according to the word feature of each basic word and the sentence feature of the sentence to be analyzed;
an analysis unit, configured to remove the non-key words from the sentence to be analyzed, so that the sentence to be analyzed with the non-key words removed can be analyzed.
Preferably, the acquiring unit includes:
a processing module, configured to perform a word segmentation operation on the sentence to be analyzed to obtain the basic words.
Preferably, the processing unit includes:
a first determining module, configured to determine the word vector of each basic word;
a second determining module, configured to determine the sentence vector of the sentence to be analyzed.
Preferably, the determination unit is specifically configured to compute the word vector of each basic word against the sentence vector of the sentence to be analyzed, determine the importance of each basic word to the sentence to be analyzed, and determine the non-key words in the sentence to be analyzed according to the importance of each basic word.
Preferably, the determination unit includes:
a computing module, configured to perform a dot product operation on the word vector of each basic word and the sentence vector of the sentence to be analyzed to obtain a weight value for each basic word;
a first determining module, configured to determine, according to the weight value, the importance of each basic word to the sentence to be analyzed, the weight value being positively correlated with the importance;
a second determining module, configured to determine, according to the importance, each basic word whose importance is lower than a preset threshold to be a non-key word.
In another aspect, the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by one or more computing devices, implements the above method of sentence analysis.
In another aspect, the present invention provides a system comprising one or more computing devices and one or more storage devices, wherein a computer program is recorded on the one or more storage devices and, when executed by the one or more computing devices, causes the one or more computing devices to implement the above method of sentence analysis.
With the above technical solution, the method and device for sentence analysis provided by the present invention can improve the accuracy of sentence analysis results. Existing sentence analysis suffers from interference by non-key words during semantic analysis. By contrast, the present invention obtains basic words from the sentence to be analyzed, performs feature extraction on the basic words and on the sentence to be analyzed to obtain the word feature of each basic word and the sentence feature of the sentence, determines the non-key words in the sentence from these features, and finally removes the non-key words from the sentence so that the sentence with the non-key words removed can be analyzed. Compared with existing analysis methods, the present invention uses word features and the sentence feature to determine the non-key words in the sentence to be analyzed and removes them before the sentence is analyzed, which eliminates the interference of non-key words in the subsequent analysis and improves the accuracy of sentence analysis.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flow chart of a method of sentence analysis proposed by an embodiment of the present invention;
Fig. 2 shows a flow chart of another method of sentence analysis proposed by an embodiment of the present invention;
Fig. 3 shows a block diagram of a device for sentence analysis proposed by an embodiment of the present invention;
Fig. 4 shows a block diagram of another device for sentence analysis proposed by an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present invention can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
An embodiment of the present invention provides a method of sentence analysis. The method is used, in the process of analyzing a sentence, to exclude the interference of non-key words with the analysis, and solves the problem that, in existing sentence analysis, interference from non-key words lowers the accuracy of the analysis result. The specific steps of the method are shown in Fig. 1 and include:
101. Obtain basic words from the sentence to be analyzed.
Existing sentence analysis is carried out on the individual words of the sentence to be analyzed. Therefore, in this embodiment of the present invention, once the sentence to be analyzed has been determined, the basic words in the sentence need to be obtained from it. The basic words can be understood as the most basic words obtained by splitting the sentence.
For example, when the sentence to be analyzed is "I think this piece of clothing is pretty good", the basic words obtained from the sentence according to this step may include: "I think", "this piece", "clothing", "pretty good".
It should be noted that, in this step, the basic words may be obtained from the sentence to be analyzed in any existing manner, for example by performing a word segmentation operation on the sentence to be analyzed. Of course, any other manner may also be used; the method described in this step is only exemplary and is not specifically limited here.
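As one possible reading of the word segmentation operation mentioned in this step, the sketch below uses forward maximum matching against a small lexicon. The source does not name a segmentation algorithm, so the technique, the `segment` function, the `lexicon` contents and the `max_len` parameter are all assumptions made for illustration; the Chinese input string is likewise an illustrative guess at the kind of sentence involved, not text taken from the source.

```python
def segment(sentence, lexicon, max_len=4):
    """Forward maximum matching: at each position, take the longest lexicon entry.

    `lexicon` and `max_len` are hypothetical inputs chosen for this sketch.
    """
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink. A single character is
        # always accepted, so the loop cannot stall on unknown text.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon and input, used only to exercise the function.
lexicon = {"我觉得", "这件", "衣服", "不错"}
basic_words = segment("我觉得这件衣服不错", lexicon)
```

In practice a trained segmenter would normally replace the toy lexicon; the point here is only that step 101 turns one string into a list of basic words.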
102. Perform feature extraction on the basic words and on the sentence to be analyzed, respectively, to obtain the word feature of each basic word and the sentence feature of the sentence to be analyzed.
In everyday interaction, the meaning of each individual word in a sentence is not the same as the meaning of the sentence as a whole. Therefore, during sentence analysis, the word features of the basic words obtained in the above step and the sentence feature of the sentence both need to be extracted. A word feature can be understood as a feature that characterizes the meaning or emotional tendency of a basic word, and a sentence feature as a feature that characterizes the meaning or emotional tendency of the sentence as a whole. The types of word feature and sentence feature are not specifically limited here; the specific types and the corresponding extraction methods can be chosen according to actual needs. It should be noted, however, that to make the subsequent comparison between word features and the sentence feature possible, the extracted word features and the extracted sentence feature must be features of the same kind. For example, when the extracted word feature is the word vector of a basic word, the extracted sentence feature is the sentence vector corresponding to the word vectors.
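The requirement that word features and the sentence feature be of the same kind can be illustrated with a minimal sketch in which the sentence vector is the component-wise mean of the word vectors, so that both live in the same vector space. Averaging is an assumption made here for illustration; the source does not say how the sentence vector is computed. The `embeddings` table and its values are hypothetical.

```python
def sentence_vector(word_vectors):
    """Average equal-length word vectors component by component."""
    count = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[k] for vec in word_vectors) / count for k in range(dim)]


# Hypothetical 3-dimensional embeddings for two basic words.
embeddings = {"clothing": [1.0, 0.0, 2.0], "pretty good": [3.0, 2.0, 0.0]}
word_vecs = [embeddings[w] for w in ["clothing", "pretty good"]]
sent_vec = sentence_vector(word_vecs)
```

Because the sentence vector has the same dimension as each word vector, the two can be compared directly in the next step.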
103. Determine the non-key words in the sentence to be analyzed according to the word feature of each basic word and the sentence feature of the sentence to be analyzed.
The word features of the basic words and the sentence feature of the sentence were obtained in step 102, and these features characterize the words and the sentence, respectively. In this step, therefore, the word feature of each basic word in the sentence can be compared with the sentence feature of the whole sentence, and, based on the degree of difference between the features, the basic words whose word features have low similarity to (or a large difference from) the sentence feature are determined to be the non-key words of the sentence. For example, when the selected word feature is a word vector and the sentence feature is a sentence vector, the comparison can be based on the similarity between each word vector and the sentence vector, and a basic word whose similarity is less than a certain threshold is determined to be a non-key word.
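The comparison described in this step can be sketched as follows, assuming cosine similarity as the similarity measure. The source only says "similarity" at this point, so the choice of cosine, the function names, the toy vectors and the threshold of 0.5 are all illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def non_key_words(word_vectors, sentence_vec, threshold):
    """Return the words whose similarity to the sentence vector is below threshold."""
    return [w for w, v in word_vectors.items() if cosine(v, sentence_vec) < threshold]


# Hypothetical 2-dimensional vectors; only their relative directions matter.
word_vectors = {
    "I think": [0.1, 0.9],
    "clothing": [0.8, 0.3],
    "pretty good": [0.9, 0.2],
}
sentence_vec = [1.0, 0.2]
flagged = non_key_words(word_vectors, sentence_vec, threshold=0.5)
```

With these toy vectors only "I think" points away from the sentence vector, so it is the only word flagged as non-key.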
104. Remove the non-key words from the sentence to be analyzed, so that the sentence to be analyzed with the non-key words removed can be analyzed.
Once the non-key words in the sentence to be analyzed have been determined, these words do not need to be considered when the sentence is analyzed. To prevent them from affecting the analysis, the non-key words can therefore be removed from the sentence to be analyzed in this step, and the resulting sentence, with the non-key words removed, is the one that is analyzed. This improves the accuracy of the sentence analysis as a whole.
As a further refinement and extension of the method of sentence analysis shown in Fig. 1, an embodiment of the present invention also provides another method of sentence analysis. Its flow is shown in Fig. 2, and the specific steps include:
201. Obtain basic words from the sentence to be analyzed.
Specifically, in this embodiment of the present invention, the basic words may be obtained from the sentence to be analyzed by a word segmentation operation. This step may therefore be: perform a word segmentation operation on the sentence to be analyzed to obtain the basic words.
202. Perform feature extraction on the basic words and on the sentence to be analyzed, respectively, to obtain the word feature of each basic word and the sentence feature of the sentence to be analyzed.
In practice, to ensure the accuracy of the subsequent analysis, the word features and the sentence feature in this embodiment of the present invention may be extracted by vectorizing the basic words and the sentence to be analyzed. This step may therefore include: first, determining the word vector of each basic word; then, determining the sentence vector of the sentence to be analyzed. Of course, in this embodiment of the present invention there is no required order between determining the word vectors and determining the sentence vector, and the two may also be carried out at the same time.
203. Determine the non-key words in the sentence to be analyzed according to the word feature of each basic word and the sentence feature of the sentence to be analyzed.
Specifically, this step may be: compute the word vector of each basic word against the sentence vector of the sentence to be analyzed, determine the importance of each basic word to the sentence to be analyzed, and determine the non-key words in the sentence to be analyzed according to the importance of each basic word. Here, the importance can be understood as being determined from the result of the computation between the word vector and the sentence vector.
Further, when determining the non-key words in the sentence to be analyzed according to the importance of each basic word, the procedure may be as follows:
First, a dot product operation is performed on the word vector of each basic word and the sentence vector of the sentence to be analyzed to obtain a weight value for each basic word. In mathematics, the dot product (also called the scalar product) is a binary operation that takes two vectors over the real numbers R and returns a real-valued scalar. It is the standard inner product of Euclidean space. The operation gives a measure of how similar its two operand vectors are. Performing the dot product between a word vector and the sentence vector therefore in effect determines the degree of similarity between the two vectors, so the weight value obtained in this step characterizes the degree of similarity between each word vector and the sentence vector. Specifically, the dot product may be computed as follows:
For example, given two vectors $A = [a_1, a_2, \dots, a_n]$ and $B = [b_1, b_2, \dots, b_n]$, their dot product is:
$A \cdot B = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$
When both vectors are normalized to unit length, the result of this formula equals the cosine of the angle between them, whose range is [-1, 1]. Mathematically, the larger the cosine, the larger the projection of A onto B and the more similar the two vectors are; conversely, a cosine of -1 means that the two vectors point in opposite directions and are not similar.
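The relationship between the dot product and the cosine can be checked numerically. The sketch below is illustrative only: the vectors are made-up values, and the helper names `dot`, `normalize` and `cosine` are not from the source. It shows that the raw dot product is unbounded, while the dot product of unit-length vectors coincides with the cosine of the angle.

```python
import math


def dot(a, b):
    """A . B = a1*b1 + a2*b2 + ... + an*bn"""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
raw = dot(a, b)                          # 3*4 + 4*3, outside [-1, 1]
unit = dot(normalize(a), normalize(b))   # dot product after normalization
```

If the weight values are meant to be compared against a fixed threshold, normalizing the vectors first (or using the cosine directly) keeps the weights on a common scale; whether the source intends this is not stated.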
Then, according to the weight value, the importance of each basic word to the sentence to be analyzed is determined, the weight value being positively correlated with the importance. Because the weight value characterizes the degree of similarity between a word vector and the sentence vector, the level of similarity directly reflects how important the basic word corresponding to that word vector is to the sentence to be analyzed.
Finally, according to the importance, each basic word whose importance is lower than a preset threshold is determined to be a non-key word. In this way, a weight value characterizing the similarity between vectors is obtained by the dot product operation, the importance of each basic word to the sentence to be analyzed is determined from the weight value, and the non-key words are determined from the importance. The non-key part of the sentence, that is, the non-key words, can thus be determined in a more direct and accurate way, which ensures that the non-key words are identified accurately and lays the foundation for excluding their interference with the analysis of the sentence to be analyzed.
204. Remove the non-key words from the sentence to be analyzed, so that the sentence to be analyzed with the non-key words removed can be analyzed.
The non-key words in the sentence to be analyzed were determined in step 203, and these words would affect the result of sentence analysis. Once they have been determined, the non-key words can therefore be removed, so that when the sentence to be analyzed is analyzed without them, their interference with the analysis result is reduced and the accuracy of the analysis is improved.
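The thresholding and removal in steps 203 and 204 can be sketched as a single filter over the basic words. The function name `remove_non_key`, the example words, the weight values and the threshold of 0.5 are illustrative assumptions; the source does not give a concrete value for the preset threshold.

```python
def remove_non_key(words, weights, threshold):
    """Split basic words into kept words and removed (non-key) words.

    A word is treated as non-key when its weight is below `threshold`.
    """
    kept = [w for w, wt in zip(words, weights) if wt >= threshold]
    removed = [w for w, wt in zip(words, weights) if wt < threshold]
    return kept, removed


# Illustrative basic words with hypothetical weight values.
words = ["I think", "pretty good"]
weights = [0.1, 0.7]
kept, removed = remove_non_key(words, weights, threshold=0.5)
```

The `kept` list is what would be passed on to the downstream sentence analysis; how the kept words are rejoined into a sentence depends on the language and is left out of this sketch.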
For example, when the sentence to be analyzed is "I think it's pretty good", a word segmentation operation can be performed on it to obtain the basic words "I think" and "pretty good". The word vectors of the basic words and the sentence vector are then extracted, giving the word vector a of "I think", the word vector b of "pretty good", and the sentence vector A. The dot product of word vector a and sentence vector A gives a weight value of 0.1, and the dot product of word vector b and sentence vector A gives a weight value of 0.7. According to the size of the weight values, the importance of the basic word "I think" corresponding to vector a is determined to be low, and the importance of the basic word "pretty good" corresponding to vector b is determined to be high. According to importance, the basic word "I think" is then determined to be a non-key word and is removed from the sentence to be analyzed, "I think it's pretty good", leaving "pretty good" as the sentence to be analyzed. Sentence analysis is then performed on "pretty good", the sentence to be analyzed with the non-key word removed.
As another example, take sentence to be analyzed 1, "I want to eat apples, oranges, and all kinds of other fruit", and sentence to be analyzed 2, "I want to eat apples". After word segmentation, the basic word "apples" is obtained from both sentence 1 and sentence 2. However, when the weight values are determined from the sentence vector and the word vectors according to the method of this embodiment, the weight of "apples" in sentence 1 is only 0.2, whereas the weight of "apples" in sentence 2 is 0.6. In sentence 1, the basic word "apples" can therefore be determined to be a non-key word, so sentence 1 is analyzed after "apples" has been removed. In sentence 2, "apples" has a higher weight and can be determined not to be a non-key word, so sentence 2 must be analyzed with "apples" included.
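Why the same word can receive different weights in different sentences can be seen in a toy setting. The sketch below assumes that the sentence vector is the mean of the word vectors and that each word has a one-hot embedding; both are assumptions made purely for illustration, and the resulting numbers come from these toy vectors rather than from the source.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def weight_of(word, words, embeddings):
    """Dot product of a word vector with the (mean) sentence vector."""
    sent_vec = mean_vector([embeddings[w] for w in words])
    return dot(embeddings[word], sent_vec)


# Hypothetical one-hot embeddings, one dimension per basic word.
embeddings = {
    "I":           [1.0, 0.0, 0.0, 0.0, 0.0],
    "want to eat": [0.0, 1.0, 0.0, 0.0, 0.0],
    "apples":      [0.0, 0.0, 1.0, 0.0, 0.0],
    "oranges":     [0.0, 0.0, 0.0, 1.0, 0.0],
    "other fruit": [0.0, 0.0, 0.0, 0.0, 1.0],
}
sentence_1 = ["I", "want to eat", "apples", "oranges", "other fruit"]
sentence_2 = ["I", "want to eat", "apples"]
w1 = weight_of("apples", sentence_1, embeddings)  # 1/5 in the longer sentence
w2 = weight_of("apples", sentence_2, embeddings)  # 1/3 in the shorter sentence
```

In this toy model the word's share of the sentence vector shrinks as more items are listed, which mirrors the qualitative behavior in the example: "apples" matters less to the sentence that enumerates many fruits.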
Further, as an implementation of the above method of sentence analysis, an embodiment of the present invention provides a device for sentence analysis. The device is mainly used to solve the problem that, in existing sentence analysis, the result of semantic analysis is easily disturbed by non-key words, which degrades accuracy, and thereby to improve the accuracy of sentence analysis results. For ease of reading, this device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be clear that the device in this embodiment can implement everything in the foregoing method embodiment. The device is shown in Fig. 3 and specifically includes:
an acquiring unit 31, which can be used to obtain basic words from the sentence to be analyzed;
a processing unit 32, which can be used to perform feature extraction on the basic words obtained by the acquiring unit 31 and on the sentence to be analyzed, respectively, to obtain the word feature of each basic word and the sentence feature of the sentence to be analyzed;
a determination unit 33, which can be used to determine the non-key words in the sentence to be analyzed according to the word feature of each basic word obtained by the processing unit 32 and the sentence feature of the sentence to be analyzed;
an analysis unit 34, which can be used to remove the non-key words determined by the determination unit 33 from the sentence to be analyzed, so that the sentence to be analyzed with the non-key words removed can be analyzed.
Further, as shown in Fig. 4, the acquiring unit 31 includes:
a processing module 311, which can be used to perform a word segmentation operation on the sentence to be analyzed to obtain the basic words.
Further, as shown in Fig. 4, the processing unit 32 includes:
a first determining module 321, which can be used to determine the word vector of each basic word;
a second determining module 322, which can be used to determine the sentence vector of the sentence to be analyzed.
Further, as shown in Fig. 4, the determination unit 33 can be specifically used to compute the word vector of each basic word against the sentence vector of the sentence to be analyzed, determine the importance of each basic word to the sentence to be analyzed, and determine the non-key words in the sentence to be analyzed according to the importance of each basic word.
Further, as shown in Fig. 4, the determination unit 33 includes:
a computing module 331, which can be used to perform a dot product operation on the word vector of each basic word and the sentence vector of the sentence to be analyzed to obtain a weight value for each basic word;
a first determining module 332, which can be used to determine, according to the weight value calculated by the computing module 331, the importance of each basic word to the sentence to be analyzed, the weight value being positively correlated with the importance;
a second determining module 333, which can be used to determine, according to the importance determined by the first determining module 332, each basic word whose importance is lower than a preset threshold to be a non-key word.
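The division of labor among the units can be sketched as one small class whose fields stand in for the modules. This is only one possible arrangement: the class name `SentenceAnalyzer`, its field names, the "|" delimiter used by the stand-in tokenizer, the fixed sentence vector and the threshold value are all hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class SentenceAnalyzer:
    tokenize: Callable[[str], List[str]]           # role of acquiring unit 31
    embed_word: Callable[[str], Vector]            # role of determining module 321
    embed_sentence: Callable[[List[str]], Vector]  # role of determining module 322
    threshold: float                               # preset threshold used by module 333

    def key_words(self, sentence: str) -> List[str]:
        words = self.tokenize(sentence)
        sent_vec = self.embed_sentence(words)
        # Role of computing module 331: one weight per basic word.
        weights = [dot(self.embed_word(w), sent_vec) for w in words]
        # Roles of modules 332/333 and analysis unit 34: drop low-weight words.
        return [w for w, wt in zip(words, weights) if wt >= self.threshold]


# Stand-in components with made-up vectors, used only to exercise the class.
toy_vectors = {"I think": [0.1, 0.0], "pretty good": [0.7, 0.0]}
analyzer = SentenceAnalyzer(
    tokenize=lambda s: s.split("|"),
    embed_word=lambda w: toy_vectors[w],
    embed_sentence=lambda words: [1.0, 0.0],
    threshold=0.5,
)
result = analyzer.key_words("I think|pretty good")
```

Passing the components in as callables keeps the sketch independent of any particular segmenter or embedding model, which matches the source's statement that the feature types are not specifically limited.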
Further, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by one or more computing devices, implements the above method of sentence analysis.
In addition, an embodiment of the present invention also provides a system comprising one or more computing devices and one or more storage devices, wherein a computer program is recorded on the one or more storage devices and, when executed by the one or more computing devices, causes the one or more computing devices to implement the above method of sentence analysis.
In conclusion a kind of method and device for Sentence analysis that the embodiment of the present invention proposes, can be improved Sentence analysis
As a result accuracy.It is relatively existing that based on during Sentence analysis, when the semantic analysis of sentence, there are the dry of non-key vocabulary
The problem of disturbing, the present invention can obtain basic vocabulary from sentence to be analyzed, then respectively to the basic vocabulary and to be analyzed
Sentence carries out feature extraction, obtains the sentence feature of the word feature for corresponding to each basic vocabulary and sentence to be analyzed, Zhi Hougen
According to the word feature of each basic vocabulary and the sentence feature of the sentence to be analyzed, determine in the sentence to be analyzed
Non-key vocabulary finally rejects the non-key vocabulary from the sentence to be analyzed, after to non-key vocabulary is rejected
Sentence to be analyzed analyzed, to realize analysis to sentence, compared to existing analysis mode, the present invention passes through to word
Remittance feature and sentence feature determine the non-key vocabulary in sentence to be analyzed, thus when carrying out Sentence analysis by non-key vocabulary
It is rejected, when then ensuring subsequent analysis, eliminates interference when analysis of the non-key vocabulary to sentence, improve sentence
Precision of analysis.
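The four steps summarized above can be sketched end to end. The sketch is illustrative only and rests on several assumptions that are not part of the embodiment: word segmentation is replaced by a lowercase whitespace split, the term vectors come from a small hypothetical table (`TOY_VECTORS`), the sentence vector is taken as the mean of the term vectors, and the dot product is used directly as the significance level.

```python
from typing import Dict, List

# Hypothetical toy term vectors; a real system would load trained embeddings.
TOY_VECTORS: Dict[str, List[float]] = {
    "please": [0.1, 0.0],
    "book": [0.9, 0.8],
    "a": [0.0, 0.1],
    "flight": [0.8, 0.9],
}
DIM = 2


def get_basic_vocabulary(sentence: str) -> List[str]:
    """Stand-in for word segmentation (assumption: whitespace split)."""
    return sentence.lower().split()


def term_vector(word: str) -> List[float]:
    """Look up the term vector; unknown words get a zero vector."""
    return TOY_VECTORS.get(word, [0.0] * DIM)


def sentence_vector(vectors: List[List[float]]) -> List[float]:
    """Assumption: the sentence feature is the mean of the term vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(DIM)]


def dot(u: List[float], v: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def reject_non_key_vocabulary(sentence: str, threshold: float) -> List[str]:
    """Return the basic vocabulary that remains after rejecting non-key vocabulary."""
    words = get_basic_vocabulary(sentence)
    if not words:
        return []
    vectors = [term_vector(w) for w in words]
    s_vec = sentence_vector(vectors)
    # Keep only words whose weighted value reaches the preset threshold.
    return [w for w, v in zip(words, vectors) if dot(v, s_vec) >= threshold]


remaining = reject_non_key_vocabulary("Please book a flight", threshold=0.2)
```

The remaining words would then be passed to whatever downstream semantic analysis is in use; that stage is outside the scope of this sketch.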
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It can be understood that related features in the above method and device may be referred to mutually. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments and do not represent the superiority or inferiority of any embodiment.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, device, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is provided to disclose the best mode of the invention.
In addition, the memory may take forms such as non-permanent (volatile) memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one storage chip.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may take forms such as non-permanent (volatile) memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.
Claims (10)
1. A method of Sentence analysis, wherein the method comprises:
obtaining basic vocabulary from a sentence to be analyzed;
performing feature extraction on the basic vocabulary and the sentence to be analyzed respectively, to obtain a word feature corresponding to each basic vocabulary and a sentence feature of the sentence to be analyzed;
determining non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary and the sentence feature of the sentence to be analyzed;
rejecting the non-key vocabulary from the sentence to be analyzed, so as to analyze the sentence to be analyzed after the non-key vocabulary is rejected.
2. the method for claim 1, wherein the basic vocabulary that obtains from sentence to be analyzed includes:
Participle operation is carried out to the sentence to be analyzed, obtains the basic vocabulary.
3. The method of claim 2, wherein performing feature extraction on the basic vocabulary and the sentence to be analyzed respectively, to obtain the word feature corresponding to each basic vocabulary and the sentence feature of the sentence to be analyzed, comprises:
determining a term vector of each basic vocabulary; and
determining a sentence vector of the sentence to be analyzed.
4. The method of claim 3, wherein determining the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary and the sentence feature of the sentence to be analyzed comprises:
performing a calculation on the term vector of each basic vocabulary with the sentence vector of the sentence to be analyzed respectively, to determine a significance level of each basic vocabulary for the sentence to be analyzed, and determining the non-key vocabulary in the sentence to be analyzed according to the significance level of each basic vocabulary.
5. The method of claim 4, wherein performing the calculation on the term vector of each basic vocabulary with the sentence vector of the sentence to be analyzed respectively, to determine the significance level of each basic vocabulary for the sentence to be analyzed, and determining the non-key vocabulary in the sentence to be analyzed according to the significance level of each basic vocabulary, comprises:
performing a dot product operation on the term vector of each basic vocabulary and the sentence vector of the sentence to be analyzed, to obtain a weighted value corresponding to each basic vocabulary;
determining, according to the weighted value, the significance level of each basic vocabulary for the sentence to be analyzed, the weighted value being positively correlated with the significance level;
determining, according to the significance level, the basic vocabulary whose significance level is lower than a preset threshold as non-key vocabulary.
6. A device of Sentence analysis, wherein the device comprises:
an acquiring unit, configured to obtain basic vocabulary from a sentence to be analyzed;
a processing unit, configured to perform feature extraction on the basic vocabulary and the sentence to be analyzed respectively, to obtain a word feature corresponding to each basic vocabulary and a sentence feature of the sentence to be analyzed;
a determination unit, configured to determine non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary and the sentence feature of the sentence to be analyzed;
an analytical unit, configured to reject the non-key vocabulary from the sentence to be analyzed, so as to analyze the sentence to be analyzed after the non-key vocabulary is rejected.
7. The device of claim 6, wherein the acquiring unit comprises:
a processing module, configured to perform a word segmentation operation on the sentence to be analyzed to obtain the basic vocabulary.
8. The device of claim 7, wherein the processing unit comprises:
a first determining module, configured to determine a term vector of each basic vocabulary;
a second determining module, configured to determine a sentence vector of the sentence to be analyzed.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps included in the method of Sentence analysis according to any one of claims 1 to 5.
10. A computer device, wherein the computer device comprises:
one or more processors; and
a memory, configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the steps included in the method of Sentence analysis according to any one of claims 1 to 5.
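The calculation recited in claims 4 and 5 can be written compactly. The notation below is an illustrative assumption and does not appear in the claims: $t_i$ is the $i$-th basic vocabulary, $\mathbf{v}_i$ its term vector, $\mathbf{s}$ the sentence vector, $d$ the vector dimension, $w_i$ the weighted value, $a_i$ the significance level, $f$ any monotonically increasing function (the claims require only a positive correlation), $\tau$ the preset threshold, and $\mathcal{N}$ the set of non-key vocabulary.

```latex
w_i = \mathbf{v}_i \cdot \mathbf{s} = \sum_{k=1}^{d} v_{i,k}\, s_k,
\qquad
a_i = f(w_i),
\qquad
\mathcal{N} = \{\, t_i : a_i < \tau \,\}
```

With $f$ chosen as the identity, the threshold is applied directly to the dot product.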
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910467986.6A CN110210030B (en) | 2019-05-31 | 2019-05-31 | Statement analysis method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110210030A (en) | 2019-09-06 |
CN110210030B (en) | 2021-02-09 |
Family
ID=67789868
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910467986.6A (Active, CN110210030B (en)) | Statement analysis method and device | 2019-05-31 | 2019-05-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210030B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906403A (en) * | 2021-04-25 | 2021-06-04 | 中国平安人寿保险股份有限公司 | Semantic analysis model training method and device, terminal equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018032213A (en) * | 2016-08-24 | 2018-03-01 | シャープ株式会社 | Information processor, information processing system, information processing method and program |
CN109213856A (en) * | 2018-10-22 | 2019-01-15 | 广东小天才科技有限公司 | A kind of method for recognizing semantics and system |
CN109492225A (en) * | 2018-11-08 | 2019-03-19 | 大连瀚闻资讯有限公司 | A kind of public feelings information text handling method of rare foreign languages country |
CN109522544A (en) * | 2018-09-27 | 2019-03-26 | 厦门快商通信息技术有限公司 | Sentence vector calculation, file classification method and system based on Chi-square Test |
CN109815492A (en) * | 2019-01-04 | 2019-05-28 | 平安科技(深圳)有限公司 | A kind of intension recognizing method based on identification model, identification equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN110210030B (en) | 2021-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Šilić et al. | Visualization of text streams: A survey | |
CN112395506A (en) | Information recommendation method and device, electronic equipment and storage medium | |
CN109582948B (en) | Method and device for extracting evaluation viewpoints | |
CN110019668A (en) | A kind of text searching method and device | |
CN104978356B (en) | A kind of recognition methods of synonym and device | |
CN110162780A (en) | The recognition methods and device that user is intended to | |
CN106610931B (en) | Topic name extraction method and device | |
CN109582954A (en) | Method and apparatus for output information | |
CN112527970B (en) | Data dictionary standardization processing method, device, equipment and storage medium | |
WO2012158572A2 (en) | Exploiting query click logs for domain detection in spoken language understanding | |
CN110390095A (en) | Sentence mask method and sentence annotation equipment | |
CN106598949A (en) | Method and device for confirming contribution degree of words to text | |
CN112329460A (en) | Text topic clustering method, device, equipment and storage medium | |
CN106598997B (en) | Method and device for calculating text theme attribution degree | |
US10482162B2 (en) | Automatic equation transformation from text | |
Tappler et al. | Active model learning of stochastic reactive systems | |
Schröder et al. | Small-text: Active learning for text classification in python | |
CN110210030A (en) | The method and device of Sentence analysis | |
CN110019670A (en) | A kind of text searching method and device | |
CN107832271B (en) | Function image drawing method, device, equipment and computer storage medium | |
CN108875743A (en) | A kind of text recognition method and device | |
CN108460038A (en) | Rule matching method and its equipment | |
Schuster et al. | Alignment approximation for process trees | |
CN111797995B (en) | Method and device for generating interpretation report of model prediction sample | |
CN109684473A (en) | A kind of automatic bulletin generation method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | TA01 | Transfer of patent application right | Effective date of registration: 20200728. Address after: 518000 Nanshan District science and technology zone, Guangdong, Zhejiang Province, science and technology in the Tencent Building on the 1st floor of the 35 layer. Applicant after: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd. Address before: 100029, Beijing, Chaoyang District new East Street, building No. 2, -3 to 25, 101, 8, 804 rooms. Applicant before: Tricorn (Beijing) Technology Co.,Ltd. |
| | GR01 | Patent grant | |