CN110210030A - The method and device of Sentence analysis - Google Patents

Publication number
CN110210030A
Authority
CN
China
Prior art keywords
sentence
analyzed
vocabulary
basic
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910467986.6A
Other languages
Chinese (zh)
Other versions
CN110210030B (en)
Inventor
王卓然
亓超
马宇驰
侯兴林
李彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Triangle Animal (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Triangle Animal (beijing) Technology Co Ltd
Priority to CN201910467986.6A
Publication of CN110210030A
Application granted
Publication of CN110210030B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a method and device for sentence analysis and relates to the field of language processing. Its main purpose is to solve the problem that, in existing sentence analysis, the result of semantic analysis is easily disturbed by non-key words, which degrades the accuracy of sentence analysis, and thereby to improve the accuracy of the sentence analysis result. The main technical solution of the invention is: obtain basic vocabulary from a sentence to be analyzed; perform feature extraction on the basic vocabulary and on the sentence to be analyzed, respectively, to obtain the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed; determine the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed; and remove the non-key vocabulary from the sentence to be analyzed, so that the sentence to be analyzed, with the non-key vocabulary removed, can be analyzed. The invention is used in the process of analyzing sentences.

Description

Method and device for sentence analysis
Technical field
The present invention relates to the field of language processing, and in particular to a method and device for sentence analysis.
Background
As technology advances, communication between humans and machines has gradually entered people's lives, and the processing and parsing of user language has received increasing attention. For example, in scenarios such as big-data search or chat reply, after a user inputs a sentence, the machine needs to perform semantic analysis on the sentence and, according to the resulting semantics, return the corresponding search result or generate a chat reply.
At present, sentence analysis is usually performed on the main words of a sentence. In practice, however, a sentence input by a user contains not only keywords but also non-key words, and existing keyword-based semantic analysis is highly susceptible to interference from these non-key words, which reduces the accuracy of sentence analysis.
Summary of the invention
In view of the above problems, the present invention proposes a method and device for sentence analysis. Its main purpose is to solve the problem that, in existing sentence analysis, the result of semantic analysis is easily disturbed by non-key words, which degrades the accuracy of sentence analysis, and thereby to improve the accuracy of the sentence analysis result.
To achieve the above purpose, the present invention mainly provides the following technical solutions:
In one aspect, the present invention provides a method of sentence analysis, which specifically includes:
obtaining basic vocabulary from a sentence to be analyzed;
performing feature extraction on the basic vocabulary and on the sentence to be analyzed, respectively, to obtain the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed;
determining the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed;
removing the non-key vocabulary from the sentence to be analyzed, so that the sentence to be analyzed, with the non-key vocabulary removed, can be analyzed.
Preferably, obtaining basic vocabulary from the sentence to be analyzed includes:
performing a word segmentation operation on the sentence to be analyzed to obtain the basic vocabulary.
Preferably, performing feature extraction on the basic vocabulary and on the sentence to be analyzed, respectively, to obtain the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed includes:
determining the term vector of each basic vocabulary item; and
determining the sentence vector of the sentence to be analyzed.
Preferably, determining the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed includes:
performing a calculation between the term vector of each basic vocabulary item and the sentence vector of the sentence to be analyzed to determine the significance level of each basic vocabulary item for the sentence to be analyzed, and determining the non-key vocabulary in the sentence to be analyzed according to the significance level of each basic vocabulary item.
Preferably, performing the calculation between the term vector of each basic vocabulary item and the sentence vector of the sentence to be analyzed, determining the significance level of each basic vocabulary item, and determining the non-key vocabulary in the sentence to be analyzed according to the significance level includes:
performing a dot product operation between the term vector of each basic vocabulary item and the sentence vector of the sentence to be analyzed to obtain the weighted value of each basic vocabulary item;
determining, according to the weighted value, the significance level of each basic vocabulary item for the sentence to be analyzed, the weighted value being positively correlated with the significance level;
determining, according to the significance level, each basic vocabulary item whose significance level is lower than a preset threshold to be non-key vocabulary.
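The preferred method above can be written compactly. The notation here is not from the source and is introduced only for illustration: v_i is the term vector of the i-th basic vocabulary item, s is the sentence vector, n is their common dimension, w_i is the weighted value, tau is the preset threshold, and N is the set of non-key vocabulary. The sketch also assumes the simplest positive correlation, in which the significance level is the weighted value itself.

```latex
w_i = \mathbf{v}_i \cdot \mathbf{s} = \sum_{k=1}^{n} v_{i,k}\, s_k ,
\qquad
\mathcal{N} = \{\, i \;:\; w_i < \tau \,\}
```

The sentence that is finally analyzed consists of the basic vocabulary items whose index is not in N, kept in their original order.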
In another aspect, the present invention provides a device for sentence analysis, which specifically includes:
an acquiring unit, configured to obtain basic vocabulary from a sentence to be analyzed;
a processing unit, configured to perform feature extraction on the basic vocabulary and on the sentence to be analyzed, respectively, to obtain the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed;
a determination unit, configured to determine the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed;
an analytical unit, configured to remove the non-key vocabulary from the sentence to be analyzed, so that the sentence to be analyzed, with the non-key vocabulary removed, can be analyzed.
Preferably, the acquiring unit includes:
a processing module, configured to perform a word segmentation operation on the sentence to be analyzed to obtain the basic vocabulary.
Preferably, the processing unit includes:
a first determining module, configured to determine the term vector of each basic vocabulary item;
a second determining module, configured to determine the sentence vector of the sentence to be analyzed.
Preferably, the determination unit is specifically configured to perform a calculation between the term vector of each basic vocabulary item and the sentence vector of the sentence to be analyzed to determine the significance level of each basic vocabulary item for the sentence to be analyzed, and to determine the non-key vocabulary in the sentence to be analyzed according to the significance level of each basic vocabulary item.
Preferably, the determination unit includes:
a computing module, configured to perform a dot product operation between the term vector of each basic vocabulary item and the sentence vector of the sentence to be analyzed to obtain the weighted value of each basic vocabulary item;
a first determining module, configured to determine, according to the weighted value, the significance level of each basic vocabulary item for the sentence to be analyzed, the weighted value being positively correlated with the significance level;
a second determining module, configured to determine, according to the significance level, each basic vocabulary item whose significance level is lower than a preset threshold to be non-key vocabulary.
In another aspect, the present invention provides a computer readable storage medium on which a computer program is stored, wherein the computer program, when executed by one or more computing devices, implements the above method of sentence analysis.
In another aspect, the present invention provides a system including one or more computing devices and one or more storage devices, wherein a computer program is recorded on the one or more storage devices, and the computer program, when executed by the one or more computing devices, causes the one or more computing devices to implement the above method of sentence analysis.
With the above technical solution, the method and device for sentence analysis provided by the present invention can improve the accuracy of the sentence analysis result. Existing sentence analysis suffers from interference by non-key vocabulary during semantic analysis. By contrast, the present invention obtains basic vocabulary from the sentence to be analyzed, performs feature extraction on the basic vocabulary and on the sentence to be analyzed, respectively, to obtain the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed, determines the non-key vocabulary in the sentence to be analyzed according to those features, and finally removes the non-key vocabulary from the sentence to be analyzed so that the remaining sentence can be analyzed. Compared with existing analysis approaches, the present invention determines the non-key vocabulary from the word features and the sentence feature and removes it before analysis, which eliminates the interference of non-key vocabulary with the subsequent analysis and improves the accuracy of sentence analysis.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numbers refer to the same parts. In the drawings:
Fig. 1 shows a flow chart of a method of sentence analysis proposed by an embodiment of the present invention;
Fig. 2 shows a flow chart of another method of sentence analysis proposed by an embodiment of the present invention;
Fig. 3 shows a block diagram of a device for sentence analysis proposed by an embodiment of the present invention;
Fig. 4 shows a block diagram of another device for sentence analysis proposed by an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present invention can be understood thoroughly and so that its scope can be fully conveyed to those skilled in the art.
An embodiment of the present invention provides a method of sentence analysis. The method is used, in the process of analyzing a sentence, to exclude the interference of non-key words with the analysis, and solves the problem that in existing sentence analysis the accuracy of the analysis result is low because of interference from non-key words. The specific steps of the method are shown in Fig. 1 and include:
101. Obtain basic vocabulary from the sentence to be analyzed.
Existing sentence analysis operates on each word in the sentence to be analyzed. Therefore, in this embodiment, once the sentence to be analyzed has been determined, the basic vocabulary of the sentence needs to be obtained from it. The basic vocabulary can be understood as the most basic words obtained after the sentence is split.
For example, when the sentence to be analyzed is "I feel this piece of clothing is pretty good", the basic vocabulary obtained from it by the method of this step may include: "I feel", "this piece", "clothes", and "good".
It should be noted that, in this step, the basic vocabulary may be obtained from the sentence to be analyzed in any existing manner, for example by performing a word segmentation operation on the sentence to be analyzed. Other manners may of course also be used; the method described in this step is only exemplary and is not specifically limited here.
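Since the text leaves the segmentation method open, the following is only a minimal sketch of one classical option, dictionary-based forward maximum matching, and not the method the patent prescribes. The function name `segment`, the lexicon, and the Chinese sentence (an assumed rendering of the translated example above) are all hypothetical.

```python
def segment(sentence, lexicon):
    """Forward maximum matching: at each position take the longest lexicon entry."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(sentence):
        # Try the longest candidate first and shrink it one character at a time.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is accepted even when it is not in the lexicon.
            if piece in lexicon or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon and sentence, chosen to mirror the four basic vocabulary items above.
LEXICON = {"我觉得", "这件", "衣服", "不错"}
print(segment("我觉得这件衣服不错", LEXICON))
```

A production system would more likely rely on a trained segmenter; the sketch only shows what "obtaining basic vocabulary" produces, namely an ordered list of words.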
102. Perform feature extraction on the basic vocabulary and on the sentence to be analyzed, respectively, to obtain the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed.
In everyday interaction, the meaning of each word in a sentence is not the same as the meaning of the sentence as a whole. Therefore, during sentence analysis, both the word features of the basic vocabulary obtained in the above step and the sentence feature of the sentence need to be extracted. A word feature can be understood as a feature that characterizes the meaning or emotional tendency of a basic vocabulary item, and a sentence feature as a feature that characterizes the meaning or emotional tendency of the sentence as a whole. The types of the word feature and the sentence feature are not specifically limited here; the specific types and the corresponding extraction manner can be chosen according to actual needs. It should be noted, however, that to make the subsequent comparison between word features and the sentence feature possible, the extracted word features and the extracted sentence feature must be features of the same kind. For example, when the extracted word feature is the term vector of a basic vocabulary item, the extracted sentence feature is the sentence vector corresponding to the term vectors.
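The text does not say how the term vectors or the sentence vector are produced, so the sketch below makes two loudly labeled assumptions: term vectors come from a small lookup table with made-up values, and the sentence vector is the element-wise mean of the term vectors. What it is meant to show is the "same kind of feature" requirement: both outputs live in the same vector space and have the same dimension.

```python
# Hypothetical 3-dimensional embedding table; the values are invented for illustration.
EMBEDDINGS = {
    "I feel": [0.1, 0.0, 0.1],
    "clothes": [0.9, 0.3, 0.1],
    "good": [0.2, 0.9, 0.1],
}


def term_vectors(words, table):
    """Look up the term vector of each basic vocabulary item."""
    return [table[w] for w in words]


def sentence_vector(vectors):
    """Assumption: the sentence vector is the element-wise mean of the term vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


vecs = term_vectors(["clothes", "good"], EMBEDDINGS)
sent = sentence_vector(vecs)
print(len(sent) == len(vecs[0]))  # word and sentence features share one dimension
```

Any encoder that places words and sentences in a shared space would satisfy the same requirement; averaging is just the shortest way to get there in a sketch.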
103. Determine the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed.
The word features of the basic vocabulary and the sentence feature of the sentence have been obtained in step 102, and these features characterize the words and the sentence, respectively. In this step, therefore, the word feature of each basic vocabulary item can be compared with the sentence feature of the whole sentence, and, based on the degree of difference between the features, the basic vocabulary items whose word features are less similar to, or differ more from, the sentence feature are determined to be the non-key vocabulary of the sentence. For example, when the selected word feature is a term vector and the sentence feature is a sentence vector, the comparison can be made on the similarity between each term vector and the sentence vector, and a basic vocabulary item whose similarity is less than a certain threshold is determined to be non-key vocabulary.
104. Remove the non-key vocabulary from the sentence to be analyzed, so that the sentence to be analyzed, with the non-key vocabulary removed, can be analyzed.
Once the non-key vocabulary in the sentence to be analyzed has been determined, these non-key words need not be considered when the sentence is analyzed. To avoid their influence on the analysis, the non-key vocabulary can be removed from the sentence to be analyzed in this step, and sentence analysis is then performed on the resulting sentence, which improves the accuracy of the sentence analysis as a whole.
As a further extension of the method of sentence analysis shown in Fig. 1, an embodiment of the present invention also provides another method of sentence analysis. Its flow is shown in Fig. 2, and the specific steps include:
201. Obtain basic vocabulary from the sentence to be analyzed.
Specifically, in this embodiment, the basic vocabulary may be obtained from the sentence to be analyzed by a word segmentation operation. The specific manner of obtaining the basic vocabulary in this step may therefore be: performing a word segmentation operation on the sentence to be analyzed to obtain the basic vocabulary.
202. Perform feature extraction on the basic vocabulary and on the sentence to be analyzed, respectively, to obtain the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed.
In practice, to ensure the accuracy of the subsequent analysis, the word features and the sentence feature may in this embodiment be extracted by vectorizing the basic vocabulary and the sentence to be analyzed. This step may therefore include: determining the term vector of each basic vocabulary item, and determining the sentence vector of the sentence to be analyzed. In this embodiment there is no required order between determining the term vectors and determining the sentence vector; the two may also be carried out at the same time.
203. Determine the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed.
Specifically, this step may be: performing a calculation between the term vector of each basic vocabulary item and the sentence vector of the sentence to be analyzed to determine the significance level of each basic vocabulary item for the sentence to be analyzed, and determining the non-key vocabulary in the sentence to be analyzed according to the significance level of each basic vocabulary item. The significance level can be understood as being determined from the result of the calculation between the term vector and the sentence vector.
Further, the non-key vocabulary in the sentence to be analyzed may be determined according to the significance level of each basic vocabulary item in the following manner:
First, a dot product operation is performed between the term vector of each basic vocabulary item and the sentence vector of the sentence to be analyzed to obtain the weighted value of each basic vocabulary item. In mathematics, the dot product (also called the scalar product) is a binary operation that takes two vectors over the real numbers R and returns a real-valued scalar. It is the standard inner product of Euclidean space. The result of the operation indicates how similar the two vectors are. Performing the dot product between a term vector and the sentence vector therefore in effect determines the degree of similarity between the two vectors, so the weighted value obtained in this step characterizes the degree of similarity between each term vector and the sentence vector. Specifically, the dot product may be calculated as follows:
For example, given two vectors A = [a1, a2, ..., an] and B = [b1, b2, ..., bn], their dot product is:
A·B = a1b1 + a2b2 + ... + anbn
Geometrically, A·B = |A||B|cos θ, where θ is the angle between the two vectors. When both vectors are normalized to unit length, the result of the above formula equals the cosine of that angle, whose range is [-1, 1]. From a mathematical point of view, the larger the cosine, the larger the projection of vector A onto vector B and the more similar the two vectors are; conversely, a cosine of -1 means the two vectors point in opposite directions and are not similar.
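The relationship between the dot product and the cosine of the angle is easy to check numerically. The sketch below uses hypothetical helper names (`dot`, `cosine`) and arbitrary example vectors; it shows that the raw dot product depends on vector length, and that it coincides with the cosine only when both vectors have unit length.

```python
import math


def dot(a, b):
    """A·B = a1*b1 + a2*b2 + ... + an*bn."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product divided by the product of the two vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(dot(a, b))     # 24.0
print(cosine(a, b))  # 0.96

# After scaling both vectors to unit length, the dot product is the cosine.
unit_a, unit_b = [0.6, 0.8], [0.8, 0.6]
print(math.isclose(dot(unit_a, unit_b), cosine(a, b)))
```

If the term vectors and the sentence vector are not normalized, a fixed threshold on the raw dot product also reacts to vector length, which is worth keeping in mind when choosing the preset threshold.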
Then, according to the weighted value, the significance level of each basic vocabulary item for the sentence to be analyzed is determined, the weighted value being positively correlated with the significance level. Because the weighted value characterizes the degree of similarity between the term vector and the sentence vector, the degree of similarity directly reflects how important the basic vocabulary item corresponding to the term vector is to the sentence to be analyzed.
Finally, according to the significance level, each basic vocabulary item whose significance level is lower than a preset threshold is determined to be non-key vocabulary. In this way, weighted values characterizing the similarity between vectors are obtained by the dot product operation, the significance level of each basic vocabulary item for the sentence to be analyzed is determined from the weighted values, and the non-key vocabulary is determined accordingly. The non-key part of the sentence, that is, the non-key vocabulary, can thus be determined in a more intuitive and accurate manner, which ensures the accuracy of that determination and lays the foundation for excluding the interference of non-key vocabulary with the analysis of the sentence as a whole.
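The three sub-steps of step 203 (dot product, significance level, threshold) can be sketched as two small functions. The names `score_words` and `non_key_words` are hypothetical, and the sketch assumes the simplest positive correlation, where the significance level is taken to be the weighted value itself; the text only requires that the two be positively correlated.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score_words(words, vectors, sent_vec):
    """Pair each basic vocabulary item with its weighted value (term vector · sentence vector)."""
    return [(w, dot(v, sent_vec)) for w, v in zip(words, vectors)]


def non_key_words(scored, threshold):
    """Items whose weighted value falls below the preset threshold are non-key."""
    return [w for w, weight in scored if weight < threshold]


# Toy input: two orthogonal term vectors and an arbitrary sentence vector.
scored = score_words(["x", "y"], [[1.0, 0.0], [0.0, 1.0]], [0.25, 0.75])
print(scored)                      # [('x', 0.25), ('y', 0.75)]
print(non_key_words(scored, 0.5))  # ['x']
```

Keeping the scores as an ordered list of pairs, rather than a dictionary keyed by word, preserves repeated words and the original word order.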
204. Remove the non-key vocabulary from the sentence to be analyzed, so that the sentence to be analyzed, with the non-key vocabulary removed, can be analyzed.
The non-key vocabulary in the sentence to be analyzed has been determined in step 203, and this non-key vocabulary would affect the analysis result during sentence analysis. Once it has been determined, it can therefore be removed, so that when the sentence to be analyzed, with the non-key vocabulary removed, is analyzed, the interference of the non-key vocabulary with the analysis result is reduced and the accuracy of the analysis is improved.
For example, when the sentence to be analyzed is "I feel pretty good", a word segmentation operation can be performed on it to obtain the basic vocabulary "I feel" and "pretty good". Feature extraction then yields the term vector a of "I feel", the term vector b of "pretty good", and the sentence vector A. The dot product of term vector a and sentence vector A gives a weighted value of 0.1, and the dot product of term vector b and sentence vector A gives a weighted value of 0.7. According to the size of the weighted values, the significance level of the basic vocabulary item "I feel" corresponding to vector a is determined to be low, and that of "pretty good" corresponding to vector b to be high. According to the significance levels, "I feel" is determined to be non-key vocabulary and is removed from the sentence to be analyzed "I feel pretty good", leaving "pretty good". Sentence analysis is then performed on "pretty good", the sentence to be analyzed with the non-key vocabulary removed.
As another example, consider sentence to be analyzed 1, "I want to eat apples, oranges, and various other fruit", and sentence to be analyzed 2, "I want to eat apples". After the word segmentation operation, both sentences yield the basic vocabulary item "apple". However, when the weighted values are determined from the sentence vector and the term vectors according to the method of this embodiment, the weight of "apple" in sentence 1 is only 0.2, while the weight of "apple" in sentence 2 is 0.6. In sentence to be analyzed 1, "apple" can therefore be determined to be non-key vocabulary, so sentence 1 is analyzed after "apple" is removed. In sentence to be analyzed 2, "apple" has a higher weight and can be determined not to be non-key vocabulary, so sentence 2 is analyzed with "apple" included.
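The example above can be replayed in a few lines. The vectors below are hypothetical and were chosen only so that the dot products come out to the weighted values 0.1 and 0.7 quoted in the text; the threshold of 0.5 is likewise an assumption, since the text gives no value.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


sentence_vec = [1.0, 0.0]            # sentence vector A (hypothetical values)
term_vecs = {
    "I feel": [0.1, 0.3],            # term vector a (hypothetical values)
    "pretty good": [0.7, 0.2],       # term vector b (hypothetical values)
}
THRESHOLD = 0.5                      # assumed preset threshold

weights = {w: dot(v, sentence_vec) for w, v in term_vecs.items()}
kept = [w for w in term_vecs if weights[w] >= THRESHOLD]

print(weights)  # {'I feel': 0.1, 'pretty good': 0.7}
print(kept)     # ['pretty good']
```

Any threshold between 0.1 and 0.7 would give the same outcome here; choosing it for real data is left open by the text.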
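The point of this second example is that the same word can receive different weights in different sentences, because the sentence vector changes. The sketch below illustrates that effect under stated assumptions: one-hot toy embeddings, a sentence vector computed as the mean of the term vectors, and "want" standing in for "I want to eat". It does not reproduce the 0.2 and 0.6 figures in the text, whose underlying vectors are not given.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical one-hot embeddings, one axis per word.
EMB = {
    "want": [1, 0, 0, 0],
    "apple": [0, 1, 0, 0],
    "orange": [0, 0, 1, 0],
    "fruit": [0, 0, 0, 1],
}


def weight_of(word, words):
    """Weighted value of `word` inside the sentence made of `words`."""
    sent_vec = mean_vector([EMB[w] for w in words])
    return dot(EMB[word], sent_vec)


w1 = weight_of("apple", ["want", "apple", "orange", "fruit"])  # sentence 1
w2 = weight_of("apple", ["want", "apple"])                     # sentence 2
print(w1, w2)  # 0.25 0.5
```

With an assumed threshold between the two values, "apple" would be removed from sentence 1 and kept in sentence 2, which matches the behavior the example describes.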
Further, as an implementation of the above method of sentence analysis, an embodiment of the present invention provides a device for sentence analysis. The device is mainly used to solve the problem that, in existing sentence analysis, the result of semantic analysis is easily disturbed by non-key words, which degrades the accuracy of sentence analysis, and thereby to improve the accuracy of the sentence analysis result. For ease of reading, this device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be understood that the device in this embodiment can correspondingly implement the full content of the foregoing method embodiment. The device is shown in Fig. 3 and specifically includes:
an acquiring unit 31, which may be used to obtain basic vocabulary from a sentence to be analyzed;
a processing unit 32, which may be used to perform feature extraction on the basic vocabulary obtained by the acquiring unit 31 and on the sentence to be analyzed, respectively, to obtain the word feature of each basic vocabulary item and the sentence feature of the sentence to be analyzed;
a determination unit 33, which may be used to determine the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary item obtained by the processing unit 32 and the sentence feature of the sentence to be analyzed;
an analytical unit 34, which may be used to remove the non-key vocabulary determined by the determination unit 33 from the sentence to be analyzed, so that the sentence to be analyzed, with the non-key vocabulary removed, can be analyzed.
Further, as shown in Fig. 4, the acquiring unit 31 includes:
a processing module 311, which may be used to perform a word segmentation operation on the sentence to be analyzed to obtain the basic vocabulary.
Further, as shown in Fig. 4, the processing unit 32 includes:
a first determining module 321, which may be used to determine the term vector of each basic vocabulary item;
a second determining module 322, which may be used to determine the sentence vector of the sentence to be analyzed.
Further, as shown in Fig. 4, the determination unit 33 may be specifically used to perform a calculation between the term vector of each basic vocabulary item and the sentence vector of the sentence to be analyzed to determine the significance level of each basic vocabulary item for the sentence to be analyzed, and to determine the non-key vocabulary in the sentence to be analyzed according to the significance level of each basic vocabulary item.
Further, as shown in Fig. 4, the determination unit 33 includes:
a computing module 331, which may be used to perform a dot product operation between the term vector of each basic vocabulary item and the sentence vector of the sentence to be analyzed to obtain the weighted value of each basic vocabulary item;
a first determining module 332, which may be used to determine, according to the weighted value calculated by the computing module 331, the significance level of each basic vocabulary item for the sentence to be analyzed, the weighted value being positively correlated with the significance level;
a second determining module 333, which may be used to determine, according to the significance level determined by the first determining module 332, each basic vocabulary item whose significance level is lower than a preset threshold to be non-key vocabulary.
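One way to picture how the units in Fig. 3 and Fig. 4 fit together is as a single object with pluggable parts. This is only a sketch under assumptions: the class name `SentenceAnalysisDevice`, its field names, and the stand-in callables in the usage lines are hypothetical, and the significance level is again taken to be the weighted value itself.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class SentenceAnalysisDevice:
    acquire: Callable[[str], List[str]]      # acquiring unit 31 (processing module 311)
    embed_word: Callable[[str], Vector]      # processing unit 32, first determining module 321
    embed_sentence: Callable[[str], Vector]  # processing unit 32, second determining module 322
    threshold: float                         # preset threshold used by determination unit 33
    analyze: Callable[[List[str]], object]   # downstream analysis run by analytical unit 34

    def run(self, sentence: str):
        words = self.acquire(sentence)
        sent_vec = self.embed_sentence(sentence)
        # Computing module 331: dot product of each term vector with the sentence vector.
        weights = [
            sum(x * y for x, y in zip(self.embed_word(w), sent_vec)) for w in words
        ]
        # Modules 332 and 333: keep only items whose weight reaches the threshold.
        kept = [w for w, weight in zip(words, weights) if weight >= self.threshold]
        # Analytical unit 34: analyze the sentence with non-key vocabulary removed.
        return self.analyze(kept)


# Stand-in components for demonstration only; they carry no linguistic meaning.
device = SentenceAnalysisDevice(
    acquire=str.split,
    embed_word=lambda w: [float(len(w))],
    embed_sentence=lambda s: [0.1],
    threshold=0.35,
    analyze=lambda kept_words: kept_words,
)
print(device.run("I think good"))
```

Passing the segmenter, the two encoders, and the downstream analyzer in as callables mirrors the text's statement that each unit may be realized in any suitable manner.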
Further, the embodiment of the invention also provides a kind of computer readable storage mediums, wherein the computer can It reads to be stored with computer program on storage medium, wherein real when the computer program is executed by one or more computing devices The method of existing above-mentioned Sentence analysis.
In addition, including one or more computing devices and one or more storage dresses the embodiment of the invention also provides one kind The system set, record has computer program on one or more of storage devices, and the computer program is one Or the method that multiple computing devices make one or more of computing devices realize above-mentioned Sentence analysis when executing.
In conclusion a kind of method and device for Sentence analysis that the embodiment of the present invention proposes, can be improved Sentence analysis As a result accuracy.It is relatively existing that based on during Sentence analysis, when the semantic analysis of sentence, there are the dry of non-key vocabulary The problem of disturbing, the present invention can obtain basic vocabulary from sentence to be analyzed, then respectively to the basic vocabulary and to be analyzed Sentence carries out feature extraction, obtains the sentence feature of the word feature for corresponding to each basic vocabulary and sentence to be analyzed, Zhi Hougen According to the word feature of each basic vocabulary and the sentence feature of the sentence to be analyzed, determine in the sentence to be analyzed Non-key vocabulary finally rejects the non-key vocabulary from the sentence to be analyzed, after to non-key vocabulary is rejected Sentence to be analyzed analyzed, to realize analysis to sentence, compared to existing analysis mode, the present invention passes through to word Remittance feature and sentence feature determine the non-key vocabulary in sentence to be analyzed, thus when carrying out Sentence analysis by non-key vocabulary It is rejected, when then ensuring subsequent analysis, eliminates interference when analysis of the non-key vocabulary to sentence, improve sentence Precision of analysis.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It can be understood that the related features in the above method and device may refer to each other. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments and do not represent the superiority or inferiority of any embodiment.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
In addition, the memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.
The above are only embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and changes to the present application are possible. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A method of sentence analysis, wherein the method comprises:
Obtaining basic vocabulary from a sentence to be analyzed;
Performing feature extraction on the basic vocabulary and on the sentence to be analyzed respectively, to obtain a word feature corresponding to each basic vocabulary word and a sentence feature of the sentence to be analyzed;
Determining the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary word and the sentence feature of the sentence to be analyzed;
Removing the non-key vocabulary from the sentence to be analyzed, so that the sentence to be analyzed is analyzed after the non-key vocabulary has been removed.
2. the method for claim 1, wherein the basic vocabulary that obtains from sentence to be analyzed includes:
Participle operation is carried out to the sentence to be analyzed, obtains the basic vocabulary.
3. The method of claim 2, wherein performing feature extraction on the basic vocabulary and on the sentence to be analyzed respectively, to obtain a word feature corresponding to each basic vocabulary word and a sentence feature of the sentence to be analyzed, comprises:
Determining the word vector of each basic vocabulary word; and
Determining the sentence vector of the sentence to be analyzed.
4. The method of claim 3, wherein determining the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary word and the sentence feature of the sentence to be analyzed comprises:
Performing a calculation between the word vector of each basic vocabulary word and the sentence vector of the sentence to be analyzed, determining the importance of each basic vocabulary word to the sentence to be analyzed, and, according to the importance of each basic vocabulary word, determining the non-key vocabulary in the sentence to be analyzed.
5. The method of claim 4, wherein performing a calculation between the word vector of each basic vocabulary word and the sentence vector of the sentence to be analyzed, determining the importance of each basic vocabulary word to the sentence to be analyzed, and, according to the importance of each basic vocabulary word, determining the non-key vocabulary in the sentence to be analyzed, comprises:
Performing a dot product operation on the word vector of each basic vocabulary word and the sentence vector of the sentence to be analyzed, to obtain a weight value corresponding to each basic vocabulary word;
Determining, according to the weight value, the importance of each basic vocabulary word to the sentence to be analyzed, the weight value being positively correlated with the importance;
Determining, according to the importance, each basic vocabulary word whose importance is lower than a preset threshold to be non-key vocabulary.
6. A device for sentence analysis, wherein the device comprises:
An acquiring unit, configured to obtain basic vocabulary from a sentence to be analyzed;
A processing unit, configured to perform feature extraction on the basic vocabulary and on the sentence to be analyzed respectively, to obtain a word feature corresponding to each basic vocabulary word and a sentence feature of the sentence to be analyzed;
A determination unit, configured to determine the non-key vocabulary in the sentence to be analyzed according to the word feature of each basic vocabulary word and the sentence feature of the sentence to be analyzed;
An analysis unit, configured to remove the non-key vocabulary from the sentence to be analyzed, so that the sentence to be analyzed is analyzed after the non-key vocabulary has been removed.
7. The device of claim 6, wherein the acquiring unit comprises:
A processing module, configured to perform a word segmentation operation on the sentence to be analyzed to obtain the basic vocabulary.
8. The device of claim 7, wherein the processing unit comprises:
A first determining module, configured to determine the word vector of each basic vocabulary word;
A second determining module, configured to determine the sentence vector of the sentence to be analyzed.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of sentence analysis according to any one of claims 1 to 5.
10. A computer device, wherein the computer device comprises:
One or more processors; and
A memory, configured to store one or more programs,
Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the steps of the method of sentence analysis according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910467986.6A CN110210030B (en) 2019-05-31 2019-05-31 Statement analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910467986.6A CN110210030B (en) 2019-05-31 2019-05-31 Statement analysis method and device

Publications (2)

Publication Number Publication Date
CN110210030A (en) 2019-09-06
CN110210030B (en) 2021-02-09

Family

ID=67789868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910467986.6A Active CN110210030B (en) 2019-05-31 2019-05-31 Statement analysis method and device

Country Status (1)

Country Link
CN (1) CN110210030B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906403A (en) * 2021-04-25 2021-06-04 中国平安人寿保险股份有限公司 Semantic analysis model training method and device, terminal equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018032213A (en) * 2016-08-24 2018-03-01 シャープ株式会社 Information processor, information processing system, information processing method and program
CN109213856A (en) * 2018-10-22 2019-01-15 广东小天才科技有限公司 A kind of method for recognizing semantics and system
CN109492225A (en) * 2018-11-08 2019-03-19 大连瀚闻资讯有限公司 A kind of public feelings information text handling method of rare foreign languages country
CN109522544A (en) * 2018-09-27 2019-03-26 厦门快商通信息技术有限公司 Sentence vector calculation, file classification method and system based on Chi-square Test
CN109815492A (en) * 2019-01-04 2019-05-28 平安科技(深圳)有限公司 A kind of intension recognizing method based on identification model, identification equipment and medium

Also Published As

Publication number Publication date
CN110210030B (en) 2021-02-09

Similar Documents

Publication Publication Date Title
Šilić et al. Visualization of text streams: A survey
CN112395506A (en) Information recommendation method and device, electronic equipment and storage medium
CN109582948B (en) Method and device for extracting evaluation viewpoints
CN110019668A (en) A kind of text searching method and device
CN104978356B (en) A kind of recognition methods of synonym and device
CN110162780A (en) The recognition methods and device that user is intended to
CN106610931B (en) Topic name extraction method and device
CN109582954A (en) Method and apparatus for output information
CN112527970B (en) Data dictionary standardization processing method, device, equipment and storage medium
WO2012158572A2 (en) Exploiting query click logs for domain detection in spoken language understanding
CN110390095A (en) Sentence mask method and sentence annotation equipment
CN106598949A (en) Method and device for confirming contribution degree of words to text
CN112329460A (en) Text topic clustering method, device, equipment and storage medium
CN106598997B (en) Method and device for calculating text theme attribution degree
US10482162B2 (en) Automatic equation transformation from text
Tappler et al. Active model learning of stochastic reactive systems
Schröder et al. Small-text: Active learning for text classification in python
CN110210030A (en) The method and device of Sentence analysis
CN110019670A (en) A kind of text searching method and device
CN107832271B (en) Function image drawing method, device, equipment and computer storage medium
CN108875743A (en) A kind of text recognition method and device
CN108460038A (en) Rule matching method and its equipment
Schuster et al. Alignment approximation for process trees
CN111797995B (en) Method and device for generating interpretation report of model prediction sample
CN109684473A (en) A kind of automatic bulletin generation method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200728

Address after: 518000 Nanshan District science and technology zone, Guangdong, Zhejiang Province, science and technology in the Tencent Building on the 1st floor of the 35 layer

Applicant after: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.

Address before: 100029, Beijing, Chaoyang District new East Street, building No. 2, -3 to 25, 101, 8, 804 rooms

Applicant before: Tricorn (Beijing) Technology Co.,Ltd.

GR01 Patent grant
GR01 Patent grant