CN108255810A - Near-synonym mining method, device and electronic equipment - Google Patents

Near-synonym mining method, device and electronic equipment

Info

Publication number
CN108255810A
Authority
CN
China
Prior art keywords
near synonym
candidate
semantic similarity
text
synonym
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810023323.0A
Other languages
Chinese (zh)
Other versions
CN108255810B (en
Inventor
蒋宏飞
李健铨
晋耀红
杨凯程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingfu Intelligent Technology Co., Ltd
Original Assignee
Beijing Shenzhou Taiyue Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenzhou Taiyue Software Co Ltd
Priority to CN201810023323.0A
Publication of CN108255810A
Application granted
Publication of CN108255810B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the present invention provides a near-synonym mining method, device and electronic equipment, and relates to the field of computer application technology. The near-synonym mining method includes: obtaining a text to be processed; obtaining a preset near-synonym of the text; obtaining, by a word-vector-based text similarity algorithm, a first semantic similarity between the text and a candidate near-synonym; obtaining, by the same text similarity algorithm, a second semantic similarity between the preset near-synonym and the candidate near-synonym; and determining the near-synonyms of the text according to the first semantic similarity and the second semantic similarity. With this technical solution, a candidate can serve as a near-synonym of the text to be processed only when its word vector is close both to that of the text and to that of the text's preset near-synonym, which can effectively improve the accuracy of near-synonym mining.

Description

Near-synonym mining method, device and electronic equipment
Technical field
This application relates to the field of natural language processing, and in particular to a near-synonym mining method, device and electronic equipment.
Background
Text mining is the process of extracting previously unknown, understandable and ultimately usable knowledge from large amounts of text data, and of using that knowledge to better organize information for future reference. Near-synonym concept mining is an important branch of text mining. It is the process of finding words or texts whose meaning is close to that of a given word or passage of text.
At present, a common approach is near-synonym concept mining based on the word-vector space. This approach treats the distribution of word vectors as the distribution of a semantic space and uses the distance between two word vectors to measure the similarity between the two words: the closer the word vectors of two words, the closer their meanings. A word vector (distributed representation) is a way of representing the words of a language mathematically; it is a low-dimensional real-valued vector that carries the semantic information of the word. Representing words by distributed word vectors places similar words close together in the word-vector space.
However, in the course of realizing the present invention, the inventors found at least the following problem in the prior art: a word-vector space encodes not only magnitude in the semantic space but also its sign (polarity, direction). Therefore, when near-synonym mining for a word relies only on word-vector distance, semantically opposite concepts can be returned; for example, "sell" can be mined for "purchase" by word-vector distance.
In summary, the prior art suffers from low near-synonym mining accuracy.
Summary of the invention
Embodiments of the present invention provide a near-synonym mining method, device and electronic equipment, to solve the problem of low near-synonym mining accuracy in the prior art.
In a first aspect, an embodiment of the present invention provides a near-synonym mining method, including: obtaining a text to be processed; obtaining a preset near-synonym of the text; obtaining, by a word-vector-based text similarity algorithm, a first semantic similarity between the text and each candidate near-synonym; obtaining, by the same text similarity algorithm, a second semantic similarity between the preset near-synonym and each candidate near-synonym, the candidate near-synonyms being obtained from a preset vocabulary; and determining the near-synonyms of the text according to the first semantic similarity and the second semantic similarity.
With reference to the first aspect, in a first implementation of the first aspect, determining the near-synonyms of the text according to the first semantic similarity and the second semantic similarity includes: selecting candidate near-synonyms according to a first selection rule and the ranking of each candidate's first semantic similarity, to form a first candidate near-synonym set; selecting candidate near-synonyms according to a second selection rule and the ranking of each candidate's second semantic similarity, to form a second candidate near-synonym set; obtaining the candidate near-synonyms common to the first and second candidate near-synonym sets; and determining the near-synonyms from those common candidates.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, determining the near-synonyms from the common candidates includes: judging whether each common candidate meets a word-formation rule; and, if so, taking the candidates that meet the word-formation rule as the near-synonyms.
With reference to the first aspect, in a third implementation of the first aspect, determining the near-synonyms of the text according to the first semantic similarity and the second semantic similarity includes: obtaining a third semantic similarity between the text and each candidate near-synonym according to the first semantic similarity, the second semantic similarity and a preset weight; selecting candidate near-synonyms according to a third selection rule and the third semantic similarity; and determining the near-synonyms from the selected candidates.
With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, the third semantic similarity is calculated with the following formula: Z = α * X + (1 - α) * Y, where X is the first semantic similarity, Y is the second semantic similarity, α is the preset weight with a value between 0 and 1, and Z is the third semantic similarity.
With reference to the third or fourth implementation of the first aspect, in a fifth implementation of the first aspect, determining the near-synonyms from the selected candidates includes: judging whether each selected candidate meets a word-formation rule; and, if so, taking the candidates that meet the word-formation rule as the near-synonyms.
With reference to the second or fifth implementation of the first aspect, in a sixth implementation of the first aspect, the word-formation rule includes: the candidate near-synonym contains a character of the text.
In a second aspect, an embodiment of the present invention provides a near-synonym mining device, including modules for performing the behavior of the near-synonym mining device in the above method design. The modules may be software and/or hardware.
In a third aspect, an embodiment of the present invention further provides electronic equipment including a processor and a memory. The processor is configured to support the electronic equipment in performing the corresponding functions of the above near-synonym mining method. The memory is coupled to the processor and stores the program instructions and data necessary for performing the above near-synonym mining method.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the method described in the above aspects.
In a fifth aspect, an embodiment of the present invention provides a computer program product including instructions which, when run on a computer, cause the computer to perform the method described in the above aspects.
Compared with the prior art, the solution provided by the embodiments of the present invention obtains a preset near-synonym of the text; obtains, by a word-vector-based text similarity algorithm, the first semantic similarity between the text and each candidate near-synonym; obtains, by the same algorithm, the second semantic similarity between the preset near-synonym and each candidate near-synonym; and determines the near-synonyms of the text according to the two similarities. With this processing, the mining result is influenced not only by the word-vector distance between the text to be processed and a candidate, but also by the word-vector distance between the preset near-synonym of the text and that candidate. A candidate can serve as a near-synonym of the text to be processed only when its word vector is close to both; therefore, near-synonym mining accuracy can be effectively improved.
These and other aspects of the invention will be more readily apparent from the following description.
Description of the drawings
Fig. 1 is a flow diagram of a near-synonym mining method provided by an embodiment of the present invention;
Fig. 2 is a first specific flow diagram of a near-synonym mining method provided by an embodiment of the present invention;
Fig. 3 is a second specific flow diagram of a near-synonym mining method provided by an embodiment of the present invention;
Fig. 4 is a third specific flow diagram of a near-synonym mining method provided by an embodiment of the present invention;
Fig. 5 is a fourth specific flow diagram of a near-synonym mining method provided by an embodiment of the present invention;
Fig. 6 is a structural diagram of a near-synonym mining device provided by an embodiment of the present invention;
Fig. 7 is a first specific structural diagram of a near-synonym mining device provided by an embodiment of the present invention;
Fig. 8 is a second specific structural diagram of a near-synonym mining device provided by an embodiment of the present invention;
Fig. 9 is a structural diagram of electronic equipment provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are explained below with reference to the drawings.
To make the technical solutions of the embodiments easier to understand, the basic idea of the solution is first described briefly.
The basic idea of the near-synonym mining method provided by the embodiments of the present invention is as follows: the mining result is influenced not only by the word-vector distance between the text to be processed and a candidate near-synonym, but also by the word-vector distance between the preset near-synonym of the text and that candidate. A candidate can serve as a near-synonym of the text to be processed only when its word vector is close both to that of the text and to that of its preset near-synonym. The method can therefore effectively improve near-synonym mining accuracy.
The near-synonym mining method provided by an embodiment of the present invention is described in detail below with reference to Fig. 1.
In part 101, the text to be processed is obtained.
In terms of language, the text to be processed can be text in any language, for example Chinese text or English text.
In part 102, the preset near-synonym of the text to be processed is obtained.
The preset near-synonym may be a near-synonym whose meaning is the same as or similar to that of the text, that is, a standard near-synonym; preset near-synonyms do not include words whose meaning is opposite to that of the text. In a specific implementation, the preset near-synonym may be set manually.
In part 103, a word-vector-based text similarity algorithm is used to obtain the first semantic similarity between the text and each candidate near-synonym, and the same text similarity algorithm is used to obtain the second semantic similarity between the preset near-synonym and each candidate near-synonym.
In the near-synonym mining method provided by the embodiments of the present invention, a candidate can serve as a near-synonym of the text to be processed only when its word vector is close both to that of the text and to that of its preset near-synonym. Therefore, after the text to be processed and its preset near-synonym have been obtained, the word-vector-based text similarity algorithm is first used to obtain, separately, the first semantic similarity between the text and each candidate and the second semantic similarity between the preset near-synonym and each candidate.
The candidate near-synonyms of the text may include all the words in the preset vocabulary. Alternatively, they may be a subset of words filtered from the preset vocabulary; for example, according to the business domain to which the text belongs (such as finance or e-commerce customer service), the words relevant to that domain can be filtered from the preset vocabulary. This narrows the search range for candidate near-synonyms and can therefore effectively improve mining speed.
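The domain filter described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the vocabulary layout (a dict mapping each word to a set of domain tags), the function name `candidate_near_synonyms` and the sample words are all assumptions made for the example.

```python
def candidate_near_synonyms(vocabulary, domain=None):
    """Return candidate words from a preset vocabulary.

    vocabulary: dict mapping word -> set of domain tags (hypothetical layout).
    domain: if given, keep only the words tagged with that business domain;
            if None, every word in the vocabulary is a candidate.
    """
    if domain is None:
        return sorted(vocabulary)
    return sorted(word for word, tags in vocabulary.items() if domain in tags)


# Made-up vocabulary, for illustration only.
PRESET_VOCABULARY = {
    "loan": {"finance"},
    "refund": {"finance", "ecommerce"},
    "parcel": {"ecommerce"},
}

finance_candidates = candidate_near_synonyms(PRESET_VOCABULARY, "finance")
```

Filtering first means the two similarity computations in part 103 run over fewer candidates, which is where the speed gain would come from.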
The first semantic similarity may be the distance between the word vector of the text and the word vector of the candidate near-synonym. When the first semantic similarity between the text and a candidate is high, the candidate may be either a near-synonym or an antonym of the text, because a word vector has not only magnitude but also a positive or negative direction.
Correspondingly, the second semantic similarity may be the distance between the word vector of the preset near-synonym and the word vector of the candidate near-synonym.
Since word-vector-based text similarity algorithms are mature prior art, they are not described further here.
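The patent does not fix a particular similarity algorithm. As a hedged sketch, one common choice is the cosine similarity between averaged word vectors; the functions `cosine_similarity`, `text_vector` and `semantic_similarity`, and the two-dimensional toy embeddings, are assumptions introduced only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding; None if none do."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def semantic_similarity(tokens_a, tokens_b, embeddings):
    """Similarity between two tokenized texts; 0.0 when either has no known token."""
    vec_a = text_vector(tokens_a, embeddings)
    vec_b = text_vector(tokens_b, embeddings)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine_similarity(vec_a, vec_b)


# Toy embeddings with made-up values; real word vectors would come from a trained model.
TOY_EMBEDDINGS = {"buy": [0.9, 0.1], "purchase": [0.8, 0.2], "sell": [0.7, -0.6]}

first_similarity = semantic_similarity(["buy"], ["purchase"], TOY_EMBEDDINGS)
```

The same function would be called twice per candidate: once with the text to be processed (first semantic similarity) and once with its preset near-synonym (second semantic similarity).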
In part 104, the near-synonyms of the text are determined according to the first semantic similarity and the second semantic similarity.
Once the first and second semantic similarities have been obtained, the two values can be considered together, and one or more words can be chosen from the candidate near-synonyms as near-synonyms of the text on that basis.
In a specific implementation, part 104 can be realized in several ways. Four usable embodiments are given below and described in turn. Note that part 104 is not limited to these four; any embodiment that determines the near-synonyms of the text according to the first and second semantic similarities may be used.
Mode one
Fig. 2 is a flow chart of the first specific implementation of part 104 of the near-synonym mining method provided by an embodiment of the present invention. In one example, part 104 may include the following sub-parts:
In part 201, candidate near-synonyms are selected according to the first selection rule and the first semantic similarity, forming the first candidate near-synonym set; and candidate near-synonyms are selected according to the second selection rule and the second semantic similarity, forming the second candidate near-synonym set.
The first selection rule may be to select the candidates whose semantic similarity ranks highest, for example the candidate with the largest semantic similarity; it may be to select a preset number of top-ranked candidates, for example the three candidates with the highest semantic similarity; or it may be to select the candidates whose semantic similarity exceeds a threshold, for example, with a threshold of 0.6, all candidates whose semantic similarity is greater than 0.6.
In one example, the first selection rule is to select the candidates whose semantic similarity exceeds a threshold. The threshold can be determined manually from experience according to business requirements.
The second selection rule may be the same as the first selection rule or different from it.
When a candidate's first semantic similarity satisfies the first selection rule, that candidate is selected; the candidates selected in this way form one candidate near-synonym set, the first candidate near-synonym set. Correspondingly, when a candidate's second semantic similarity satisfies the second selection rule, that candidate is selected; these candidates form the second candidate near-synonym set.
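The selection rules listed above (largest value, top N, above a threshold) can be sketched as two small helpers. The function names and the sample scores are hypothetical; selecting the single best candidate is simply the top-N rule with N = 1.

```python
def select_top_n(scores, n):
    """Return the n candidates with the highest similarity.

    scores: dict mapping candidate word -> similarity value.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {word for word, _ in ranked[:n]}


def select_above_threshold(scores, threshold):
    """Return every candidate whose similarity is strictly greater than threshold."""
    return {word for word, value in scores.items() if value > threshold}


# Made-up similarity values, for illustration only.
sample_scores = {"acquire": 0.65, "purchase": 0.70, "sell": 0.72, "weather": 0.10}

best_one = select_top_n(sample_scores, 1)
top_three = select_top_n(sample_scores, 3)
above_06 = select_above_threshold(sample_scores, 0.6)
```

Either helper can serve as the first or the second selection rule, and the two rules need not be the same.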
In part 202, the candidate near-synonyms common to the first candidate near-synonym set and the second candidate near-synonym set are obtained.
If a candidate near-synonym appears both in the first candidate near-synonym set and in the second candidate near-synonym set, it is a candidate common to the two sets.
In part 203, the common candidates are taken as the near-synonyms of the text.
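Mode one as a whole amounts to a set intersection. The sketch below assumes threshold-based selection for both rules; the function name `mine_by_intersection` and all similarity values are invented for the example and are not taken from the patent.

```python
def mine_by_intersection(first_scores, second_scores,
                         first_threshold=0.6, second_threshold=0.6):
    """Keep candidates that pass both selection rules (threshold rules assumed).

    first_scores:  candidate -> similarity to the text to be processed.
    second_scores: candidate -> similarity to the preset near-synonym.
    """
    first_set = {w for w, s in first_scores.items() if s > first_threshold}
    second_set = {w for w, s in second_scores.items() if s > second_threshold}
    return first_set & second_set


# Made-up scores: "sell" is close to the text but far from the preset near-synonym.
first_scores = {"sell": 0.72, "purchase": 0.70, "acquire": 0.65, "weather": 0.10}
second_scores = {"sell": 0.35, "purchase": 0.81, "acquire": 0.66, "weather": 0.05}

near_synonyms = mine_by_intersection(first_scores, second_scores)
```

In this toy data the antonym-like candidate passes the first rule but fails the second, so it drops out of the intersection.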
Mode two
Fig. 3 is a flow chart of the second specific implementation of part 104 of the near-synonym mining method provided by an embodiment of the present invention. In one example, part 104 may include the following sub-parts:
In part 301, candidate near-synonyms are selected according to the first selection rule and the first semantic similarity, forming the first candidate near-synonym set; and candidate near-synonyms are selected according to the second selection rule and the second semantic similarity, forming the second candidate near-synonym set.
Part 301 is the same as part 201 above and is not described again here.
In part 302, the candidate near-synonyms common to the first candidate near-synonym set and the second candidate near-synonym set are obtained.
Part 302 is the same as part 202 above and is not described again here.
In part 303, it is judged whether each common candidate meets the word-formation rule, and the candidates that meet the word-formation rule are taken as the near-synonyms.
Mode two adds a constraint on top of mode one, namely the word-formation rule. The word-formation rule restricts the mining result of mode one by filtering out candidates that do not meet it, which can effectively improve the accuracy and relevance of the near-synonym mining result.
The word-formation rule may include at least one of the following rules: 1) the candidate near-synonym contains a character of the text; for example, the near-synonyms of "promote" (提升) include "improve" (提高) and "raise" (升高), each of which contains the character 提 or 升; 2) the candidate near-synonym contains no character from the antonyms of the text and of the preset near-synonym; for example, if the antonyms of "promote" (提升) include "reduce" (降低), then the near-synonyms of "promote" contain neither 降 nor 低.
In one example, the word-formation rule is that a candidate near-synonym contains a character of the text to be processed and contains no character of the antonyms of the text to be processed. The step of judging whether the common candidates meet the word-formation rule can then proceed as follows: obtain the antonyms of the text to be processed and of the preset near-synonym, and judge against the obtained antonyms whether each common candidate meets the word-formation rule. Taking the text "promote" (提升) as an example, first obtain an antonym that contains neither 提 nor 升 (such as 降低, "reduce"), then judge each common candidate against that antonym. This keeps near-synonyms such as 提高 and 升高 and filters out non-synonyms that contain 降 and/or 低. As a result, the near-synonyms finally determined contain no character of the preset antonyms and may contain characters that occur in the text to be processed, which can effectively improve near-synonym accuracy.
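The combined word-formation rule from this example can be sketched as a character-level check. The function name is hypothetical, and the Chinese strings are an assumed reconstruction of the example words (提升 "promote", 提高 "improve", 升高 "raise", 降低 "reduce"), since the rule operates on the characters of the original words.

```python
def meets_word_formation_rule(candidate, text, antonyms=()):
    """Check a candidate against the two character-level rules.

    Rule 1: the candidate shares at least one character with the text.
    Rule 2: the candidate contains no character of any listed antonym.
    """
    shares_character = any(ch in candidate for ch in text)
    antonym_characters = {ch for word in antonyms for ch in word}
    has_antonym_character = any(ch in antonym_characters for ch in candidate)
    return shares_character and not has_antonym_character


# Assumed example words; the antonym list would be obtained separately.
TEXT = "提升"
ANTONYMS = ["降低"]
common_candidates = ["提高", "升高", "降低", "下降"]

kept = [c for c in common_candidates
        if meets_word_formation_rule(c, TEXT, ANTONYMS)]
```

A candidate that shares no character with the text is rejected even if it contains no antonym character, which makes this rule deliberately conservative.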
For example, suppose the text to be processed entered by the user is "purchase". Near-synonym mining with the prior art, with the technical solution of Fig. 2 and with the technical solution of Fig. 3 yields the following results, respectively:
Mining result with the prior art: ('sell', 0.5741), ('purchase', 0.5335), ('displacement', 0.5225)
Mining result of mode one: ('purchase', 0.5335), ('displacement', 0.5225), ('purchase', 0.5192)
Mining result of mode two: ('purchase', 0.5335), ('purchase', 0.5192), ('buy', 0.4935)
As another example, suppose the text to be processed entered by the user is "why". Near-synonym mining with the prior art, mode one and mode two yields the following results, respectively:
Mining result with the prior art: ('why', 0.6767), ('why', 0.5700), (' ', 0.5635)
Mining result of mode one: ('why', 0.6767), ('why', 0.5700), (' ', 0.5635)
Mining result of mode two: ('why', 0.6767), ('why', 0.5700), ('why', 0.4830)
In another example, before judging whether each common candidate meets the word-formation rule, the method further includes the following steps: 1) obtaining the part of speech of the text to be processed; 2) judging whether the part of speech is verb or adjective, and if so, proceeding to the step of judging whether each common candidate meets the word-formation rule. Extensive experiments show that, for texts to be processed whose part of speech is verb or adjective, restricting the near-synonym mining result with the word-formation rule effectively ensures near-synonym accuracy.
Mode three
Fig. 4 is a flow chart of the third specific implementation of part 104 of the near-synonym mining method provided by an embodiment of the present invention. In one example, part 104 may include the following sub-parts:
In part 401, the third semantic similarity between the text and each candidate near-synonym is obtained according to the first semantic similarity, the second semantic similarity and the preset weight.
In this embodiment, the semantic similarity between the text and a candidate near-synonym depends not only on the distance between the word vector of the text and the word vector of the candidate, but also on the distance between the word vector of the preset near-synonym and the word vector of the candidate.
The preset weight determines how strongly each of the two distances influences the third semantic similarity. The preset weight can be determined manually according to business requirements and experience, and may be set between 0 and 1.
In one example, the third semantic similarity is calculated with the following formula:
Z = α * X + (1 - α) * Y
where X is the first semantic similarity, Y is the second semantic similarity, α is the preset weight with a value between 0 and 1, and Z is the third semantic similarity.
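The formula translates directly into code. The sketch below uses the hypothetical name `third_similarity` and rejects weights outside the stated range of 0 to 1; the input values are made up.

```python
def third_similarity(x, y, alpha):
    """Compute Z = alpha * X + (1 - alpha) * Y.

    x: first semantic similarity (text vs. candidate).
    y: second semantic similarity (preset near-synonym vs. candidate).
    alpha: preset weight, expected to lie between 0 and 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * x + (1.0 - alpha) * y


# Made-up inputs: equal weighting of the two similarities.
z = third_similarity(0.8, 0.4, 0.5)
```

With α = 1 the result reduces to the first semantic similarity alone, and with α = 0 it reduces to the second, so α controls how much the preset near-synonym corrects the raw distance to the text.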
In part 402, candidate near-synonyms are selected according to the third selection rule and the third semantic similarity.
The third selection rule may be to select the candidates whose third semantic similarity ranks highest, to select a preset number of candidates with the top-ranked third semantic similarity, or to select the candidates whose third semantic similarity exceeds a threshold.
In part 403, the selected candidates are taken as the near-synonyms.
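Parts 401 to 403 can be sketched end to end as follows, assuming a top-N rule as the third selection rule. The function name `mine_by_weighted_score`, the default parameters and the scores are all illustrative assumptions.

```python
def mine_by_weighted_score(first_scores, second_scores, alpha=0.5, top_n=3):
    """Rank candidates by Z = alpha * X + (1 - alpha) * Y and keep the top_n.

    Only candidates present in both score dicts are considered.
    Returns a list of (candidate, Z) pairs, best first.
    """
    combined = {
        word: alpha * first_scores[word] + (1.0 - alpha) * second_scores[word]
        for word in first_scores
        if word in second_scores
    }
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Made-up scores for illustration.
first_scores = {"sell": 0.72, "purchase": 0.70, "acquire": 0.65}
second_scores = {"sell": 0.35, "purchase": 0.81, "acquire": 0.66}

selected = mine_by_weighted_score(first_scores, second_scores, alpha=0.5, top_n=2)
selected_words = [word for word, _ in selected]
```

Unlike the intersection in mode one, this variant produces a single ranking, so a candidate with a very high first similarity can still be outranked when its second similarity is low.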
Mode four
Fig. 5 is a flow chart of the fourth specific implementation of part 104 of the near-synonym mining method provided by an embodiment of the present invention. In one example, part 104 may include the following sub-parts:
In part 501, the third semantic similarity between the text and each candidate near-synonym is obtained according to the first semantic similarity, the second semantic similarity and the preset weight.
Part 501 is the same as part 401 above and is not described again here.
In part 502, candidate near-synonyms are selected according to the third selection rule and the third semantic similarity.
Part 502 is the same as part 402 above and is not described again here.
In part 503, it is judged whether each selected candidate meets the word-formation rule, and the candidates that meet the word-formation rule are taken as the near-synonyms.
Part 503 is similar to part 303 above and is not described again here.
As the above embodiments show, the solution provided by the embodiments of the present invention obtains a preset near-synonym of the text; obtains, by a word-vector-based text similarity algorithm, the first semantic similarity between the text and each candidate near-synonym; obtains, by the same algorithm, the second semantic similarity between the preset near-synonym and each candidate near-synonym; and determines the near-synonyms of the text according to the two similarities. With this processing, the mining result is influenced not only by the word-vector distance between the text to be processed and a candidate, but also by the word-vector distance between the preset near-synonym of the text and that candidate. A candidate can serve as a near-synonym of the text to be processed only when its word vector is close to both; therefore, near-synonym mining accuracy can be effectively improved.
Corresponding to the near-synonym mining method of the present invention, the present invention also provides a near-synonym mining device.
Fig. 6 shows a structural diagram of the near-synonym mining device involved in the above embodiments. The near-synonym mining device includes:
A text acquiring unit 601, configured to obtain the text to be processed;
A preset near-synonym acquiring unit 602, configured to obtain the preset near-synonym of the text;
A semantic similarity acquiring unit 603, configured to obtain, by a word-vector-based text similarity algorithm, the first semantic similarity between the text and each candidate near-synonym, and to obtain, by the same text similarity algorithm, the second semantic similarity between the preset near-synonym and each candidate near-synonym, the candidate near-synonyms being obtained from a preset vocabulary;
A near-synonym determination unit 604, configured to determine the near-synonyms of the text according to the first semantic similarity and the second semantic similarity.
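As a software-only illustration of how units 601 to 604 could fit together, the sketch below wires them into one class. The class name `NearSynonymMiner`, its fields, the threshold-based determination step and the lookup-table similarity are all assumptions; the patent leaves the concrete module design open.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class NearSynonymMiner:
    get_preset: Callable[[str], str]         # role of unit 602: preset near-synonym lookup
    similarity: Callable[[str, str], float]  # algorithm used by unit 603
    candidates: List[str]                    # candidates taken from a preset vocabulary
    threshold: float = 0.6                   # assumed selection rule for unit 604

    def mine(self, text: str) -> Set[str]:
        """text is the text to be processed, as obtained by unit 601."""
        preset = self.get_preset(text)
        # Unit 603: first and second semantic similarity for every candidate.
        first = {c: self.similarity(text, c) for c in self.candidates}
        second = {c: self.similarity(preset, c) for c in self.candidates}
        # Unit 604: keep candidates that are close to both the text and its preset near-synonym.
        return {c for c in self.candidates
                if first[c] > self.threshold and second[c] > self.threshold}


# Made-up similarity table standing in for a real word-vector model.
SIMILARITY_TABLE = {
    ("buy", "sell"): 0.7, ("buy", "purchase"): 0.7,
    ("acquire", "sell"): 0.3, ("acquire", "purchase"): 0.8,
}

miner = NearSynonymMiner(
    get_preset=lambda text: "acquire",
    similarity=lambda a, b: SIMILARITY_TABLE[(a, b)],
    candidates=["sell", "purchase"],
)
result = miner.mine("buy")
```

Passing the lookup and similarity functions in as callables keeps the sketch independent of any particular word-vector library.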
Fig. 7 is a first detailed structural diagram of the near-synonym mining device involved in the above embodiment. Optionally, the near-synonym determination unit 604 includes:
A candidate near-synonym set obtaining subunit 701, configured to select candidate near-synonyms according to a first selection rule and the ranking of each candidate near-synonym by the first semantic similarity, forming a first candidate near-synonym set; and to select candidate near-synonyms according to a second selection rule and the ranking of each candidate near-synonym by the second semantic similarity, forming a second candidate near-synonym set;
A candidate near-synonym obtaining subunit 702, configured to obtain the candidate near-synonyms contained in both the first candidate near-synonym set and the second candidate near-synonym set;
A first near-synonym determination subunit 703, configured to determine the near-synonyms according to the candidate near-synonyms contained in both sets.
Optionally, the first near-synonym determination subunit is specifically configured to obtain the antonyms of the text to be processed and of the preset near-synonyms, to judge, according to the obtained antonyms, whether the candidate near-synonyms contained in both sets satisfy a word-formation rule, and to take the candidate near-synonyms that satisfy the word-formation rule as the near-synonyms.
Fig. 8 is a second detailed structural diagram of the near-synonym mining device involved in the above embodiment. Optionally, the near-synonym determination unit 604 includes:
A third semantic similarity obtaining subunit 801, configured to obtain a third semantic similarity between the text and each candidate near-synonym according to the first semantic similarity, the second semantic similarity, and a preset weight;
A candidate near-synonym selection subunit 802, configured to select candidate near-synonyms according to a third selection rule and the third semantic similarity;
A third near-synonym determination subunit 803, configured to determine the near-synonyms according to the selected candidate near-synonyms.
Optionally, the third semantic similarity is calculated by the following formula:
Z = α * X + (1 - α) * Y
where X is the first semantic similarity, Y is the second semantic similarity, α is the preset weight with a value between 0 and 1, and Z is the third semantic similarity.
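The weighted combination can be written directly from the formula. The sketch below computes Z for each candidate and then applies a threshold as the third selection rule; the threshold, the function names, and the scores are assumptions for illustration, since the text does not specify the rule.

```python
def third_similarity(x, y, alpha):
    """Z = alpha * X + (1 - alpha) * Y, with alpha restricted to [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * x + (1.0 - alpha) * y

def select_by_threshold(first_sim, second_sim, alpha, threshold):
    """Keep candidates whose combined similarity reaches the (assumed) threshold."""
    combined = {
        cand: third_similarity(first_sim[cand], second_sim[cand], alpha)
        for cand in first_sim
    }
    return {cand: z for cand, z in combined.items() if z >= threshold}

# Invented scores for two candidates.
first_sim = {"joyful": 0.97, "sad": 0.85}
second_sim = {"joyful": 0.95, "sad": 0.20}

selected = select_by_threshold(first_sim, second_sim, alpha=0.5, threshold=0.6)
```

With α = 1 the result depends only on the first similarity, and with α = 0 only on the second, so α controls how much the preset near-synonyms constrain the outcome.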
Optionally, the third near-synonym determination subunit is specifically configured to obtain the antonyms of the text to be processed and of the preset near-synonyms, to judge, according to the obtained antonyms, whether the selected candidate near-synonyms satisfy a word-formation rule, and to take the candidate near-synonyms that satisfy the word-formation rule as the near-synonyms.
Optionally, the word-formation rule includes:
The candidate near-synonym contains a word that appears in the text.
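One possible reading of the antonym check and the word-formation rule is sketched below. The passage does not spell out how the antonyms enter the judgment, so this sketch assumes that a candidate is rejected if it is itself a known antonym, and is otherwise kept only if it shares at least one character with the text (treating "word" as a single character, which is an assumption for Chinese text). The function names and the example words are hypothetical.

```python
def meets_word_formation_rule(candidate, text):
    """True if the candidate shares at least one character with the text."""
    return any(ch in text for ch in candidate)

def filter_candidates(candidates, text, antonyms):
    """Drop known antonyms, then keep candidates that satisfy the rule (assumed order)."""
    return [
        cand
        for cand in candidates
        if cand not in antonyms and meets_word_formation_rule(cand, text)
    ]

text = "快乐"                       # text to be processed
antonyms = {"不乐"}                 # antonyms looked up for the text and its preset near-synonyms
candidates = ["欢乐", "不乐", "愉悦"]

kept = filter_candidates(candidates, text, antonyms)
```

In this example "不乐" shares a character with the text but is excluded as an antonym, and "愉悦" is excluded because it shares no character, leaving only "欢乐".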
As can be seen from the above embodiment, the near-synonym mining device provided by the embodiment of the present invention obtains the preset near-synonyms of the text; obtains, by a word-vector-based document similarity algorithm, a first semantic similarity between the text and each candidate near-synonym; obtains, by the same document similarity algorithm, a second semantic similarity between the preset near-synonyms and the candidate near-synonyms; and determines the near-synonyms of the text according to the first semantic similarity and the second semantic similarity. With this processing, the near-synonym mining result is influenced not only by the word-vector distance between the text to be processed and a candidate near-synonym, but also by the word-vector distance between the preset near-synonyms of that text and the candidate. A candidate is taken as a near-synonym of the text to be processed only when it is close, in word-vector distance, to both the text and its preset near-synonyms. The accuracy of near-synonym mining can therefore be effectively improved.
Fig. 9 shows a block diagram of an electronic device provided by an embodiment of the present invention.
The electronic device includes a processor 901 and a memory 902. The processor 901 performs the near-synonym mining process shown in Fig. 1 to Fig. 5 and/or other processes of the technology described herein. The memory 902 stores the program code and data of the near-synonym mining process.
Optionally, the electronic device may further include an input device and/or a display, where the input device is used to input the text to be processed and the display can be used to display the near-synonyms of the text.
Optionally, the electronic device may further include a communication interface, which is used for communication between the device and other devices. For example, when the device is an RCS, the communication interface may be a common public radio interface (CPRI) used for communication between an RRS and the RCS.
It is understood that Fig. 9 shows only a simplified design of the electronic device. The electronic device may include any number of processors, memories, input devices, displays, and communication interfaces.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
For identical or similar parts, the embodiments in this specification may be referred to one another. In particular, since the embodiment of the near-synonym mining device is substantially similar to the embodiment of the near-synonym mining method, its description is relatively brief; for the relevant parts, refer to the explanation in the method embodiment.
The embodiments of the invention described above do not limit the scope of protection of the present invention.

Claims (10)

1. A near-synonym mining method, characterized by comprising:
Obtaining a text to be processed;
Obtaining preset near-synonyms of the text;
Obtaining, by a word-vector-based document similarity algorithm, a first semantic similarity between the text and each candidate near-synonym; and obtaining, by the document similarity algorithm, a second semantic similarity between the preset near-synonyms and the candidate near-synonyms, the candidate near-synonyms being obtained from a preset vocabulary;
Determining the near-synonyms of the text according to the first semantic similarity and the second semantic similarity.
2. The method according to claim 1, characterized in that determining the near-synonyms of the text according to the first semantic similarity and the second semantic similarity comprises:
Selecting candidate near-synonyms according to a first selection rule and the ranking of each candidate near-synonym by the first semantic similarity, to form a first candidate near-synonym set; and selecting candidate near-synonyms according to a second selection rule and the ranking of each candidate near-synonym by the second semantic similarity, to form a second candidate near-synonym set;
Obtaining the candidate near-synonyms contained in both the first candidate near-synonym set and the second candidate near-synonym set;
Determining the near-synonyms according to the candidate near-synonyms contained in both sets.
3. The method according to claim 2, characterized in that determining the near-synonyms according to the candidate near-synonyms contained in both sets comprises:
Obtaining the antonyms of the text to be processed and of the preset near-synonyms, and judging, according to the obtained antonyms, whether each candidate near-synonym contained in both sets satisfies a word-formation rule;
Taking the candidate near-synonyms that satisfy the word-formation rule as the near-synonyms.
4. The method according to claim 3, characterized in that, before obtaining the antonyms of the text to be processed and of the preset near-synonyms, the method further comprises:
Obtaining the part of speech of the text to be processed;
Judging whether the part of speech is a verb or an adjective; if so, proceeding to the next step.
5. The method according to claim 1, characterized in that determining the near-synonyms of the text according to the first semantic similarity and the second semantic similarity comprises:
Obtaining a third semantic similarity between the text and each candidate near-synonym according to the first semantic similarity, the second semantic similarity, and a preset weight;
Selecting candidate near-synonyms according to a third selection rule and the third semantic similarity;
Determining the near-synonyms according to the selected candidate near-synonyms.
6. The method according to claim 5, characterized in that the third semantic similarity is calculated by the following formula:
Z = α * X + (1 - α) * Y
where X is the first semantic similarity, Y is the second semantic similarity, α is the preset weight with a value between 0 and 1, and Z is the third semantic similarity.
7. A near-synonym mining device, characterized by comprising:
A text acquiring unit, configured to obtain a text to be processed;
A preset near-synonym acquiring unit, configured to obtain preset near-synonyms of the text;
A semantic similarity acquiring unit, configured to obtain, by a word-vector-based document similarity algorithm, a first semantic similarity between the text and each candidate near-synonym; and to obtain, by the document similarity algorithm, a second semantic similarity between the preset near-synonyms and the candidate near-synonyms, the candidate near-synonyms being obtained from a preset vocabulary;
A near-synonym determination unit, configured to determine the near-synonyms of the text according to the first semantic similarity and the second semantic similarity.
8. The device according to claim 7, characterized in that the near-synonym determination unit comprises:
A candidate near-synonym set obtaining subunit, configured to select candidate near-synonyms according to a first selection rule and the ranking of each candidate near-synonym by the first semantic similarity, forming a first candidate near-synonym set; and to select candidate near-synonyms according to a second selection rule and the ranking of each candidate near-synonym by the second semantic similarity, forming a second candidate near-synonym set;
A candidate near-synonym obtaining subunit, configured to obtain the candidate near-synonyms contained in both the first candidate near-synonym set and the second candidate near-synonym set;
A first near-synonym determination subunit, configured to determine the near-synonyms according to the candidate near-synonyms contained in both sets.
9. The device according to claim 7, characterized in that the near-synonym determination unit comprises:
A third semantic similarity obtaining subunit, configured to obtain a third semantic similarity between the text and each candidate near-synonym according to the first semantic similarity, the second semantic similarity, and a preset weight;
A candidate near-synonym selection subunit, configured to select candidate near-synonyms according to a third selection rule and the third semantic similarity;
A third near-synonym determination subunit, configured to determine the near-synonyms according to the selected candidate near-synonyms.
10. An electronic device, characterized by comprising:
A processor; and
A memory, configured to store a program implementing the near-synonym mining method; after the device is powered on and the program is run by the processor, the following steps are performed: obtaining a text to be processed; obtaining preset near-synonyms of the text; obtaining, by a word-vector-based document similarity algorithm, a first semantic similarity between the text and each candidate near-synonym; obtaining, by the document similarity algorithm, a second semantic similarity between the preset near-synonyms and the candidate near-synonyms, the candidate near-synonyms being obtained from a preset vocabulary; and determining the near-synonyms of the text according to the first semantic similarity and the second semantic similarity.
CN201810023323.0A 2018-01-10 2018-01-10 Near synonym method for digging, device and electronic equipment Active CN108255810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810023323.0A CN108255810B (en) 2018-01-10 2018-01-10 Near synonym method for digging, device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810023323.0A CN108255810B (en) 2018-01-10 2018-01-10 Near synonym method for digging, device and electronic equipment

Publications (2)

Publication Number Publication Date
CN108255810A true CN108255810A (en) 2018-07-06
CN108255810B CN108255810B (en) 2019-04-09

Family

ID=62724920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810023323.0A Active CN108255810B (en) 2018-01-10 2018-01-10 Near synonym method for digging, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN108255810B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012203472A (en) * 2011-03-23 2012-10-22 Toshiba Corp Document processor and program
CN105868236A (en) * 2015-12-09 2016-08-17 乐视网信息技术(北京)股份有限公司 Synonym data mining method and system
CN106126494A (en) * 2016-06-16 2016-11-16 上海智臻智能网络科技股份有限公司 Synonym finds method and device, data processing method and device
CN106547732A (en) * 2016-10-14 2017-03-29 深圳中兴网信科技有限公司 Near synonym recognition methodss and near synonym identifying system
CN106610935A (en) * 2016-07-18 2017-05-03 四川用联信息技术有限公司 Lexical semantic similarity solution algorithm based on new statistics
CN107092679A (en) * 2017-04-21 2017-08-25 北京邮电大学 A kind of feature term vector preparation method, file classification method and device
CN107451126A (en) * 2017-08-21 2017-12-08 广州多益网络股份有限公司 A kind of near synonym screening technique and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427613A (en) * 2019-07-16 2019-11-08 深圳供电局有限公司 A kind of near synonym discovery method and its system, computer readable storage medium
CN110427613B (en) * 2019-07-16 2022-12-13 深圳供电局有限公司 Method and system for finding similar meaning words and computer readable storage medium
CN117112736A (en) * 2023-10-24 2023-11-24 云南瀚文科技有限公司 Information retrieval analysis method and system based on semantic analysis model
CN117112736B (en) * 2023-10-24 2024-01-05 云南瀚文科技有限公司 Information retrieval analysis method and system based on semantic analysis model

Also Published As

Publication number Publication date
CN108255810B (en) 2019-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20180706

Assignee: Zhongke Dingfu (Beijing) Science and Technology Development Co., Ltd.

Assignor: Beijing Shenzhou Taiyue Software Co., Ltd.

Contract record no.: X2019990000215

Denomination of invention: Synonym mining method, device and electronic equipment

Granted publication date: 20190409

License type: Exclusive License

Record date: 20191127

TR01 Transfer of patent right

Effective date of registration: 20200629

Address after: 230000 zone B, 19th floor, building A1, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Patentee after: Dingfu Intelligent Technology Co., Ltd

Address before: 100089 Beijing city Haidian District wanquanzhuang Road No. 28 Wanliu new building block A Room 601

Patentee before: BEIJING ULTRAPOWER SOFTWARE Co.,Ltd.