CN109213916A - Method and apparatus for generating information - Google Patents


Info

Publication number
CN109213916A
Authority
CN
China
Prior art keywords
word
search
words
target
subclass
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811075006.XA
Other languages
Chinese (zh)
Inventor
邓江东
李磊
马维英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201811075006.XA
Priority to PCT/CN2018/115951 (WO2020052059A1)
Publication of CN109213916A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a method and apparatus for generating information. A specific embodiment of the method includes: obtaining a target search word set; for a target search word in the target search word set, determining whether a preset word set includes corresponding words that have a pre-established correspondence with the target search word; in response to determining that at least one corresponding word is included, determining the similarity between each of the at least one corresponding word and the target search word; extracting, in order of similarity, a target number of corresponding words from the at least one corresponding word as the corresponding word set for the target search word; and generating at least one search word set based on the obtained corresponding word sets. This embodiment helps make information search more comprehensive and more targeted.

Description

Method and apparatus for generating information
Technical field
The embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for generating information.
Background
Currently, when a user searches for information on a network, the search is usually performed with the search words the user entered, and the search results contain words identical or similar to those search words. The accuracy of the search results therefore depends on the matching rules used when the user's search words are matched against the information on the network.
Summary of the invention
The embodiments of the present application propose a method and apparatus for generating information.
In a first aspect, the embodiments of the present application provide a method for generating information, the method including: obtaining a target search word set; for a target search word in the target search word set, determining whether a preset word set includes corresponding words having a pre-established correspondence with the target search word; in response to determining that at least one corresponding word is included, determining the similarity between each corresponding word in the at least one corresponding word and the target search word; extracting, in order of similarity, a target number of corresponding words from the at least one corresponding word as the corresponding word set for the target search word; and generating at least one search word set based on the obtained corresponding word sets.
In some embodiments, after the at least one search word set is generated based on the obtained corresponding word sets, the method further includes: for each search word set in the at least one search word set, performing an information search with the search words that set contains, obtaining search results, and outputting them.
In some embodiments, the target search word set is the set of words obtained by segmenting a search statement entered by a user.
In some embodiments, the preset word set includes at least one subset; and determining whether the preset word set includes corresponding words having a pre-established correspondence with the target search word includes: determining whether the at least one subset contains a subset that includes the target search word; and, in response to determining that such a subset exists, determining the words in that subset other than the target search word as the corresponding words of the target search word.
In some embodiments, the at least one subset included in the preset word set is obtained in advance through the following steps: obtaining a target text set; segmenting the target texts in the target text set to obtain a word set; and performing near-synonym clustering on the words in the resulting word set to obtain the at least one subset, where, for each subset in the at least one subset, the pairwise similarity between the words it contains is greater than or equal to a preset similarity threshold.
In some embodiments, the at least one subset included in the preset word set is obtained in advance through the following steps: obtaining an initial search word set; for each initial search word in the initial search word set, inputting the initial search word into a preset search engine to obtain at least one search result; extracting, from the at least one search result, words having a set feature as target words; and generating a subset of the word set based on the extracted target words and the initial search word.
In a second aspect, the embodiments of the present application provide an apparatus for generating information, the apparatus including: an obtaining unit configured to obtain a target search word set; an extraction unit configured to, for a target search word in the target search word set, determine whether a preset word set includes corresponding words having a pre-established correspondence with the target search word, in response to determining that at least one corresponding word is included, determine the similarity between each of the at least one corresponding word and the target search word, and extract, in order of similarity, a target number of corresponding words from the at least one corresponding word as the corresponding word set for the target search word; and a generation unit configured to generate at least one search word set based on the obtained corresponding word sets.
In some embodiments, the apparatus further includes: a search unit configured to, for each search word set in the at least one search word set, perform an information search with the search words that set contains, obtain search results, and output them.
In some embodiments, the target search word set is the set of words obtained by segmenting a search statement entered by a user.
In some embodiments, preset set of words includes at least one subclass;And extraction unit includes: first Determining module is configured to determine at least one subclass with the presence or absence of the subclass including the target search word;Second really Cover half block, be configured in response to determine presence, determine include the target search word subclass in, remove the target search word Other words in addition are corresponding word corresponding with the target search word.
In some embodiments, the at least one subset included in the preset word set is obtained in advance through the following steps: obtaining a target text set; segmenting the target texts in the target text set to obtain a word set; and performing near-synonym clustering on the words in the resulting word set to obtain the at least one subset, where, for each subset in the at least one subset, the pairwise similarity between the words it contains is greater than or equal to a preset similarity threshold.
In some embodiments, the at least one subset included in the preset word set is obtained in advance through the following steps: obtaining an initial search word set; for each initial search word in the initial search word set, inputting the initial search word into a preset search engine to obtain at least one search result; extracting, from the at least one search result, words having a set feature as target words; and generating a subset of the word set based on the extracted target words and the initial search word.
In a third aspect, the embodiments of the present application provide a server, including: one or more processors; and a storage apparatus storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, the embodiments of the present application provide a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect.
In the method and apparatus for generating information provided by the embodiments of the present application, a target search word set is obtained; then, for each target search word in the set, at least one corresponding word of that target search word is determined from a preset word set, and a target number of corresponding words are extracted from the at least one corresponding word, in order of their similarity to the target search word, as the corresponding word set for that target search word; finally, at least one search word set is generated from the obtained corresponding word sets. More search word sets can thus be generated from the target search word set, which helps make information search more comprehensive and more targeted.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the present application may be applied;
Fig. 2 is a flowchart of an embodiment of the method for generating information according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present application;
Fig. 4 is a flowchart of another embodiment of the method for generating information according to the present application;
Fig. 5 is a schematic structural diagram of an embodiment of the apparatus for generating information according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the server of the embodiments of the present application.
Detailed description of the embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the related invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating information or the apparatus for generating information of the embodiments of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as search applications, web browser applications, and shopping applications.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, for example, a background information processing server that processes the target search word sets sent by the terminal devices 101, 102, 103. The background information processing server may process an obtained target search word set and generate a processing result (for example, at least one search word set).
It should be noted that the method for generating information provided by the embodiments of the present application is generally executed by the server 105; accordingly, the apparatus for generating information is generally disposed in the server 105.
It should be noted that the server may be hardware or software. When it is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When it is software, it may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as implementation requires.
With continued reference to Fig. 2, a flow 200 of an embodiment of the method for generating information according to the present application is shown. The method for generating information includes the following steps:
Step 201: obtain a target search word set.
In this embodiment, the executing body of the method for generating information (for example, the server shown in Fig. 1) may obtain the target search word set remotely or locally through a wired or wireless connection. The target search word set may be a set of words to be used for information search. For example, it may be a set of words for searching information that a user entered with a terminal device (for example, a terminal device shown in Fig. 1) and sent to the executing body.
In some optional implementations of this embodiment, the target search word set may be the set of words obtained by segmenting a search statement entered by a user. Specifically, the user may enter a search statement with a terminal device as shown in Fig. 1 and send it to the executing body, which may segment the statement with an existing word segmentation method (for example, forward maximum matching, an N-gram model, or a hidden Markov model) to obtain the target search word set.
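As an illustration of the first segmentation method named above, the following is a minimal sketch of forward maximum matching in Python. The toy vocabulary and the `max_len` window size are assumptions chosen for the example, not the dictionary used by the patent.

```python
def forward_max_match(sentence: str, vocabulary: set[str], max_len: int = 4) -> list[str]:
    """Segment `sentence` greedily, always taking the longest vocabulary
    word that starts at the current position (forward maximum matching)."""
    words: list[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest window first and shrink it until a vocabulary hit;
        # a single character is accepted as a fallback token.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical vocabulary: "男孩" (boy) and "尿不湿" (diaper).
vocab = {"男孩", "尿不湿", "男", "孩"}
print(forward_max_match("男孩尿不湿", vocab))  # ['男孩', '尿不湿']
```

Because the match is greedy, "男孩" is preferred over the shorter entries "男" and "孩"; real systems usually combine this with a much larger dictionary or a statistical model.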
Step 202: for each target search word in the target search word set, determine whether a preset word set includes corresponding words having a pre-established correspondence with the target search word; in response to determining that at least one corresponding word is included, determine the similarity between each of the at least one corresponding word and the target search word; and extract, in order of similarity, a target number of corresponding words from the at least one corresponding word as the corresponding word set for the target search word.
In this embodiment, for each target search word in the target search word set, the executing body may perform the following steps:
Step 2021: determine whether the preset word set includes corresponding words having a pre-established correspondence with the target search word.
Specifically, the word set may be preset in the executing body, or in another electronic device communicatively connected to the executing body. The correspondence between target search words and corresponding words may be established in advance in various ways. For example, it may be represented by a pre-built two-dimensional table, or by pre-built key-value pairs, linked lists, and similar structures.
In some optional implementations of this embodiment, the preset word set includes at least one subset. The executing body may determine whether the word set includes corresponding words having a pre-established correspondence with the target search word through the following steps:
First, determine whether the at least one subset included in the preset word set contains a subset that includes the target search word. Then, in response to determining that such a subset exists, determine the words in that subset other than the target search word as the corresponding words of the target search word.
Generally, the pairwise similarity between the words in a subset may be greater than or equal to a preset similarity threshold; that is, the words in a subset are synonyms or near-synonyms. For example, suppose the target search word is "child" and a subset contains the words "children", "child", "teenager", and "kid". Then the words "children", "teenager", and "kid" in that subset are the corresponding words of the target search word "child". It should be noted that the pairwise similarities between the words in a subset may be calculated in advance by the executing body or another electronic device using any existing algorithm for computing word similarity.
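The subset lookup described above can be precomputed into a key-value table, one of the representations the text mentions. The sketch below is a minimal Python illustration under stated assumptions: the subsets are hypothetical, and a word that appears in several subsets is assumed to collect the union of their other members.

```python
def build_correspondence(subsets: list[set[str]]) -> dict[str, set[str]]:
    """Map every word to the other words that share a subset with it."""
    table: dict[str, set[str]] = {}
    for subset in subsets:
        for word in subset:
            # All other members of the subset are corresponding words.
            # Assumption: membership in several subsets yields their union.
            table.setdefault(word, set()).update(subset - {word})
    return table


def corresponding_words(target: str, table: dict[str, set[str]]) -> set[str]:
    """Return the corresponding words of `target`, or an empty set if no
    subset contains it."""
    return table.get(target, set())


# Hypothetical subsets following the "child" example above.
subsets = [{"children", "child", "teenager", "kid"}, {"diaper", "paper diaper", "nappy"}]
table = build_correspondence(subsets)
print(sorted(corresponding_words("child", table)))  # ['children', 'kid', 'teenager']
```

Building the table once turns each per-query lookup into a single dictionary access instead of a scan over all subsets.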
In some optional implementations of this embodiment, the at least one subset included in the preset word set may be obtained in advance by the executing body or another electronic device through the following steps:
First, obtain a target text set. A target text may be a text to be segmented. The target text set may be stored in the executing body or in another electronic device communicatively connected to it. It should be noted that the target texts in the target text set may be stored in a single electronic device or distributed across a cluster of multiple electronic devices.
Then, segment the target texts in the target text set to obtain a word set. Specifically, the executing body or the other electronic device may segment each sentence in the target texts with any existing segmentation method to obtain the word set.
Finally, perform near-synonym clustering on the words in the word set obtained by segmentation to obtain at least one subset, where, for each subset, the pairwise similarity between the words it contains is greater than or equal to a preset similarity threshold. Specifically, the executing body or the other electronic device may use an existing near-synonym clustering algorithm. As an example, it may use an existing word-vector model (for example, word2vec or sense2vec) to obtain a word vector for each word in the word set, and then cluster the word vectors with a clustering algorithm (for example, k-means or a decision-tree algorithm) so that the pairwise similarity between the word vectors in each resulting cluster (for example, a similarity based on the Euclidean distance or cosine distance between vectors) is greater than or equal to the preset similarity threshold. The set of words represented by the word vectors in each resulting cluster is then determined as a subset of the word set.
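The text leaves the clustering algorithm open. The sketch below swaps in a simple greedy threshold grouping rather than k-means, because it enforces the stated condition directly: a word joins a cluster only if its cosine similarity to every member is at least the threshold. The two-dimensional vectors are invented stand-ins for word2vec output.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def threshold_cluster(vectors: dict[str, list[float]], threshold: float) -> list[set[str]]:
    """Greedily group words so that every pair inside a cluster has
    cosine similarity >= threshold."""
    clusters: list[set[str]] = []
    for word, vec in vectors.items():
        for cluster in clusters:
            # Join only if similar enough to EVERY existing member.
            if all(cosine(vec, vectors[other]) >= threshold for other in cluster):
                cluster.add(word)
                break
        else:
            clusters.append({word})  # no compatible cluster: start a new one
    return clusters


# Hypothetical toy vectors standing in for word-vector model output.
vectors = {"child": [1.0, 0.1], "kid": [0.9, 0.2], "diaper": [0.1, 1.0], "nappy": [0.2, 0.9]}
print(threshold_cluster(vectors, threshold=0.9))
```

The result depends on insertion order, which is a known limitation of greedy grouping; the k-means approach named in the text would instead optimize cluster centroids and then need a separate check of the pairwise threshold.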
In some optional implementations of this embodiment, the at least one subset included in the preset word set may also be obtained in advance by the executing body or another electronic device through the following steps:
First, obtain an initial search word set. The initial search words may be search words entered in advance by technical staff.
Then, for each initial search word in the initial search word set, perform the following sub-steps:
Sub-step one: input the initial search word into a preset search engine to obtain at least one search result. A search engine is a system that collects information from the Internet with specific computer programs according to certain strategies, organizes and processes that information, provides retrieval services to users, and displays the information relevant to a user's query. The preset search engine may be any existing search engine. A search result among the at least one search result may contain text, which may include a title, body, and other content.
Sub-step two: from the at least one search result, extract words having a set feature as target words. Specifically, the set feature may be any of various features a word can have, including but not limited to at least one of the following: the font color is a preset color, or the highlight background color of the text is a preset color. As an example, the text in a search result may include text in red font, which is usually a word whose meaning is identical or similar to that of the search word. The executing body or the other electronic device may recognize features such as text color in the search result and extract the words having the set feature as target words.
Sub-step three: generate a subset of the word set based on the extracted target words and the initial search word. As an example, at least one target word may be extracted, and the executing body or the other electronic device may combine the initial search word and the extracted target words into a subset. As another example, the executing body or the other electronic device may calculate the similarity between the initial search word and each target word, and combine the initial search word with the target words whose similarity is greater than or equal to a preset similarity threshold into a subset.
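Sub-steps two and three can be sketched in Python as follows. This assumes the search results are available as HTML snippets in which the engine marks highlighted words with `<em>` tags; that markup is an assumption standing in for the red-font or highlight feature, and real engines differ. The optional `similarity` callback covers the second, threshold-based example.

```python
import re
from typing import Callable, Optional

# Assumption: highlighted words are wrapped in <em>...</em> in each snippet.
HIGHLIGHT_TAG = re.compile(r"<em>(.*?)</em>")


def extract_target_words(results: list[str]) -> set[str]:
    """Collect the highlighted (set-feature) words from every result snippet."""
    words: set[str] = set()
    for snippet in results:
        for match in HIGHLIGHT_TAG.findall(snippet):
            if match.strip():
                words.add(match.strip())
    return words


def build_subset(
    initial_word: str,
    results: list[str],
    similarity: Optional[Callable[[str, str], float]] = None,
    threshold: float = 0.0,
) -> set[str]:
    """Form a subset from the initial search word and its target words.

    Without `similarity`, every target word is kept (first example);
    with it, only target words scoring >= `threshold` are kept (second example).
    """
    targets = extract_target_words(results) - {initial_word}
    if similarity is not None:
        targets = {w for w in targets if similarity(initial_word, w) >= threshold}
    return {initial_word} | targets


# Hypothetical result snippets for the initial search word "diaper".
results = ["Soft <em>paper diaper</em> for babies", "Cheap <em>nappy</em> and <em>diaper</em> deals"]
print(sorted(build_subset("diaper", results)))  # ['diaper', 'nappy', 'paper diaper']
```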
Step 2022: in response to determining that at least one corresponding word is included, determine the similarity between each corresponding word in the at least one corresponding word and the target search word.
Specifically, the executing body may determine the similarity between each corresponding word in the at least one corresponding word and the target search word using any existing algorithm for computing word similarity (for example, the edit distance (Levenshtein distance) algorithm, or a cosine distance algorithm based on the vector space model (VSM)).
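As one concrete instance of the edit-distance option named above, the sketch below computes the Levenshtein distance with dynamic programming and normalizes it into a similarity in [0, 1]. The normalization `1 - distance / max(len)` is an assumption; the text does not say how distance is converted to similarity.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `a` into `b` (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for maximally different."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("kitten", "sitting"))  # 3
```

Edit distance only compares surface form, so it rates "diaper" and "paper diaper" as related but cannot relate "diaper" to "nappy"; that is why the text also lists vector-space similarity.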
Step 2023: extract, in order of similarity, a target number of corresponding words from the at least one corresponding word as the corresponding word set for the target search word.
Specifically, the executing body may extract a target number of corresponding words from the at least one corresponding word, in descending order of similarity, as the corresponding word set for the target search word. The target number may be a preset number, or a number determined by how many corresponding words the target search word has. For example, when the number of corresponding words is greater than or equal to a preset number, the target number is that preset number; otherwise, the target number is the number of corresponding words.
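The top-N extraction and the rule for the target number can be sketched as follows. The similarity function is passed in as a parameter because the text allows several; the toy similarity values in the usage example are invented.

```python
from typing import Callable


def select_corresponding(
    target: str,
    candidates: list[str],
    similarity: Callable[[str, str], float],
    preset_count: int,
) -> list[str]:
    """Return the corresponding word set for `target`: the candidates sorted
    by descending similarity, cut to the target number."""
    ranked = sorted(candidates, key=lambda w: similarity(target, w), reverse=True)
    # Target number = preset count if enough candidates exist, else all of them.
    target_number = min(preset_count, len(ranked))
    return ranked[:target_number]


# Hypothetical similarity scores for the target search word "boy".
scores = {"children": 0.9, "baby": 0.8, "teenager": 0.6}
print(select_corresponding("boy", ["teenager", "children", "baby"], lambda t, w: scores[w], 2))
# ['children', 'baby']
```

Python's `sorted` is stable, so candidates with equal similarity keep their original order.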
In practice, the corresponding words of a target search word may be its synonyms or near-synonyms. Extracting the corresponding word set in order of similarity helps generate the final search word sets in a more targeted way.
Step 203: generate at least one search word set based on the obtained corresponding word sets.
In this embodiment, the executing body may generate at least one search word set based on the obtained corresponding word sets. As an example, the executing body may combine each obtained corresponding word set with the target search word set into a search word set.
As another example, the executing body may extract one corresponding word from each obtained corresponding word set, either in order of similarity to the corresponding target search word or at random, and combine the extracted corresponding words with the target search words in the target search word set that have no corresponding words into a search word set.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to this embodiment. In the scenario of Fig. 3, the server 301 first obtains a target search word set 303 entered by a user with a terminal device 302 (for example, including the target search words "boy" and "diaper"). The server 301 then determines from a preset word set 304 that the at least one corresponding word 3041 of the target search word "boy" includes "children", "baby", and "teenager", and that the at least one corresponding word 3042 of the target search word "diaper" includes "paper diaper" and "nappy". Next, for each target search word, the server 301 extracts two corresponding words, in descending order of similarity to that target search word, as its corresponding word set: the corresponding word set 305 of "boy" includes "children" and "baby", and the corresponding word set 306 of "diaper" includes "paper diaper" and "nappy". Finally, based on the corresponding word sets 305 and 306, the server 301 generates two search word sets: 307 (for example, including "children" and "paper diaper") and 308 (for example, "baby" and "nappy").
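The pairing in this scenario can be reproduced with a short Python sketch of the rank-ordered variant of step 203: the k-th search word set takes the k-th ranked word from every corresponding word set and keeps target search words that have no corresponding words. Two details are assumptions not fixed by the text: the number of generated sets equals the length of the shortest non-empty corresponding word set, and the original target set is returned unchanged when no target word has corresponding words.

```python
def generate_search_sets(
    targets: list[str], corresponding: dict[str, list[str]]
) -> list[list[str]]:
    """Build search word sets by pairing same-rank corresponding words."""
    ranked_sets = [corresponding[t] for t in targets if corresponding.get(t)]
    if not ranked_sets:
        return [list(targets)]  # assumption: fall back to the original words
    rounds = min(len(s) for s in ranked_sets)  # assumption: shortest set decides
    result = []
    for k in range(rounds):
        search_set = []
        for t in targets:
            words = corresponding.get(t)
            # The k-th ranked corresponding word replaces the target word;
            # a target word without corresponding words is kept as-is.
            search_set.append(words[k] if words else t)
        result.append(search_set)
    return result


# Corresponding word sets 305 and 306 from the Fig. 3 scenario, already ranked.
sets = {"boy": ["children", "baby"], "diaper": ["paper diaper", "nappy"]}
print(generate_search_sets(["boy", "diaper"], sets))
# [['children', 'paper diaper'], ['baby', 'nappy']]
```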
In the method provided by the above embodiment of the present application, a target search word set is obtained; then, for each target search word in the set, at least one corresponding word of that target search word is determined from a preset word set, and a target number of corresponding words are extracted from the at least one corresponding word, in order of their similarity to the target search word, as the corresponding word set for that target search word; finally, at least one search word set is generated from the obtained corresponding word sets. More search word sets can thus be generated from the target search word set, which helps make information search more comprehensive and more targeted.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for generating information is shown. The flow 400 of the method for generating information includes the following steps:
Step 401: obtain a target search word set.
In this embodiment, step 401 is substantially the same as step 201 in the embodiment corresponding to Fig. 2 and is not repeated here.
Step 402: for each target search word in the target search word set, determine whether a preset word set includes corresponding words having a pre-established correspondence with the target search word; in response to determining that at least one corresponding word is included, determine the similarity between each of the at least one corresponding word and the target search word; and extract, in order of similarity, a target number of corresponding words from the at least one corresponding word as the corresponding word set for the target search word.
In this embodiment, step 402 is substantially the same as step 202 in the embodiment corresponding to Fig. 2 and is not repeated here.
Step 403: generate at least one search word set based on the obtained corresponding word sets.
In this embodiment, step 403 is substantially the same as step 203 in the embodiment corresponding to Fig. 2 and is not repeated here.
Step 404, it for the search set of words at least one search set of words, is searched using what the search set of words included Rope word carries out information search, obtains search result and output.
In the present embodiment, it for the search set of words at least one search set of words obtained in step 403, is used for The executing subject (such as server shown in FIG. 1) for generating the method for information can use the search term that the search set of words includes Information search is carried out, search result and output are obtained.
Specifically, the executing body may input the search words of a search word set into a preset search engine to obtain search results. Alternatively, the executing body may use those search words to search within a preset information collection (for example, the collection of information contained in a particular website). Search results may include, but are not limited to, at least one of the following: pictures, text, links, and so on. The results may be output in various ways; for example, they may be shown on a display connected to the executing body, or sent to a terminal device communicatively connected to the executing body.
It should be noted that the number of search results returned for each search word set may be a quantity preset by a technician, which makes the search results more targeted.
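Searching a preset information collection with one search word set, and capping the result count at a preset number, might look like the sketch below. It assumes the collection is an in-memory list of text documents and ranks them by how many search words each contains; both the data and the scoring rule are illustrative assumptions, not the ranking method of the patent.

```python
from typing import List


def search_collection(search_words: List[str], documents: List[str],
                      max_results: int) -> List[str]:
    """Return at most max_results documents, ranked by how many search words they contain."""
    scored = []
    for doc in documents:
        text = doc.lower()
        hits = sum(1 for word in search_words if word.lower() in text)
        if hits > 0:
            scored.append((hits, doc))
    # list.sort is stable even with reverse=True, so ties keep collection order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:max_results]]


# Hypothetical preset information collection.
docs = [
    "Paper diaper sizes for children",
    "Baby nappy reviews",
    "Books for children",
    "Weather forecast",
]
for word_set in (["children", "paper diaper"], ["baby", "nappy"]):
    print(word_set, "->", search_collection(word_set, docs, max_results=2))
```

Passing `max_results` explicitly mirrors the preset result quantity described above; a real deployment would more likely delegate ranking to the search engine itself.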
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for generating information in the present embodiment highlights the step of searching with the generated search word sets and outputting the search results. The solution described in the present embodiment can therefore use the generated search word sets to obtain more comprehensive and more targeted search results.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can be applied in various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating information of the present embodiment includes: an acquiring unit 501, configured to obtain a target search word set; an extraction unit 502, configured to, for each target search word in the target search word set, determine whether the preset word set contains corresponding words that have a pre-established correspondence with the target search word, determine, in response to determining that at least one corresponding word is contained, the similarity between each such corresponding word and the target search word, and extract, in order of similarity, a target number of corresponding words as the corresponding word set of the target search word; and a generation unit 503, configured to generate at least one search word set based on the obtained corresponding word sets.
In the present embodiment, the acquiring unit 501 may obtain the target search word set remotely or locally through a wired or wireless connection. The target search word set may be a set of words to be used for an information search; for example, a set of words for searching information that a user enters on a terminal device (such as the terminal device shown in Fig. 1) and that is sent to the apparatus 500.
In the present embodiment, for each target search word in the target search word set, the extraction unit 502 may perform the following steps:
Step 5021: determine whether the preset word set contains corresponding words that have a pre-established correspondence with the target search word.
Specifically, the word set may be preset in the apparatus 500, or in another electronic device communicatively connected to the apparatus 500. The correspondence between target search words and corresponding words may be established in advance in various ways. For example, it may be represented by a pre-built two-dimensional table, or stored as key-value pairs, a linked list, or another structure.
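As a minimal sketch of the table and key-value options, the rows of a two-dimensional table can be converted into a Python dictionary that maps each target search word to its corresponding words, after which step 5021 reduces to a lookup. The table contents and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical two-dimensional table: one row per (target search word, corresponding word).
TABLE: List[Tuple[str, str]] = [
    ("boy", "children"),
    ("boy", "baby"),
    ("boy", "teenager"),
    ("diaper", "paper diaper"),
    ("diaper", "nappy"),
]


def build_key_value_index(rows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Convert table rows into a key-value mapping: word -> corresponding words."""
    index: Dict[str, List[str]] = defaultdict(list)
    for target_word, corresponding_word in rows:
        index[target_word].append(corresponding_word)
    return dict(index)


INDEX = build_key_value_index(TABLE)


def find_corresponding_words(target_word: str) -> List[str]:
    """Step 5021: return the corresponding words, or an empty list if there are none."""
    return list(INDEX.get(target_word, []))


print(find_corresponding_words("boy"))
print(find_corresponding_words("stroller"))
```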
Step 5022: in response to determining that at least one corresponding word is contained, determine the similarity between each of those corresponding words and the target search word.
Specifically, the extraction unit 502 may use any existing algorithm for computing the similarity between words (for example, the edit distance (Levenshtein distance) algorithm, or cosine similarity based on the vector space model (VSM)) to determine the similarity between each corresponding word and the target search word.
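Both similarity measures named above can be sketched in a few lines. Normalising edit distance into a similarity in [0, 1] (one minus distance over the longer length) and using character-bigram counts as the vector space are illustrative choices, not details given in the description; a production system might instead use word vectors for the cosine measure.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; identical strings score 1.0."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def bigrams(word: str) -> Counter:
    """Character-bigram term-frequency vector (a simple vector space model)."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the bigram vectors of a and b."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(count * vb[t] for t, count in va.items())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return 0.0 if norm == 0 else dot / norm
```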
Step 5023: in order of similarity, extract a target number of corresponding words from the at least one corresponding word as the corresponding word set of the target search word.
Specifically, the extraction unit 502 may extract a target number of corresponding words in descending order of similarity as the corresponding word set of the target search word. The target number may be a preset quantity, or may be determined from the number of corresponding words found for the target search word. For example, when that number is greater than or equal to a preset quantity, the target number is the preset quantity; otherwise, the target number equals the number of corresponding words found.
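Step 5023, including the rule for the target number, can be sketched as follows. The similarity function is passed in as a parameter; `shared_characters` is a toy measure used only to keep the example self-contained, and either measure from the previous step could be substituted.

```python
from typing import Callable, List


def extract_corresponding_set(target_word: str, candidates: List[str],
                              similarity: Callable[[str, str], float],
                              preset_quantity: int) -> List[str]:
    """Return the target number of candidates, most similar to target_word first."""
    # Target number: the preset quantity, or every candidate if there are fewer.
    target_number = min(preset_quantity, len(candidates))
    # sorted() is stable, so equally similar candidates keep their input order.
    ranked = sorted(candidates, key=lambda w: similarity(target_word, w), reverse=True)
    return ranked[:target_number]


def shared_characters(a: str, b: str) -> float:
    """Toy similarity: number of distinct characters the two words share."""
    return float(len(set(a) & set(b)))


print(extract_corresponding_set("diaper", ["paper diaper", "nappy", "stroller"],
                                shared_characters, preset_quantity=2))
# ['paper diaper', 'nappy']
```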
In practice, the corresponding words of a target search word may be its synonyms or near-synonyms. Extracting the corresponding word set in order of similarity helps generate the final search word sets in a targeted way.
In the present embodiment, the generation unit 503 may generate at least one search word set based on the obtained corresponding word sets. As an example, the generation unit 503 may combine the obtained corresponding word sets and the target search word set into a search word set.
As another example, the generation unit 503 may extract one corresponding word from each obtained corresponding word set, either in order of similarity to the corresponding target search word or at random, and combine the extracted corresponding words with those target search words in the target search word set that have no corresponding word set, to form a search word set.
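The rank-based variant of the second example can be sketched as below: the i-th search word set takes the i-th ranked word from every corresponding word set and keeps, unchanged, the target search words that have no corresponding words. Producing one set per rank (rather than a single set) is an assumption that mirrors the Fig. 3 scenario, and the function name is hypothetical.

```python
from typing import Dict, List


def generate_search_word_sets(target_words: List[str],
                              corresponding_sets: Dict[str, List[str]]) -> List[List[str]]:
    """Build search word sets rank by rank from ranked corresponding word sets."""
    ranked_lists = [corresponding_sets[w] for w in target_words if corresponding_sets.get(w)]
    if not ranked_lists:
        # Assumption: with nothing to substitute, search with the original words.
        return [list(target_words)]
    depth = min(len(words) for words in ranked_lists)
    result = []
    for rank in range(depth):
        search_set = []
        for word in target_words:
            words = corresponding_sets.get(word)
            # Use the rank-th corresponding word, or keep the target search word
            # itself when it has no corresponding word set.
            search_set.append(words[rank] if words else word)
        result.append(search_set)
    return result


print(generate_search_word_sets(
    ["boy", "diaper", "blue"],
    {"boy": ["children", "baby"], "diaper": ["paper diaper", "nappy"]},
))
```

For the random variant, `words[rank]` could be replaced by `random.choice(words)`, at the cost of possibly repeating the same combination across sets.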
In some optional implementations of the present embodiment, the apparatus 500 may further include a search unit (not shown in the figure), configured to, for each of the at least one search word set, perform an information search using the search words it contains, and obtain and output the search results.
In some optional implementations of the present embodiment, the target search word set is the set of words obtained by performing word segmentation on a search statement entered by the user.
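The description does not name a segmentation algorithm. As one illustrative choice, forward maximum matching against a dictionary splits a search statement written without spaces (as Chinese is) into words. The dictionary and maximum word length below are assumptions; "男孩" and "尿布" are the Chinese words for "boy" and "diaper".

```python
from typing import List, Set


def forward_maximum_matching(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedily take the longest dictionary word at each position; fall back to one character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical segmentation dictionary.
DICTIONARY = {"男孩", "尿布", "纸尿裤"}
print(forward_maximum_matching("男孩尿布", DICTIONARY))
# ['男孩', '尿布']
```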
In some optional implementations of the present embodiment, the preset word set includes at least one subset, and the extraction unit 502 may include: a first determining module (not shown in the figure), configured to determine whether the at least one subset contains a subset that includes the target search word; and a second determining module (not shown in the figure), configured to, in response to determining that such a subset exists, determine the words in that subset other than the target search word as the corresponding words of the target search word.
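With the word set stored as a list of subsets, the two determining modules amount to a membership test followed by a set difference. The subsets shown are hypothetical, and collecting words from every subset that contains the target word (rather than only the first one found) is an assumption.

```python
from typing import List, Set


def corresponding_words_from_subsets(target_word: str, subsets: List[Set[str]]) -> Set[str]:
    """Return the other words of every subset containing target_word (empty if none)."""
    result: Set[str] = set()
    for subset in subsets:
        if target_word in subset:                 # first determining module
            result |= subset - {target_word}      # second determining module
    return result


# Hypothetical preset word set made of near-synonym subsets.
SUBSETS = [
    {"boy", "children", "baby", "teenager"},
    {"diaper", "paper diaper", "nappy"},
]
print(sorted(corresponding_words_from_subsets("diaper", SUBSETS)))
# ['nappy', 'paper diaper']
```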
In some optional implementations of the present embodiment, the at least one subset included in the preset word set may be obtained in advance as follows: obtain a target text set; perform word segmentation on the target texts in the target text set to obtain a word set; and perform near-synonym clustering on the words of that word set to obtain at least one subset, where, for each subset, the similarity between any two of its words is greater than or equal to a preset similarity threshold.
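One simple way to satisfy the pairwise-threshold condition is greedy clustering: a word joins the first existing subset whose every member is at least `threshold`-similar to it, and otherwise starts a new subset. The similarity used here is the standard-library `difflib.SequenceMatcher` ratio, a character-level stand-in chosen only for illustration; a near-synonym model would normally compare meanings, for example through word vectors.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (a stand-in for a near-synonym measure)."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_near_synonyms(words: List[str], threshold: float,
                          similarity: Callable[[str, str], float] = char_similarity
                          ) -> List[List[str]]:
    """Greedy clustering in which every pair inside a subset meets the threshold."""
    subsets: List[List[str]] = []
    for word in dict.fromkeys(words):  # drop duplicates, keep first-seen order
        for subset in subsets:
            if all(similarity(word, member) >= threshold for member in subset):
                subset.append(word)
                break
        else:  # no existing subset accepted the word
            subsets.append([word])
    return subsets


print(cluster_near_synonyms(["diaper", "diapers", "boy", "boys"], threshold=0.8))
# [['diaper', 'diapers'], ['boy', 'boys']]
```

Greedy assignment depends on input order and does not minimise the number of subsets, but it guarantees the pairwise condition stated above by construction.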
In some optional implementations of the present embodiment, the at least one subset included in the preset word set may be obtained in advance as follows: obtain an initial search word set; for each initial search word in the initial search word set, input it into a preset search engine to obtain at least one search result; extract, from the at least one search result, words having a set feature as target words; and generate a subset of the word set based on the extracted target words and the initial search word.
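A sketch of this procedure is given below. The preset search engine is replaced by a hypothetical stub (`fake_search`), and the "set feature" is assumed, purely for illustration, to be "the word occurs in at least `min_count` of the search results"; the description does not specify the feature.

```python
from collections import Counter
from typing import Callable, Dict, List, Set


def build_subset(initial_word: str, search: Callable[[str], List[str]],
                 min_count: int = 2) -> Set[str]:
    """Form one subset from the initial search word and words extracted from its results."""
    results = search(initial_word)
    # Count each word at most once per result (document frequency).
    counts = Counter(word for snippet in results for word in set(snippet.lower().split()))
    # Assumed "set feature": the word occurs in at least min_count results.
    target_words = {w for w, c in counts.items() if c >= min_count}
    return target_words | {initial_word}


def build_word_set(initial_words: List[str], search: Callable[[str], List[str]],
                   min_count: int = 2) -> List[Set[str]]:
    """Build one subset per initial search word."""
    return [build_subset(w, search, min_count) for w in initial_words]


# Hypothetical stand-in for a preset search engine.
FAKE_RESULTS: Dict[str, List[str]] = {
    "diaper": ["diaper nappy sale", "best nappy brands", "cloth diaper guide"],
}


def fake_search(query: str) -> List[str]:
    return FAKE_RESULTS.get(query, [])


print(build_word_set(["diaper"], fake_search))
```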
In the apparatus provided by the above embodiment of the present application, a target search word set is obtained; then, for each target search word in that set, at least one corresponding word is determined from the preset word set, and a target number of corresponding words are extracted from them, in order of their similarity to the target search word, as the corresponding word set of that target search word; finally, at least one search word set is generated from the resulting corresponding word sets. More search word sets can thus be generated from a single target search word set, which helps make information search more comprehensive and more targeted.
Referring now to Fig. 6, a schematic structural diagram of a computer system 600 of a server suitable for implementing the embodiments of the present application is shown. The server shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse and the like; an output section 607 including, for example, a liquid crystal display (LCD) and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are performed.
It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program which can be used by or in connection with an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each box in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the boxes may occur in a different order than noted in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box of the block diagrams and/or flowcharts, and combinations of boxes in them, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an acquiring unit, an extraction unit and a generation unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquiring unit may also be described as "a unit for obtaining a target search word set".
In another aspect, the present application also provides a computer-readable medium, which may be included in the server described in the above embodiments, or may exist separately without being assembled into the server. The computer-readable medium carries one or more programs which, when executed by the server, cause the server to: obtain a target search word set; for each target search word in the target search word set, determine whether the preset word set contains corresponding words having a pre-established correspondence with the target search word; in response to determining that at least one corresponding word is contained, determine the similarity between each such corresponding word and the target search word; extract, in order of similarity, a target number of corresponding words from the at least one corresponding word as the corresponding word set of the target search word; and generate at least one search word set based on the obtained corresponding word sets.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (14)

1. A method for generating information, comprising:
obtaining a target search word set;
for each target search word in the target search word set, determining whether a preset word set includes corresponding words having a pre-established correspondence with the target search word; in response to determining that at least one corresponding word is included, determining the similarity between each of the at least one corresponding word and the target search word; and extracting, in order of similarity, a target number of corresponding words from the at least one corresponding word as a corresponding word set of the target search word; and
generating at least one search word set based on the obtained corresponding word sets.
2. The method according to claim 1, wherein after the generating at least one search word set based on the obtained corresponding word sets, the method further comprises:
for each search word set in the at least one search word set, performing an information search using the search words included in the search word set, and obtaining and outputting search results.
3. The method according to claim 1, wherein the target search word set is a set of words obtained by performing word segmentation on a search statement input by a user.
4. The method according to any one of claims 1-3, wherein the preset word set includes at least one subset; and
the determining whether a preset word set includes corresponding words having a pre-established correspondence with the target search word comprises:
determining whether the at least one subset contains a subset including the target search word;
in response to determining that such a subset exists, determining the words in the subset including the target search word, other than the target search word, as corresponding words of the target search word.
5. The method according to claim 4, wherein the at least one subset included in the preset word set is obtained in advance by the following steps:
obtaining a target text set;
performing word segmentation on target texts in the target text set to obtain a word set;
performing near-synonym clustering on the words in the word set obtained by word segmentation to obtain at least one subset, wherein, for each subset in the at least one subset, the similarity between any two words included in the subset is greater than or equal to a preset similarity threshold.
6. The method according to claim 4, wherein the at least one subset included in the preset word set is obtained in advance by the following steps:
obtaining an initial search word set;
for each initial search word in the initial search word set, inputting the initial search word into a preset search engine to obtain at least one search result; extracting, from the at least one search result, words having a set feature as target words; and generating, based on the extracted target words and the initial search word, a subset included in the word set.
7. An apparatus for generating information, comprising:
an acquiring unit, configured to obtain a target search word set;
an extraction unit, configured to, for each target search word in the target search word set, determine whether a preset word set includes corresponding words having a pre-established correspondence with the target search word; in response to determining that at least one corresponding word is included, determine the similarity between each of the at least one corresponding word and the target search word; and extract, in order of similarity, a target number of corresponding words from the at least one corresponding word as a corresponding word set of the target search word; and
a generation unit, configured to generate at least one search word set based on the obtained corresponding word sets.
8. The apparatus according to claim 7, wherein the apparatus further comprises:
a search unit, configured to, for each search word set in the at least one search word set, perform an information search using the search words included in the search word set, and obtain and output search results.
9. The apparatus according to claim 7, wherein the target search word set is a set of words obtained by performing word segmentation on a search statement input by a user.
10. The apparatus according to any one of claims 7-9, wherein the preset word set includes at least one subset; and
the extraction unit comprises:
a first determining module, configured to determine whether the at least one subset contains a subset including the target search word;
a second determining module, configured to, in response to determining that such a subset exists, determine the words in the subset including the target search word, other than the target search word, as corresponding words of the target search word.
11. The apparatus according to claim 10, wherein the at least one subset included in the preset word set is obtained in advance by the following steps:
obtaining a target text set;
performing word segmentation on target texts in the target text set to obtain a word set;
performing near-synonym clustering on the words in the word set obtained by word segmentation to obtain at least one subset, wherein, for each subset in the at least one subset, the similarity between any two words included in the subset is greater than or equal to a preset similarity threshold.
12. The apparatus according to claim 10, wherein the at least one subset included in the preset word set is obtained in advance by the following steps:
obtaining an initial search word set;
for each initial search word in the initial search word set, inputting the initial search word into a preset search engine to obtain at least one search result; extracting, from the at least one search result, words having a set feature as target words; and generating, based on the extracted target words and the initial search word, a subset included in the word set.
13. A server, comprising:
one or more processors; and
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6.
14. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
CN201811075006.XA 2018-09-14 2018-09-14 Method and apparatus for generating information Pending CN109213916A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811075006.XA CN109213916A (en) 2018-09-14 2018-09-14 Method and apparatus for generating information
PCT/CN2018/115951 WO2020052059A1 (en) 2018-09-14 2018-11-16 Method and apparatus for generating information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811075006.XA CN109213916A (en) 2018-09-14 2018-09-14 Method and apparatus for generating information

Publications (1)

Publication Number Publication Date
CN109213916A (en) 2019-01-15

Family

ID=64984182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811075006.XA Pending CN109213916A (en) 2018-09-14 2018-09-14 Method and apparatus for generating information

Country Status (2)

Country Link
CN (1) CN109213916A (en)
WO (1) WO2020052059A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106547732A (en) * 2016-10-14 2017-03-29 深圳中兴网信科技有限公司 Near synonym recognition methodss and near synonym identifying system
CN107451126A (en) * 2017-08-21 2017-12-08 广州多益网络股份有限公司 A kind of near synonym screening technique and system
CN107766498A (en) * 2017-10-19 2018-03-06 北京百度网讯科技有限公司 Method and apparatus for generating information
US20180101606A1 (en) * 2016-10-07 2018-04-12 Abel Torres Montoya Method and system for searching for relevant items in a collection of documents given user defined documents
CN108509474A (en) * 2017-09-15 2018-09-07 腾讯科技(深圳)有限公司 Search for the synonym extended method and device of information

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN102855252B (en) * 2011-06-30 2015-09-09 北京百度网讯科技有限公司 A kind of need-based data retrieval method and device
CN103455507B (en) * 2012-05-31 2017-03-29 国际商业机器公司 Search engine recommends method and device
CN107544982B (en) * 2016-06-24 2022-12-02 中兴通讯股份有限公司 Text information processing method and device and terminal
CN108491387B (en) * 2018-03-20 2022-04-22 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN110688837A (en) * 2019-09-27 2020-01-14 北京百度网讯科技有限公司 Data processing method and device
CN110688837B (en) * 2019-09-27 2023-10-31 北京百度网讯科技有限公司 Data processing method and device
CN111078849A (en) * 2019-12-02 2020-04-28 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information

Also Published As

Publication number Publication date
WO2020052059A1 (en) 2020-03-19

Similar Documents

Publication Publication Date Title
CN106960030B (en) Information pushing method and device based on artificial intelligence
CN107491534A (en) Information processing method and device
CN109189938A (en) Method and apparatus for updating knowledge mapping
CN109522483A (en) Method and apparatus for pushed information
CN108121800A (en) Information generating method and device based on artificial intelligence
CN110096655A (en) Sort method, device, equipment and the storage medium of search result
CN109635094A (en) Method and apparatus for generating answer
US20150309988A1 (en) Evaluating Crowd Sourced Information Using Crowd Sourced Metadata
CN106919711B (en) Method and device for labeling information based on artificial intelligence
CN106354856B (en) Artificial intelligence-based deep neural network enhanced search method and device
CN110263142A (en) Method and apparatus for output information
CN109325121A (en) Method and apparatus for determining the keyword of text
CN110096584A (en) A kind of answer method and device
CN109271556A (en) Method and apparatus for output information
CN109858045A (en) Machine translation method and device
CN110084658A (en) The matched method and apparatus of article
CN108900612A (en) Method and apparatus for pushed information
CN110019948A (en) Method and apparatus for output information
CN109582825A (en) Method and apparatus for generating information
CN109558593A (en) Method and apparatus for handling text
CN109743245A (en) The method and apparatus for creating group
CN109862100A (en) Method and apparatus for pushed information
CN109785072A (en) Method and apparatus for generating information
CN109543068A (en) Method and apparatus for generating the comment information of video
CN109255036A (en) Method and apparatus for output information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190115
