WO2020052061A1 - Method and apparatus for processing information - Google Patents

Method and apparatus for processing information

Info

Publication number
WO2020052061A1
Authority
WO
WIPO (PCT)
Prior art keywords
candidate
word
user
title text
prompt
Prior art date
Application number
PCT/CN2018/115954
Other languages
English (en)
French (fr)
Inventor
邓江东
李磊
马维英
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2020052061A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • Embodiments of the present application relate to the field of computer technology, and in particular, to a method and an apparatus for processing information.
  • search words may be words, phrases, or sentences.
  • the embodiments of the present application provide a method and an apparatus for processing information.
  • an embodiment of the present application provides a method for processing information.
  • the method includes: obtaining a target title text set, where each target title text corresponds to body information and is provided for a user to click after entering a search term, so that the body information corresponding to the clicked target title text is presented to the user; for each target title text in the target title text set, generating, based on the target title text, candidate prompt words for prompting the user to search; and selecting, from the generated candidate prompt words, a target prompt word to present to the user.
  • generating a candidate prompt word for prompting the user to search based on the target title text includes: inputting the target title text into a pre-trained prompt word generation model to generate a result prompt word; and generating, based on the generated result prompt word, a candidate prompt word for prompting the user to search.
  • generating a candidate prompt word for prompting the user to search based on the generated result prompt word includes: obtaining historical search terms corresponding to the target title text within a preset historical time period; for each of the obtained historical search terms, determining the similarity between the historical search term and the generated result prompt word, where the similarity is a value characterizing how similar the historical search term is to the result prompt word; and extracting the historical search terms whose similarity is greater than or equal to a preset threshold as candidate prompt words for prompting the user to search.
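The similarity filter over historical search terms can be illustrated with a minimal sketch. The text does not say how similarity is computed, so the character-level ratio from Python's `difflib`, the threshold of 0.6, and the function names below are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed measure, not the patent's."""
    return SequenceMatcher(None, a, b).ratio()


def candidates_from_history(result_prompt, historical_terms, threshold=0.6):
    """Keep the historical search terms whose similarity to the model's
    result prompt word is greater than or equal to the preset threshold."""
    return [t for t in historical_terms if similarity(t, result_prompt) >= threshold]


# Hypothetical historical search terms for one target title text.
history = ["neural network basics", "deep learning", "neural networks"]
kept = candidates_from_history("neural network", history)
print(kept)
```

Any other similarity measure (for example, one based on word embeddings) could be substituted without changing the thresholding step.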
  • generating a candidate prompt word for prompting the user to search based on the target title text includes: segmenting the target title text to obtain a word segmentation result; and generating, based on the obtained word segmentation result, a candidate prompt word for prompting the user to search.
  • generating a candidate prompt word for prompting the user to search based on the obtained word segmentation result includes: for each word in the obtained word segmentation result, determining the part of speech of the word; and generating, based on the obtained word segmentation result and the determined parts of speech, a candidate prompt word for prompting the user to search.
  • generating a candidate prompt word for prompting the user to search based on the obtained word segmentation result includes: for each word in the obtained word segmentation result, determining the importance of the word, where the importance is a value characterizing how important the word is; and generating, based on the obtained word segmentation result and the determined importance, a candidate prompt word for prompting the user to search.
  • generating a candidate prompt word for prompting the user to search based on the target title text includes: generating, based on the target title text, initial candidate prompt words for prompting the user to search; filtering the generated initial candidate prompt words to remove words that meet a preset condition; and determining the filtered initial candidate prompt words as candidate search words.
  • selecting a target prompt word for presentation to the user from the generated candidate prompt words includes: sorting the generated candidate prompt words to obtain a candidate prompt word sequence; and selecting, from the obtained candidate prompt word sequence, a target prompt word for presentation to the user.
  • sorting the generated candidate prompt words to obtain a candidate prompt word sequence includes: for each of the generated candidate prompt words, performing the following scoring steps: determining the relevance between the candidate prompt word and the target title text corresponding to the candidate prompt word, where the relevance is a value characterizing how relevant the candidate prompt word is to the target title text; and determining, based on the determined relevance, a score characterizing the quality of the candidate prompt word; and sorting the candidate prompt words based on the determined scores to obtain the candidate prompt word sequence.
  • before determining the score characterizing the quality of the candidate prompt word based on the determined relevance, the scoring steps further include: determining the language fluency of the candidate prompt word, where the language fluency is a value characterizing how fluent the candidate prompt word is; and determining the score based on the determined relevance includes: determining the score characterizing the quality of the candidate prompt word based on the determined relevance and language fluency.
  • an embodiment of the present application provides an apparatus for processing information.
  • the apparatus includes: an obtaining unit configured to obtain a target title text set, where each target title text corresponds to body information and is provided for a user to click after entering a search term, so that the body information corresponding to the clicked target title text is presented to the user; a generating unit configured to generate, for each target title text in the target title text set and based on the target title text, candidate prompt words for prompting the user to search; and a selecting unit configured to select, from the generated candidate prompt words, a target prompt word for presentation to the user.
  • the generating unit includes: a first generating module configured to input the target title text into a pre-trained prompt word generation model to generate a result prompt word; and a second generating module configured to generate, based on the generated result prompt word, a candidate prompt word for prompting the user to search.
  • the generating unit includes: an obtaining module configured to obtain historical search terms corresponding to the target title text within a preset historical time period; a first determining module configured to determine, for each of the obtained historical search terms, the similarity between the historical search term and the generated result prompt word, where the similarity is a value characterizing how similar the historical search term is to the result prompt word; and an extraction module configured to extract the historical search terms whose similarity is greater than or equal to a preset threshold as candidate prompt words for prompting the user to search.
  • the generating unit includes: a word segmentation module configured to segment the target title text to obtain a word segmentation result; and a third generating module configured to generate, based on the obtained word segmentation result, a candidate prompt word for prompting the user to search.
  • the third generating module is further configured to: for each word in the obtained word segmentation result, determine the part of speech of the word; and generate, based on the obtained word segmentation result and the determined parts of speech, a candidate prompt word for prompting the user to search.
  • the third generating module is further configured to: for each word in the obtained word segmentation result, determine the importance of the word, where the importance is a value characterizing how important the word is; and generate, based on the obtained word segmentation result and the determined importance, a candidate prompt word for prompting the user to search.
  • the generating unit includes: a fourth generating module configured to generate, based on the target title text, initial candidate prompt words for prompting the user to search; a filtering module configured to filter the generated initial candidate prompt words to remove words that meet a preset condition; and a second determining module configured to determine the filtered initial candidate prompt words as candidate search words.
  • the selecting unit includes: a sorting module configured to sort the generated candidate prompt words to obtain a candidate prompt word sequence; and a selection module configured to select, from the obtained candidate prompt word sequence, a target prompt word for presentation to the user.
  • the sorting module is further configured to perform the following scoring steps on each of the generated candidate prompt words: determine the relevance between the candidate prompt word and the target title text corresponding to the candidate prompt word, where the relevance is a value characterizing how relevant the candidate prompt word is to the target title text; and determine, based on the determined relevance, a score characterizing the quality of the candidate prompt word; and to sort the candidate prompt words based on the determined scores to obtain the candidate prompt word sequence.
  • before determining the score characterizing the quality of the candidate prompt word based on the determined relevance, the scoring steps further include: determining the language fluency of the candidate prompt word, where the language fluency is a value characterizing how fluent the candidate prompt word is; and determining the score based on the determined relevance includes: determining the score characterizing the quality of the candidate prompt word based on the determined relevance and language fluency.
  • an embodiment of the present application provides an electronic device including: one or more processors; and a storage device storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method of any one of the foregoing embodiments of the method for processing information.
  • an embodiment of the present application provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the method of any one of the foregoing embodiments of the method for processing information is implemented.
  • the method and apparatus for processing information provided by the embodiments of the present application obtain a target title text set, where each target title text corresponds to body information and is provided for a user to click after entering a search term, so that the body information corresponding to the clicked target title text is presented to the user; then, for each target title text in the target title text set, generate candidate prompt words for prompting the user to search based on the target title text; and finally select, from the generated candidate prompt words, a target prompt word for presentation to the user. The target title text set is thereby used effectively to generate target prompt words, so that the user can be prompted to search for the content indicated by a target prompt word before entering a search term, which enriches the ways of searching for information and improves the diversity of information processing.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for processing information according to the present application;
  • FIG. 3 is a schematic diagram of an application scenario of a method for processing information according to an embodiment of the present application;
  • FIG. 4 is a flowchart of still another embodiment of a method for processing information according to the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing information according to the present application.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the method for processing information or the apparatus for processing information of the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • the network 104 is a medium for providing a communication link between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and so on.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • when the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
  • when the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
  • the server 105 may be a server that provides various services, such as an information processing server that processes a target title text set sent by the terminal devices 101, 102, and 103.
  • the information processing server may analyze and process the received data such as the target title text set, and obtain a processing result (for example, a target prompt).
  • the method for processing information provided in the embodiments of the present application can be executed by the server 105 or by the terminal devices 101, 102, and 103; correspondingly, the apparatus for processing information can be provided in the server 105 or in the terminal devices 101, 102, and 103.
  • the server may be hardware or software.
  • when the server is hardware, it can be implemented as a distributed server cluster consisting of multiple servers or as a single server.
  • when the server is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely exemplary. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the above system architecture may not include a network, but only a terminal device or a server.
  • the method for processing information includes the following steps:
  • Step 201: Obtain a target title text set.
  • the execution subject of the method for processing information may obtain the target title text set, through a wired or wireless connection, either locally or from a communicatively connected electronic device (such as a terminal device shown in FIG. 1).
  • the target title text is the title text to be processed in order to obtain the target prompt word.
  • a target prompt word is a word, phrase, or sentence used to prompt the user to search.
  • the target title text corresponds to body information and is provided for the user to click after entering a search term, so that the body information corresponding to the clicked target title text is presented to the user.
  • the target title text is used to describe the content of the corresponding body information.
  • a search term is a word, phrase, or sentence entered by the user for searching.
  • a large amount of body information can be stored in the execution subject or the electronic device.
  • the title text corresponding to each piece of body information can be determined in advance.
  • each title text can correspond to a click rate.
  • the click rate is the probability that the title text is clicked within a preset time period.
  • the execution subject may obtain title texts from the predetermined title text set according to the click rate and use them as target title texts.
  • the execution subject may obtain, from the title text set, the title texts whose click rate is greater than or equal to a preset threshold as target title texts; or the execution subject may obtain, in descending order of click rate, a preset number of title texts from the title text set as the target title texts.
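The two selection strategies described above (a click-rate threshold, or the top titles by click rate) can be sketched as follows. The mapping from title text to click rate, the sample values, and the function names are hypothetical.

```python
def select_by_threshold(click_rates, threshold):
    """Return the titles whose click rate is >= the preset threshold."""
    return [title for title, rate in click_rates.items() if rate >= threshold]


def select_top_n(click_rates, n):
    """Return the n titles with the highest click rate, in descending order."""
    ranked = sorted(click_rates, key=click_rates.get, reverse=True)
    return ranked[:n]


# Hypothetical title texts with their click rates.
click_rates = {
    "Neural Network: From Neurons to Deep Learning": 0.32,
    "Weather forecast for this week": 0.18,
    "Ten tips for better sleep": 0.05,
}
by_threshold = select_by_threshold(click_rates, threshold=0.1)
top_one = select_top_n(click_rates, n=1)
```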
  • Step 202: For each target title text in the target title text set, generate, based on the target title text, a candidate prompt word for prompting the user to search.
  • the execution subject may use various methods to generate candidate prompt words for prompting the user to search based on the target title text.
  • a candidate prompt word is used to obtain the target prompt word and may be a word, a phrase, or a sentence, for example the phrase "weather today".
  • the execution subject may generate a candidate prompt word for prompting the user to search based on the target title text through the following steps: first, the execution subject may segment the target title text to obtain a word segmentation result; then, the execution subject may generate a candidate prompt word for prompting the user to search based on the obtained word segmentation result.
  • the word segmentation result includes the words obtained by the segmentation.
  • the word segmentation result may be a word sequence composed of the words obtained by the segmentation.
  • the words in the word sequence can be arranged in the order in which they appear in the target title text.
  • the execution subject may segment the target title text by various methods to obtain the word segmentation result, for example by using a dictionary-based forward maximum matching algorithm or reverse maximum matching algorithm.
  • word segmentation algorithms are well-known techniques that are widely studied and applied, and are not described further here.
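As an illustration of dictionary-based forward maximum matching, here is a minimal sketch. The dictionary, the maximum word length, and the unspaced sample string are assumptions; a production segmenter would use a much larger dictionary and typically handle unknown words more carefully.

```python
def forward_max_match(text, dictionary, max_len=7):
    """Segment text left to right, always taking the longest dictionary word
    that starts at the current position (single characters are the fallback)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a dictionary hit;
        # a window of size 1 is always accepted so the loop makes progress.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical dictionary and an unspaced input string.
dictionary = {"weather", "today"}
segments = forward_max_match("weathertoday", dictionary)
```

Reverse maximum matching follows the same idea but scans from the end of the text toward the beginning.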
  • the above-mentioned execution subject may use various methods to generate candidate prompt words for prompting the user to search based on the obtained word segmentation results.
  • the execution subject may generate a candidate prompt word for prompting the user to search based on the obtained word segmentation result through the following steps: first, for each word in the obtained word segmentation result, the execution subject may determine the part of speech of the word; then, the execution subject may generate a candidate prompt word for prompting the user to search based on the obtained word segmentation result and the determined parts of speech.
  • the execution subject may obtain, from the words included in the obtained word segmentation result, a word whose part of speech is a noun as a candidate prompt word for prompting the user to search;
  • or the execution subject may obtain a word whose part of speech is a noun and a word whose part of speech is a verb, combine the obtained noun and verb into a phrase, and use the phrase as a candidate prompt word for prompting the user to search.
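The part-of-speech approach can be sketched as below. The part-of-speech lookup table stands in for a real tagger, and the verb-then-noun phrase order is an assumption; the text only says that the noun and the verb are combined into a phrase.

```python
# Hypothetical part-of-speech lookup standing in for a real tagger.
POS_TABLE = {"compare": "verb", "the": "other", "laptops": "noun"}


def candidates_from_pos(words, pos_table):
    """Use nouns as candidates, plus verb + noun phrases (assumed word order)."""
    nouns = [w for w in words if pos_table.get(w) == "noun"]
    verbs = [w for w in words if pos_table.get(w) == "verb"]
    candidates = list(nouns)
    # Pair every verb with every noun to form a phrase candidate.
    candidates += [f"{verb} {noun}" for verb in verbs for noun in nouns]
    return candidates


pos_candidates = candidates_from_pos(["compare", "the", "laptops"], POS_TABLE)
```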
  • the execution subject may also generate a candidate prompt word for prompting the user to search through the following steps: first, for each word in the obtained word segmentation result, the execution subject may determine the importance of the word, where the importance is a value characterizing how important the word is; then, the execution subject may generate a candidate prompt word for prompting the user to search based on the obtained word segmentation result and the determined importance.
  • the execution subject may use various methods to determine the importance of the words in the obtained word segmentation result.
  • the execution subject may first obtain a preset text set.
  • a preset text is a text set in advance by a technician for determining the importance of words.
  • the execution subject may determine the number of times that a word appears in the preset text set and use that number as the importance of the word; or, where a correspondence table between words and their importance has been set in advance, the execution subject may determine the importance of a word by looking it up in the correspondence table.
  • the execution subject may use various methods to generate candidate prompt words for prompting the user to search based on the obtained word segmentation result and the determined importance. As an example, the execution subject may obtain, from the words included in the obtained word segmentation result, the words whose importance is greater than or equal to a preset threshold, and use the obtained words to form a candidate prompt word; or the execution subject may obtain a preset number of words from the words included in the obtained word segmentation result in descending order of importance, and use the obtained words to form a candidate prompt word.
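The count-based importance and the two selection rules can be sketched as follows. The preset texts, whitespace tokenization, threshold, and function names are assumptions for illustration.

```python
from collections import Counter


def word_importance(words, preset_texts):
    """Importance of a word = number of times it appears in the preset text set."""
    counts = Counter(token for text in preset_texts for token in text.split())
    return {word: counts[word] for word in words}


def keep_important(words, importance, threshold):
    """Keep the words whose importance is >= the preset threshold."""
    return [w for w in words if importance[w] >= threshold]


def top_n_important(words, importance, n):
    """Keep the n most important words, in descending order of importance."""
    return sorted(words, key=lambda w: importance[w], reverse=True)[:n]


# Hypothetical preset text set and segmentation result.
preset_texts = ["neural network tutorial", "deep neural network", "network security basics"]
words = ["neural", "network", "introduction"]
importance = word_importance(words, preset_texts)
above_threshold = keep_important(words, importance, threshold=2)
most_important = top_n_important(words, importance, n=1)
```

The selected words would then be combined into a candidate prompt word, for example by joining them in their original order.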
  • the execution subject may also generate a candidate prompt word for prompting the user to search based on the target title text through the following steps: first, the execution subject may generate initial candidate prompt words for prompting the user to search based on the target title text; then, the execution subject may filter the generated initial candidate prompt words to remove words that meet a preset condition; finally, the execution subject may determine the filtered initial candidate prompt words as candidate search words.
  • the execution subject may use any of the methods for generating candidate prompt words described above to generate the initial candidate prompt words, which are not repeated here.
  • the preset condition may be a condition determined in advance by a technician, for example, that the word belongs to a preset bad word set, or that the word is a named entity.
  • a bad word is a word that a technician has determined in advance to be unsuitable for display.
  • named entities are names of persons, institutions, places, and all other entities identified by name.
  • here, an entity refers to a word.
  • the execution subject may filter the initial candidate prompt words by various methods according to the preset condition. For example, if the preset condition is "the word belongs to a preset bad word set", the execution subject may match each initial candidate prompt word against the bad word set to determine whether the initial candidate prompt word includes a bad word; if it does, the bad word is removed from the initial candidate prompt word, which completes the filtering.
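A minimal sketch of the bad-word filter follows. The bad word set, whitespace tokenization, and the choice to drop a candidate that becomes empty after filtering are assumptions.

```python
def filter_candidates(initial_candidates, bad_words):
    """Remove bad words from each initial candidate; drop candidates left empty."""
    filtered = []
    for candidate in initial_candidates:
        kept = [w for w in candidate.split() if w.lower() not in bad_words]
        if kept:
            filtered.append(" ".join(kept))
    return filtered


# Hypothetical bad word set and initial candidates.
bad_words = {"spam"}
filtered = filter_candidates(["cheap spam deals", "weather today", "spam"], bad_words)
```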
  • Step 203: Select, from the generated candidate prompt words, a target prompt word to be presented to the user.
  • the execution subject may select a target prompt word to be presented to the user from the generated candidate prompt words.
  • the execution subject may use various methods to select a target prompt word to be presented to the user from the generated candidate prompt words.
  • for example, a random selection method may be used to select the target prompt word to be presented to the user.
  • the execution subject may select a target prompt word to be presented to the user from the generated candidate prompt words through the following steps: first, the execution subject may sort the generated candidate prompt words to obtain a candidate prompt word sequence; then, the execution subject may select a target prompt word for presentation to the user from the obtained candidate prompt word sequence.
  • the execution subject may use various methods to sort the generated candidate prompt words to obtain the candidate prompt word sequence.
  • for each of the generated candidate prompt words, the execution subject may perform the following scoring steps:
  • Step 2031: Determine the relevance between the candidate prompt word and the target title text corresponding to the candidate prompt word.
  • the relevance is a value characterizing how relevant the candidate prompt word is to the target title text. The larger the value, the higher the degree of relevance.
  • the execution subject may use various methods to determine the relevance.
  • the execution subject may perform a similarity calculation on the candidate prompt word and the target title text corresponding to the candidate prompt word, and use the calculation result as the relevance between the candidate prompt word and the target title text.
  • a technician may set in advance a first relevance characterizing a high degree of relevance and a second relevance characterizing a low degree of relevance.
  • the execution subject may first determine the nouns in the target title text using a part-of-speech tagging method.
  • the execution subject may then determine whether the candidate prompt word includes a noun from the target title text; if it does, the first relevance is used as the relevance between the candidate prompt word and the target title text corresponding to the candidate prompt word; if it does not, the second relevance is used instead.
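The noun-inclusion rule for relevance can be sketched as follows. The two relevance values (10 and 1) and the pre-tagged noun list are assumptions; the text only requires a preset "high" and a preset "low" value and leaves the part-of-speech tagger unspecified.

```python
FIRST_RELEVANCE = 10   # assumed value characterizing high relevance
SECOND_RELEVANCE = 1   # assumed value characterizing low relevance


def relevance(candidate, title_nouns):
    """Return the high value if the candidate contains any noun from the
    target title text, otherwise the low value."""
    candidate_words = set(candidate.lower().split())
    nouns = {noun.lower() for noun in title_nouns}
    return FIRST_RELEVANCE if candidate_words & nouns else SECOND_RELEVANCE


# Nouns assumed to have been tagged in the title
# "Neural Network: From Neurons to Deep Learning".
title_nouns = ["network", "neurons", "learning"]
related = relevance("neural network", title_nouns)
unrelated = relevance("weather today", title_nouns)
```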
  • the execution subject may further determine the language fluency of the candidate prompt word.
  • the language fluency is a value characterizing how fluent the candidate prompt word is. The larger the value, the more fluent the language.
  • for example, the language fluency of the candidate prompt word "The weather is really good today" may be 10, and the language fluency of the candidate prompt word "The weather is so good today" may be 8; that is, the candidate prompt word "The weather is really good today" is more fluent than the candidate prompt word "The weather is so good today".
  • the execution subject may use a pre-trained language fluency model to determine the language fluency of the candidate prompt word.
  • the execution subject may input the candidate prompt word into the language fluency model to obtain the language fluency of the candidate prompt word.
  • the language fluency model may be a model trained on the basis of a language model (LM) or a neural network (NN) and used to characterize the correspondence between a text and the language fluency of the text.
  • Step 2032: Based on the determined relevance, determine a score that characterizes the quality of the candidate prompt word.
  • the above-mentioned execution subject may directly use the determined relevance as the score characterizing the quality of the candidate prompt word, or may process the relevance and use the processing result as that score.
  • as an example, the obtained relevance may be multiplied by a preset value (for example, 100), and the product may be determined as the score characterizing the quality of the candidate prompt word.
  • when the language fluency of the candidate prompt word has been determined, the execution subject may further determine the score characterizing the quality of the candidate prompt word based on both the determined relevance and the language fluency.
  • the above-mentioned execution subject may use various methods to determine this score based on the determined relevance and language fluency. For example, the determined relevance and language fluency may be summed directly, and the sum may be determined as the score characterizing the quality of the candidate prompt word; or, the execution subject may obtain weights that a technician has assigned in advance to relevance and language fluency, compute the weighted sum of the relevance and the language fluency, and determine the weighted sum as the score characterizing the quality of the candidate prompt word.
  • as an example, the technician determines in advance that the weight corresponding to relevance is 0.7 and the weight corresponding to language fluency is 0.3.
  • the above-mentioned execution subject determines that the relevance between the candidate prompt word "neural network" and the target title text "Neural Networks in Brief: From Neurons to Deep Learning" is 9, and that the language fluency of the candidate prompt word "neural network" is 10.
  • using the predetermined weights 0.7 and 0.3, the execution subject may compute the weighted sum of the relevance 9 and the language fluency 10, obtaining 9.3 (9.3 = 0.7 × 9 + 0.3 × 10), and may then determine the weighted sum 9.3 as the score characterizing the quality of the candidate prompt word "neural network".
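The weighted-sum example above can be reproduced in a few lines. This is a minimal sketch; the function name `quality_score` and its default weights are illustrative choices taken from the worked example, not a prescribed interface.

```python
def quality_score(relevance, fluency, w_relevance=0.7, w_fluency=0.3):
    """Weighted sum of relevance and language fluency (weights set by a technician)."""
    return w_relevance * relevance + w_fluency * fluency

# Values from the worked example: relevance 9, language fluency 10.
score = quality_score(relevance=9, fluency=10)
print(round(score, 2))
```

Because the weights are floating-point numbers, comparing the result against 9.3 with a small tolerance is safer than testing for exact equality.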
  • Step 2033: Sort the obtained candidate prompt words based on the determined scores to obtain a candidate prompt word sequence.
  • the above-mentioned execution subject may sort the obtained candidate prompt words by score (in descending or ascending order) to obtain a candidate prompt word sequence.
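A minimal sketch of the sorting step, assuming each candidate prompt word has already been paired with its score. Only the score 9.3 comes from the example above; the other candidates and scores are made up for illustration, and taking the first element of the descending sequence as the target prompt word is one possible selection rule.

```python
def rank_candidates(scored, descending=True):
    """Sort (candidate, score) pairs by score and return the candidates in order."""
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=descending)
    return [candidate for candidate, _ in ordered]

# Hypothetical scored candidates.
scored = [
    ("language overview", 7.5),
    ("neural network", 9.3),
    ("deep learning", 8.1),
]

sequence = rank_candidates(scored)
target_prompt = sequence[0]  # pick the best-scoring candidate
```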
  • FIG. 3 is a schematic diagram of an application scenario of the method for processing information according to this embodiment.
  • in the application scenario of FIG. 3, the server 301 may first obtain a target title text set 303 sent by the terminal device 302.
  • the target title text corresponds to body information, and is for a user to click after inputting a search term, so that the body information corresponding to the clicked target title text is presented to the user.
  • here, the target title text set includes the target title text 3031 (for example, "Neural Networks from Principle to Implementation") and the target title text 3032 (for example, "Natural Language Overview").
  • then, for the target title text 3031 in the target title text set 303, the server 301 may generate, based on that target title text, a candidate prompt word 3041 (for example, "neural network") for prompting the user to search.
  • for the target title text 3032 in the target title text set 303, the server 301 may generate, based on that target title text, a candidate prompt word 3042 (for example, "Language Overview") for prompting the user to search.
  • finally, the server 301 may select a target prompt word 305 (for example, "neural network") for presentation to the user from the generated candidate prompt words 3041 and 3042.
  • the method provided by the above embodiments of the present application effectively uses the target title text set to generate a target prompt word for presentation to the user, so that the user can be prompted to search for the content indicated by the target prompt word before entering a search term. This enriches the ways of searching for information and improves the diversity of information processing.
  • the process 400 of the method for processing information includes the following steps:
  • Step 401: Obtain a target title text set.
  • in this embodiment, the execution subject of the method for processing information (such as the server shown in FIG. 1) may obtain the target title text set locally, or from an electronic device communicatively connected to it (such as the terminal device shown in FIG. 1), through a wired or wireless connection.
  • the target title text is the title text that is processed to obtain the target prompt word.
  • the target prompt word is a word, phrase, or sentence used to prompt the user to search.
  • the target title text corresponds to body information, and is for a user to click after inputting a search term, so that the body information corresponding to the clicked target title text is presented to the user.
  • the target title text is used to describe the content of the corresponding body information.
  • a search term is a word, phrase, or sentence entered by the user for searching.
  • Step 402: For each target title text in the target title text set, input the target title text into a pre-trained prompt word generation model to generate a result prompt word.
  • the execution subject may input the target title text into a pre-trained prompt word generation model to generate a result prompt word.
  • the result prompt word is the output of the prompt word generation model.
  • the prompt word generation model is used to characterize the correspondence between title texts and result prompt words.
  • the prompt word generation model may be a model obtained by training a predetermined initial model (for example, a Seq2seq model or a Convolutional Neural Network (CNN)).
  • as an example, the above-mentioned prompt word generation model can be trained by the following steps:
  • first, obtain a training sample set, where each training sample includes a sample title text and a sample result prompt word.
  • the sample title text may be a pre-stored title text.
  • the sample result prompt word may be a search term entered by a user who clicked on the sample title text.
  • then, the sample title text in the training sample set can be used as the input of a predetermined initial model, the sample result prompt word corresponding to the input sample title text can be used as the desired output, and the initial model can be trained using machine learning methods to obtain the prompt word generation model.
  • Step 403: Generate candidate prompt words for prompting the user to search based on the generated result prompt word.
  • the above-mentioned execution subject may use various methods to generate candidate prompt words for prompting the user to search based on the result prompt word generated in step 402.
  • for example, the execution subject may directly determine the generated result prompt word as a candidate prompt word.
  • alternatively, the execution subject may generate candidate prompt words for prompting the user to search based on the generated result prompt word through the following steps:
  • first, the execution subject may obtain the historical search terms corresponding to the target title text within a preset historical time period.
  • a historical search term corresponding to the target title text is a search term that a user entered before clicking the target title text within the preset historical time period.
  • then, for each of the obtained historical search terms, the execution subject may determine the similarity between that historical search term and the generated result prompt word, where the similarity is a value characterizing the degree of similarity between the historical search term and the result prompt word.
  • finally, the execution subject may extract the historical search terms whose similarity is greater than or equal to a preset threshold as candidate prompt words for prompting the user to search.
  • because historical search terms were typed by real users, using them to determine candidate prompt words may improve the language fluency of the candidate prompt words.
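The three steps above (obtain historical search terms, compute similarity, keep those at or above a threshold) can be sketched as follows. The source does not name a similarity measure, so this sketch assumes the character-level ratio from Python's standard `difflib.SequenceMatcher`; the threshold of 0.6 and the sample history are likewise hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity in [0, 1] between two strings (assumed measure: difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()

def candidates_from_history(result_prompt, historical_terms, threshold=0.6):
    """Keep historical search terms similar enough to the model's result prompt word."""
    return [
        term for term in historical_terms
        if similarity(term, result_prompt) >= threshold
    ]

# Hypothetical search terms users typed before clicking the target title.
history = ["neural networks", "neural network tutorial", "pizza recipe"]
candidates = candidates_from_history("neural network", history)
```

An embedding-based or edit-distance similarity could be substituted without changing the surrounding logic.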
  • Step 404 Select a target prompt word to be presented to the user from the generated candidate prompt words.
  • the execution subject may select a target prompt to be presented to the user from the generated candidate prompts.
  • the above-mentioned execution subject may use various methods to select a target prompt word to be presented to the user from the generated candidate prompt words.
  • for example, a random selection method may be used to select the target prompt word to be presented to the user.
  • steps 401 and 404 are consistent with steps 201 and 203 in the foregoing embodiment, respectively.
  • the above descriptions of steps 201 and 203 also apply to steps 401 and 404, and are not repeated here.
  • this embodiment provides another solution for generating candidate prompt words, which improves the diversity of information processing, and uses the prompt word generation model to generate candidate prompt words, which can improve the accuracy of information processing.
  • this application provides an embodiment of a device for processing information.
  • the device embodiment corresponds to the method embodiment shown in FIG. 2.
  • the device can be specifically applied to various electronic devices.
  • the apparatus 500 for processing information in this embodiment includes an obtaining unit 501, a generating unit 502, and a selecting unit 503.
  • the obtaining unit 501 is configured to obtain a target title text set, where the target title text corresponds to body information and is for a user to click after inputting a search term, so that the body information corresponding to the clicked target title text is presented to the user.
  • the generating unit 502 is configured to generate, for each target title text in the target title text set and based on that target title text, candidate prompt words for prompting the user to search; the selecting unit 503 is configured to select, from the generated candidate prompt words, a target prompt word to be presented to the user.
  • the obtaining unit 501 of the apparatus 500 for processing information may obtain the target title text set locally, or from an electronic device communicatively connected to it (such as the terminal device shown in FIG. 1), through a wired or wireless connection.
  • the target title text is the title text that is processed to obtain the target prompt word.
  • the target prompt word is a word, phrase, or sentence used to prompt the user to search.
  • the target title text corresponds to body information, and is for a user to click after inputting a search term, so that the body information corresponding to the clicked target title text is presented to the user.
  • the target title text is used to describe the content of the corresponding body information.
  • a search term is a word, phrase, or sentence entered by the user for searching.
  • the generating unit 502 may use various methods to generate candidate prompt words for prompting the user to search based on the target title text.
  • the candidate prompt word may be used to generate a target prompt word, and may be a word, a phrase, or a sentence, for example the phrase "today's weather".
  • the selecting unit 503 may select a target prompt to be presented to the user from the generated candidate prompts.
  • the selection unit 503 may use various methods to select a target prompt word to be presented to the user from the generated candidate prompt words. For example, a random selection method is used to select a target prompt word to be presented to the user.
  • the generating unit 502 may include: a first generation module (not shown in the figure), configured to input the target title text into a pre-trained prompt word generation model to generate a result prompt word; and a second generation module (not shown in the figure), configured to generate candidate prompt words for prompting the user to search based on the generated result prompt word.
  • the generating unit 502 may include: an obtaining module (not shown in the figure), configured to obtain the historical search terms corresponding to the target title text within a preset historical time period;
  • a first determining module (not shown in the figure), configured to determine, for each of the obtained historical search terms, the similarity between that historical search term and the generated result prompt word, where the similarity is a value characterizing the degree of similarity between a historical search term and a result prompt word;
  • and an extraction module (not shown in the figure), configured to extract the historical search terms whose similarity is greater than or equal to a preset threshold as candidate prompt words for prompting the user to search.
  • the generating unit 502 may include: a word segmentation module (not shown in the figure), configured to perform word segmentation on the target title text to obtain a word segmentation result; and a third generation module (not shown in the figure), configured to generate candidate prompt words for prompting the user to search based on the obtained word segmentation result.
  • the third generation module may be further configured to: for each word in the obtained word segmentation result, determine the part of speech of the word; and generate candidate prompt words for prompting the user to search based on the obtained word segmentation result and the determined parts of speech.
  • the third generation module may be further configured to: for each word in the obtained word segmentation result, determine the importance of the word within the obtained word segmentation result, where the importance is a value characterizing how important the word is; and generate candidate prompt words for prompting the user to search based on the obtained word segmentation result and the determined importance.
  • the generating unit 502 may include: a fourth generation module (not shown in the figure), configured to generate initial candidate prompt words for prompting the user to search based on the target title text; a filtering module (not shown in the figure), configured to filter the generated initial candidate prompt words to remove words that meet a preset condition from the initial candidate prompt words; and a second determining module (not shown in the figure), configured to determine the filtered initial candidate prompt words as candidate search words.
  • the selecting unit 503 may include: a sorting module (not shown in the figure), configured to sort the generated candidate prompt words to obtain a candidate prompt word sequence; and a selection module (not shown in the figure), configured to select a target prompt word for presentation to the user from the obtained candidate prompt word sequence.
  • the sorting module may be further configured to perform the following scoring steps for each of the generated candidate prompt words: determine the relevance between the candidate prompt word and its corresponding target title text, where the relevance is a value characterizing the degree of relevance between the candidate prompt word and the target title text; determine, based on the determined relevance, a score characterizing the quality of the candidate prompt word; and sort the obtained candidate prompt words based on the determined scores to obtain a candidate prompt word sequence.
  • before the score characterizing the quality of the candidate prompt word is determined based on the determined relevance, the scoring steps may further include: determining the language fluency of the candidate prompt word, where the language fluency is a value characterizing how fluent the candidate prompt word is; and determining the score based on the determined relevance then includes: determining the score characterizing the quality of the candidate prompt word based on the determined relevance and language fluency.
  • the apparatus 500 provided by the above embodiment of the present application effectively uses the target title text set to generate a target prompt word for presentation to the user, so that the user can be prompted to search for the content indicated by the target prompt word before entering a search term. This enriches the ways of searching for information and improves the diversity of information processing.
  • FIG. 6 illustrates a schematic structural diagram of a computer system 600 suitable for implementing an electronic device (such as a terminal device or a server shown in FIG. 1) in the embodiment of the present application.
  • the electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the system 600.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input / output (I / O) interface 605 is also connected to the bus 604.
  • the following components are connected to the I / O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card, a modem, and the like.
  • the communication portion 609 performs communication processing via a network such as the Internet.
  • the drive 610 is also connected to the I / O interface 605 as necessary.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication portion 609, and / or installed from a removable medium 611.
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal that is included in baseband or propagated as part of a carrier wave, and which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and such a medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function.
  • the functions noted in the blocks may also occur in a different order than those marked in the drawings. For example, two successively represented boxes may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or hardware.
  • the described units may also be provided in a processor, which may be described, for example, as: a processor including an obtaining unit, a generating unit, and a selecting unit. In some cases, the names of these units do not limit the units themselves.
  • the obtaining unit may also be described as a "unit for obtaining a target title text set".
  • the present application also provides a computer-readable medium, which may be included in the electronic device described in the foregoing embodiments, or may exist alone without being assembled into the electronic device.
  • the computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a target title text set, where the target title text corresponds to body information and is for a user to click after inputting a search term, so that the body information corresponding to the clicked target title text is presented to the user; for each target title text in the target title text set, generate, based on that target title text, candidate prompt words for prompting the user to search; and select, from the generated candidate prompt words, a target prompt word to be presented to the user.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

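A method and an apparatus for processing information. The method includes: obtaining a target title text set (201), where a target title text corresponds to body information and is for a user to click after inputting a search term, so that the body information corresponding to the clicked target title text is presented to the user; for each target title text in the target title text set, generating, based on that target title text, candidate prompt words for prompting the user to search (202); and selecting, from the generated candidate prompt words, a target prompt word to be presented to the user (203). The method enriches the ways of searching for information and improves the diversity of information processing.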

Description

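Method and Apparatus for Processing Information
This patent application claims priority to Chinese patent application No. 201811075460.5, filed on September 14, 2018 by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.) and entitled "Method and Apparatus for Processing Information", the entire content of which is incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for processing information.
Background
At present, with the development of science and technology, people can use electronic devices such as mobile phones and computers to search for information and obtain search results. Typically, a user enters a search term into the search box of a search engine or an application to perform an information search. A search term may be a word, a phrase, a sentence, or the like.
Summary
Embodiments of the present application propose a method and an apparatus for processing information.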
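In a first aspect, an embodiment of the present application provides a method for processing information, the method including: obtaining a target title text set, where a target title text corresponds to body information and is for a user to click after inputting a search term, so that the body information corresponding to the clicked target title text is presented to the user; for each target title text in the target title text set, generating, based on that target title text, candidate prompt words for prompting the user to search; and selecting, from the generated candidate prompt words, a target prompt word to be presented to the user.
In some embodiments, generating, based on the target title text, candidate prompt words for prompting the user to search includes: inputting the target title text into a pre-trained prompt word generation model to generate a result prompt word; and generating, based on the generated result prompt word, candidate prompt words for prompting the user to search.
In some embodiments, generating, based on the generated result prompt word, candidate prompt words for prompting the user to search includes: obtaining the historical search terms corresponding to the target title text within a preset historical time period; for each of the obtained historical search terms, determining the similarity between that historical search term and the generated result prompt word, where the similarity is a value characterizing the degree of similarity between a historical search term and a result prompt word; and extracting the historical search terms whose similarity is greater than or equal to a preset threshold as candidate prompt words for prompting the user to search.
In some embodiments, generating, based on the target title text, candidate prompt words for prompting the user to search includes: performing word segmentation on the target title text to obtain a word segmentation result; and generating, based on the obtained word segmentation result, candidate prompt words for prompting the user to search.
In some embodiments, generating, based on the obtained word segmentation result, candidate prompt words for prompting the user to search includes: for each word in the obtained word segmentation result, determining the part of speech of the word; and generating, based on the obtained word segmentation result and the determined parts of speech, candidate prompt words for prompting the user to search.
In some embodiments, generating, based on the obtained word segmentation result, candidate prompt words for prompting the user to search includes: for each word in the obtained word segmentation result, determining the importance of the word within the obtained word segmentation result, where the importance is a value characterizing how important the word is; and generating, based on the obtained word segmentation result and the determined importance, candidate prompt words for prompting the user to search.
In some embodiments, generating, based on the target title text, candidate prompt words for prompting the user to search includes: generating, based on the target title text, initial candidate prompt words for prompting the user to search; filtering the generated initial candidate prompt words to remove words that meet a preset condition; and determining the filtered initial candidate prompt words as candidate search words.
In some embodiments, selecting, from the generated candidate prompt words, a target prompt word to be presented to the user includes: sorting the generated candidate prompt words to obtain a candidate prompt word sequence; and selecting, from the obtained candidate prompt word sequence, a target prompt word to be presented to the user.
In some embodiments, sorting the generated candidate prompt words to obtain a candidate prompt word sequence includes: for each of the generated candidate prompt words, performing the following scoring steps: determining the relevance between the candidate prompt word and its corresponding target title text, where the relevance is a value characterizing the degree of relevance between a candidate prompt word and a target title text; and determining, based on the determined relevance, a score characterizing the quality of the candidate prompt word; and sorting the obtained candidate prompt words based on the determined scores to obtain the candidate prompt word sequence.
In some embodiments, before the score characterizing the quality of the candidate prompt word is determined based on the determined relevance, the scoring steps further include: determining the language fluency of the candidate prompt word, where the language fluency is a value characterizing how fluent a candidate prompt word is; and determining, based on the determined relevance, the score characterizing the quality of the candidate prompt word includes: determining the score based on the determined relevance and language fluency.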
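In a second aspect, an embodiment of the present application provides an apparatus for processing information, the apparatus including: an obtaining unit, configured to obtain a target title text set, where a target title text corresponds to body information and is for a user to click after inputting a search term, so that the body information corresponding to the clicked target title text is presented to the user; a generating unit, configured to generate, for each target title text in the target title text set and based on that target title text, candidate prompt words for prompting the user to search; and a selecting unit, configured to select, from the generated candidate prompt words, a target prompt word to be presented to the user.
In some embodiments, the generating unit includes: a first generation module, configured to input the target title text into a pre-trained prompt word generation model to generate a result prompt word; and a second generation module, configured to generate, based on the generated result prompt word, candidate prompt words for prompting the user to search.
In some embodiments, the generating unit includes: an obtaining module, configured to obtain the historical search terms corresponding to the target title text within a preset historical time period; a first determining module, configured to determine, for each of the obtained historical search terms, the similarity between that historical search term and the generated result prompt word, where the similarity is a value characterizing the degree of similarity between a historical search term and a result prompt word; and an extraction module, configured to extract the historical search terms whose similarity is greater than or equal to a preset threshold as candidate prompt words for prompting the user to search.
In some embodiments, the generating unit includes: a word segmentation module, configured to perform word segmentation on the target title text to obtain a word segmentation result; and a third generation module, configured to generate, based on the obtained word segmentation result, candidate prompt words for prompting the user to search.
In some embodiments, the third generation module is further configured to: for each word in the obtained word segmentation result, determine the part of speech of the word; and generate, based on the obtained word segmentation result and the determined parts of speech, candidate prompt words for prompting the user to search.
In some embodiments, the third generation module is further configured to: for each word in the obtained word segmentation result, determine the importance of the word within the obtained word segmentation result, where the importance is a value characterizing how important the word is; and generate, based on the obtained word segmentation result and the determined importance, candidate prompt words for prompting the user to search.
In some embodiments, the generating unit includes: a fourth generation module, configured to generate, based on the target title text, initial candidate prompt words for prompting the user to search; a filtering module, configured to filter the generated initial candidate prompt words to remove words that meet a preset condition; and a second determining module, configured to determine the filtered initial candidate prompt words as candidate search words.
In some embodiments, the selecting unit includes: a sorting module, configured to sort the generated candidate prompt words to obtain a candidate prompt word sequence; and a selection module, configured to select, from the obtained candidate prompt word sequence, a target prompt word to be presented to the user.
In some embodiments, the sorting module is further configured to: for each of the generated candidate prompt words, perform the following scoring steps: determine the relevance between the candidate prompt word and its corresponding target title text, where the relevance is a value characterizing the degree of relevance between a candidate prompt word and a target title text; and determine, based on the determined relevance, a score characterizing the quality of the candidate prompt word; and sort the obtained candidate prompt words based on the determined scores to obtain the candidate prompt word sequence.
In some embodiments, before the score characterizing the quality of the candidate prompt word is determined based on the determined relevance, the scoring steps further include: determining the language fluency of the candidate prompt word, where the language fluency is a value characterizing how fluent a candidate prompt word is; and determining, based on the determined relevance, the score characterizing the quality of the candidate prompt word includes: determining the score based on the determined relevance and language fluency.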
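In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the above method for processing information.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method of any embodiment of the above method for processing information.
The method and apparatus for processing information provided by the embodiments of the present application obtain a target title text set, where a target title text corresponds to body information and is for a user to click after inputting a search term, so that the body information corresponding to the clicked target title text is presented to the user; then, for each target title text in the target title text set, generate candidate prompt words for prompting the user to search based on that target title text; and finally select, from the generated candidate prompt words, a target prompt word to be presented to the user. The target title text set is thereby used effectively to generate the target prompt word presented to the user, so that the user can be prompted to search for the content indicated by the target prompt word before entering a search term. This enriches the ways of searching for information and improves the diversity of information processing.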
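Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application may be applied;
FIG. 2 is a flowchart of one embodiment of the method for processing information according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for processing information according to an embodiment of the present application;
FIG. 4 is a flowchart of another embodiment of the method for processing information according to the present application;
FIG. 5 is a schematic structural diagram of one embodiment of the apparatus for processing information according to the present application;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.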
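Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method for processing information or the apparatus for processing information of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server that provides various services, for example an information processing server that processes the target title text set sent by the terminal devices 101, 102, 103. The information processing server may analyze and otherwise process the received data, such as the target title text set, and obtain a processing result (for example, a target prompt word).
It should be noted that the method for processing information provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103; accordingly, the apparatus for processing information may be provided in the server 105 or in the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, depending on implementation needs. Where the target title text set, or the data used in generating the target prompt word, does not need to be obtained remotely, the above system architecture may omit the network and include only a terminal device or a server.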
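With continued reference to FIG. 2, a flow 200 of one embodiment of the method for processing information according to the present application is shown. The method for processing information includes the following steps:
Step 201: Obtain a target title text set.
In this embodiment, the execution subject of the method for processing information (such as the server shown in FIG. 1) may obtain the target title text set locally, or from an electronic device communicatively connected to it (such as the terminal device shown in FIG. 1), through a wired or wireless connection. The target title text is the title text that is processed to obtain the target prompt word. The target prompt word is a word, phrase, or sentence used to prompt the user to search. The target title text corresponds to body information, and is for a user to click after inputting a search term, so that the body information corresponding to the clicked target title text is presented to the user. The target title text describes the content of the corresponding body information. A search term is a word, phrase, or sentence entered by the user for searching.
In practice, the execution subject or the electronic device may store a large amount of body information, and the title text corresponding to each piece of body information may be determined in advance. In addition, a title text may have a corresponding click-through rate, which is the probability that the title text is clicked within a preset time period. Optionally, the execution subject may then obtain title texts from a predetermined title text set as target title texts according to click-through rate. Specifically, the execution subject may obtain from the title text set the title texts whose click-through rate is greater than or equal to a preset threshold as target title texts; or the execution subject may obtain a preset number of title texts from the title text set, in descending order of click-through rate, as the preset number of target title texts.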
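The two click-through-rate selection strategies described above (threshold and top-N) can be sketched as follows. The titles, rates, and function names are hypothetical and serve only to illustrate the selection logic.

```python
def select_by_threshold(click_rates, threshold):
    """Titles whose click-through rate is at or above the preset threshold."""
    return [title for title, rate in click_rates.items() if rate >= threshold]

def select_top_n(click_rates, n):
    """The n titles with the highest click-through rate, in descending order."""
    ranked = sorted(click_rates, key=click_rates.get, reverse=True)
    return ranked[:n]

# Hypothetical title texts with their click-through rates.
click_rates = {
    "Neural Networks from Principle to Implementation": 0.12,
    "Natural Language Overview": 0.08,
    "Ten Gardening Tips": 0.01,
}

by_threshold = select_by_threshold(click_rates, threshold=0.05)
top_one = select_top_n(click_rates, n=1)
```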
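Step 202: For each target title text in the target title text set, generate, based on that target title text, candidate prompt words for prompting the user to search.
In this embodiment, for each target title text in the target title text set obtained in step 201, the execution subject may use various methods to generate candidate prompt words for prompting the user to search based on that target title text. A candidate prompt word may be used to generate the target prompt word, and may be a word, a phrase, or a sentence, for example the phrase "today's weather".
In some optional implementations of this embodiment, for each target title text in the target title text set, the execution subject may generate candidate prompt words for prompting the user to search through the following steps: first, the execution subject may perform word segmentation on the target title text to obtain a word segmentation result; then, the execution subject may generate candidate prompt words for prompting the user to search based on the obtained word segmentation result.
The word segmentation result includes the words obtained by segmentation. Specifically, as an example, the word segmentation result may be a word sequence composed of the words obtained by segmentation, arranged in the order in which they appear in the target title text.
Specifically, the execution subject may use various methods to segment the target title text and obtain the word segmentation result, for example a dictionary-based forward maximum matching algorithm or reverse maximum matching algorithm.
It should be noted that word segmentation algorithms are well-known techniques that are widely studied and applied, and are not described further here.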
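Forward maximum matching, one of the dictionary-based algorithms named above, can be sketched as follows: at each position, try the longest dictionary word first and fall back to a single character when nothing matches. The dictionary contents and the function name `forward_max_match` are assumptions for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text greedily, always taking the longest dictionary word available."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest possible piece first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical segmentation dictionary.
dictionary = {"神经", "神经网络", "网络", "原理", "实现"}
tokens = forward_max_match("神经网络从原理到实现", dictionary)
```

Reverse maximum matching works the same way but scans from the end of the text toward the beginning.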
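In this implementation, the execution subject may use various methods to generate candidate prompt words for prompting the user to search based on the obtained word segmentation result.
In some optional implementations of this embodiment, the execution subject may generate candidate prompt words for prompting the user to search based on the obtained word segmentation result through the following steps: first, for each word in the obtained word segmentation result, the execution subject may determine the part of speech of the word; then, the execution subject may generate candidate prompt words based on the obtained word segmentation result and the determined parts of speech. For example, the execution subject may take the words whose part of speech is noun from the word segmentation result as candidate prompt words for prompting the user to search; or the execution subject may take the nouns and the verbs from the word segmentation result, combine the obtained nouns and verbs into phrases, and use the resulting phrases as candidate prompt words for prompting the user to search.
It should be noted that methods for determining the part of speech of a word are well-known techniques that are widely studied and applied, and are not described further here.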
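A minimal sketch of the two part-of-speech strategies above, assuming the title has already been segmented and tagged by some external tagger. The tag names ("n", "v", "p"), the sample tokens, and the rule of pairing each verb with every noun that follows it are assumptions; the source only says that nouns and verbs are combined into phrases.

```python
def nouns_as_candidates(tagged):
    """Use every noun in the segmentation result as a candidate prompt word."""
    return [word for word, pos in tagged if pos == "n"]

def verb_noun_phrases(tagged):
    """Pair each verb with each noun that follows it in the title (assumed rule)."""
    phrases = []
    for i, (word, pos) in enumerate(tagged):
        if pos != "v":
            continue
        for later_word, later_pos in tagged[i + 1:]:
            if later_pos == "n":
                phrases.append(f"{word} {later_word}")
    return phrases

# Hypothetical (word, part-of-speech) pairs for one title.
tagged = [("learn", "v"), ("neural network", "n"), ("from", "p"), ("principle", "n")]

noun_candidates = nouns_as_candidates(tagged)
phrase_candidates = verb_noun_phrases(tagged)
```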
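In some optional implementations of this embodiment, the execution subject may also generate candidate prompt words for prompting the user to search based on the obtained word segmentation result through the following steps: first, for each word in the obtained word segmentation result, the execution subject may determine the importance of the word within the obtained word segmentation result, where the importance is a value characterizing how important the word is; then, the execution subject may generate candidate prompt words based on the obtained word segmentation result and the determined importance.
Here, for each word in the obtained word segmentation result, the execution subject may use various methods to determine the importance of the word. For example, the execution subject may first obtain a preset text set, where a preset text is a text collected in advance by a technician for determining the importance of words. Then, for each word in the obtained word segmentation result, the execution subject may count the number of times the word appears in the preset text set and use that count as the importance of the word; or a technician may establish in advance a table mapping words to their importance, and the execution subject may determine the importance of a word by looking it up in that table.
In this implementation, the execution subject may use various methods to generate candidate prompt words based on the obtained word segmentation result and the determined importance. Specifically, as an example, the execution subject may take from the word segmentation result the words whose importance is greater than or equal to a preset threshold and compose a candidate prompt word from them; or the execution subject may take a preset number of words from the word segmentation result, in descending order of importance, and compose a candidate prompt word from that preset number of words.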
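The frequency-based importance and the two composition rules above can be sketched as follows. The preset texts are hypothetical and assumed to be tokenized already, and joining the kept words in their original title order is an assumed design choice; the source only says the selected words are used to compose a candidate prompt word.

```python
from collections import Counter

def importance_from_corpus(words, preset_texts):
    """Importance of a word = its number of occurrences in the preset text set."""
    counts = Counter(token for text in preset_texts for token in text)
    return {word: counts[word] for word in words}

def phrase_by_threshold(words, importance, threshold):
    """Compose a candidate from words at or above the importance threshold."""
    return " ".join(w for w in words if importance[w] >= threshold)

def phrase_by_top_k(words, importance, k):
    """Compose a candidate from the k most important words, kept in title order."""
    keep = set(sorted(words, key=lambda w: importance[w], reverse=True)[:k])
    return " ".join(w for w in words if w in keep)

# Hypothetical segmented title and preset (tokenized) text set.
words = ["neural", "network", "from", "principle"]
preset_texts = [
    ["neural", "network", "basics"],
    ["neural", "network", "training"],
    ["principle", "of", "neural", "network"],
    ["from", "here"],
]

importance = importance_from_corpus(words, preset_texts)
candidate_a = phrase_by_threshold(words, importance, threshold=2)
candidate_b = phrase_by_top_k(words, importance, k=2)
```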
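In some optional implementations of this embodiment, for each target title text in the target title text set, the execution subject may also generate candidate prompt words for prompting the user to search through the following steps: first, the execution subject may generate initial candidate prompt words based on the target title text; then, the execution subject may filter the generated initial candidate prompt words to remove words that meet a preset condition; finally, the execution subject may determine the filtered initial candidate prompt words as candidate search words.
Here, the execution subject may generate the initial candidate prompt words using any of the methods for generating candidate prompt words described above, which are not repeated here.
The preset condition may be a condition determined in advance by a technician, for example that the word belongs to a preset set of objectionable words, or that the word is a named entity. An objectionable word is a word designated by a technician as unsuitable for display. A named entity is a person name, organization name, place name, or any other entity identified by a name. Here, an entity refers to a word.
In this implementation, the execution subject may use various methods to filter the initial candidate prompt words according to the preset condition. For example, if the preset condition is "the word belongs to a preset set of objectionable words", the execution subject may match the initial candidate prompt word against the set of objectionable words to determine whether it contains any of them; if so, the objectionable words contained in the initial candidate prompt word are removed, which completes the filtering of the initial candidate prompt word.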
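A minimal sketch of the objectionable-word filter described above, assuming the initial candidate prompt word is available as a list of words. The blocked-word set and the names used here are hypothetical.

```python
# Hypothetical set of words a technician has marked as unsuitable for display.
BLOCKED_WORDS = frozenset({"scam", "clickbait"})

def filter_candidate(tokens, blocked=BLOCKED_WORDS):
    """Remove every word of the initial candidate that appears in the blocked set."""
    return [token for token in tokens if token not in blocked]

initial = ["clickbait", "neural", "network"]
filtered = filter_candidate(initial)
```

A named-entity condition could be handled the same way by replacing the set membership test with a call to an entity recognizer.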
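Step 203: Select, from the generated candidate prompt words, a target prompt word to be presented to the user.
In this embodiment, based on the candidate prompt words obtained in step 202, the execution subject may select from them a target prompt word to be presented to the user.
Here, the execution subject may use various methods to select the target prompt word from the generated candidate prompt words, for example random selection.
In some optional implementations of this embodiment, the execution subject may select the target prompt word from the generated candidate prompt words through the following steps: first, the execution subject may sort the generated candidate prompt words to obtain a candidate prompt word sequence; then, the execution subject may select, from the obtained candidate prompt word sequence, a target prompt word to be presented to the user.
Here, the execution subject may use various methods to sort the generated candidate prompt words and obtain the candidate prompt word sequence.
In some optional implementations of this embodiment, for each of the generated candidate prompt words, the execution subject may perform the following scoring steps: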
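Step 2031: Determine the relevance between the candidate prompt word and its corresponding target title text.
The relevance is a value characterizing the degree of relevance between the candidate prompt word and the target title text. The larger the value, the higher the relevance.
Specifically, the execution subject may use various methods to determine the relevance. For example, the execution subject may compute the similarity between the candidate prompt word and its corresponding target title text and use the result as the relevance between the two; or a technician may set in advance a first relevance value representing high relevance and a second relevance value representing low relevance. In the latter case, the execution subject may first identify the nouns in the target title text using part-of-speech tagging, and then determine whether the candidate prompt word contains any noun from the target title text; if it does, the first relevance value is used as the relevance between the candidate prompt word and its corresponding target title text; if it does not, the second relevance value is used.
It should be noted that similarity computation methods and part-of-speech tagging methods are well-known techniques that are widely studied and applied, and are not described further here.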
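The second relevance method above (a high value if the candidate contains a noun from the title, a low value otherwise) can be sketched as follows. The two relevance values, the tag name "n", and the pre-tagged title are assumptions; the source leaves the concrete values to the technician.

```python
HIGH_RELEVANCE = 10  # "first relevance" (assumed value)
LOW_RELEVANCE = 1    # "second relevance" (assumed value)

def relevance(candidate, tagged_title):
    """High relevance if the candidate contains any noun of the title, else low."""
    nouns = [word for word, pos in tagged_title if pos == "n"]
    if any(noun in candidate for noun in nouns):
        return HIGH_RELEVANCE
    return LOW_RELEVANCE

# Hypothetical part-of-speech tagging of one target title text.
tagged_title = [
    ("neural network", "n"), ("from", "p"), ("principle", "n"),
    ("to", "p"), ("implementation", "n"),
]

related = relevance("neural network", tagged_title)
unrelated = relevance("weather today", tagged_title)
```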
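In some optional implementations of this embodiment, the execution subject may also determine the language fluency of the candidate prompt word. The language fluency is a value characterizing how fluent the candidate prompt word is. The larger the value, the higher the fluency.
As an example, the language fluency of the candidate prompt word "今天天气真好" ("The weather is really good today") may be 10, while the language fluency of the reordered candidate prompt word "天气真好今天" may be 8. That is, the candidate prompt word "今天天气真好" is more fluent than the candidate prompt word "天气真好今天".
In this implementation, the execution subject may use a pre-trained language fluency model to determine the language fluency of the candidate prompt word. Specifically, the execution subject may input the candidate prompt word into the language fluency model and obtain its language fluency. The language fluency model may be a model obtained by training a language model (LM) or a neural network (NN), and is used to characterize the correspondence between a text and the language fluency of that text.
It should be noted that methods for training a language fluency model are well-known techniques that are widely studied and applied, and are not described further here.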
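Step 2032: based on the determined relevance, determine a score that characterizes the quality of the candidate prompt word.
Here, the execution body may directly take the determined relevance as the score that characterizes the quality of the candidate prompt word, or it may process the relevance and take the processing result as the score. As an example, the obtained relevance may be multiplied by a preset value (for example, 100), and the product may be taken as the score.
In some optional implementations of this embodiment, when the language fluency of the candidate prompt word has been determined, the execution body may also determine the score based on both the determined relevance and the language fluency.
Specifically, the execution body may use various methods to determine the score from the relevance and the language fluency. For example, the relevance and the language fluency may be summed directly and the sum taken as the score. Alternatively, the execution body may obtain weights that a technician has assigned in advance to the relevance and the language fluency, compute their weighted sum, and take the weighted sum as the score.
As an example, suppose the technician has set the weight of the relevance to 0.7 and the weight of the language fluency to 0.3. The execution body determines that the relevance between the candidate prompt word "神经网络" ("neural network") and the target title text "神经网络浅讲:从神经元到深度学习" ("A Brief Introduction to Neural Networks: From Neurons to Deep Learning") is 9, and that the language fluency of the candidate prompt word "神经网络" is 10. Using the preset weights 0.7 and 0.3, the execution body may then compute the weighted sum of the relevance 9 and the language fluency 10 to obtain 9.3 (9.3 = 0.7 × 9 + 0.3 × 10), and take 9.3 as the score that characterizes the quality of the candidate prompt word "神经网络".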
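The weighted-sum scoring in the example above can be written as a small function. The name `quality_score` and its parameter names are illustrative; the default weights 0.7 and 0.3 are the ones used in the example.

```python
def quality_score(relevance: float, fluency: float,
                  relevance_weight: float = 0.7,
                  fluency_weight: float = 0.3) -> float:
    """Weighted sum of relevance and language fluency."""
    return relevance_weight * relevance + fluency_weight * fluency


# Worked example from the text: relevance 9, fluency 10.
score = quality_score(9, 10)
# Floating-point arithmetic may not give exactly 9.3, so compare with a tolerance.
assert abs(score - 9.3) < 1e-9
```

Setting both weights to 1 gives the plain-sum variant that the text also mentions.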
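Step 2033: sort the obtained candidate prompt words based on the determined scores to obtain a candidate prompt word sequence.
Specifically, the execution body may sort the obtained candidate prompt words by score (in descending or ascending order) to obtain the candidate prompt word sequence.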
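Step 2033 amounts to an ordinary sort keyed on the score. A minimal sketch, assuming the scores are held in a dictionary that maps each candidate prompt word to its score (the function name `rank_candidates` and this data layout are assumptions):

```python
from typing import Dict, List


def rank_candidates(scores: Dict[str, float], descending: bool = True) -> List[str]:
    """Return the candidate prompt words ordered by score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=descending)
    return [word for word, _ in ranked]
```

With a descending order, the target prompt word could then be taken from the front of the sequence; with an ascending order, from the back.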
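Referring further to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for processing information according to this embodiment. In the application scenario of FIG. 3, the server 301 may first acquire the target title text set 303 sent by the terminal device 302. A target title text corresponds to body information and is for the user to click after entering a search word, so that the body information corresponding to the clicked target title text is presented to the user. Here, the target title text set includes the target title text 3031 (for example, "神经网络从原理到实现", "Neural Networks: From Principle to Implementation") and the target title text 3032 (for example, "自然语言概述", "Overview of Natural Language"). Then, for the target title text 3031 in the target title text set 303, the server 301 may generate, based on that target title text, a candidate prompt word 3041 for prompting the user to search (for example, "神经网络", "neural network"). For the target title text 3032 in the target title text set 303, the server 301 may generate, based on that target title text, a candidate prompt word 3042 for prompting the user to search (for example, "语言概述", "language overview"). Finally, the server 301 may select, from the generated candidate prompt words 3041 and 3042, the target prompt word 305 to be presented to the user (for example, "神经网络").
The method provided by the above embodiment of the present application uses the target title text set to generate target prompt words to be presented to the user. In this way, before the user enters a search word, the user can be prompted to search for the content indicated by a target prompt word, which enriches the ways of searching for information and increases the diversity of information processing.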
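Referring further to FIG. 4, a flow 400 of another embodiment of the method for processing information is shown. The flow 400 of the method for processing information includes the following steps:
Step 401: acquire a target title text set.
In this embodiment, the execution body of the method for processing information (for example, the server shown in FIG. 1) may acquire the target title text set, through a wired or wireless connection, either locally or from an electronic device communicatively connected to it (for example, the terminal device shown in FIG. 1). A target title text is a title text that is to be processed to obtain a target prompt word. A target prompt word is a word, phrase, or sentence used to prompt the user to search. A target title text corresponds to body information and is for the user to click after entering a search word, so that the body information corresponding to the clicked target title text is presented to the user. The target title text describes the content of its corresponding body information. A search word is a word, phrase, or sentence that the user enters in order to search.
Step 402: for each target title text in the target title text set, input the target title text into a pre-trained prompt word generation model to generate a result prompt word.
In this embodiment, for each target title text in the target title text set obtained in step 401, the execution body may input the target title text into a pre-trained prompt word generation model to generate a result prompt word. The result prompt word is the output of the prompt word generation model. The prompt word generation model characterizes the correspondence between title texts and result prompt words. Here, the prompt word generation model may be a model obtained by training a predetermined initial model (for example, a Seq2seq model or a convolutional neural network (Convolutional Neural Network, CNN)).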
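Specifically, as an example, the prompt word generation model may be obtained by training through the following steps:
First, a training sample set is acquired. Each training sample includes a sample title text and a sample result prompt word.
It should be noted that the sample title text may be a pre-stored title text. The sample result prompt word may be the search word entered by a user who clicked the sample title text.
Then, the sample title texts in the training sample set may be used as the input of the predetermined initial model, the sample result prompt word corresponding to each input sample title text may be used as the expected output, and the initial model may be trained by a machine learning method to obtain the prompt word generation model.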
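The construction of the training sample set described above can be sketched as follows. The application does not describe how clicks are logged, so the input format (pairs of an entered search word and the title that was clicked afterwards), the names `TrainingSample` and `build_training_set`, and the de-duplication are all assumptions made for illustration. Training the model itself is not shown.

```python
from typing import Iterable, List, NamedTuple, Tuple


class TrainingSample(NamedTuple):
    sample_title_text: str     # model input
    sample_result_prompt: str  # expected model output


def build_training_set(click_log: Iterable[Tuple[str, str]]) -> List[TrainingSample]:
    """Turn (search word, clicked title) records into training samples.

    Empty fields are skipped and exact duplicates are kept only once.
    """
    seen = set()
    samples = []
    for search_word, clicked_title in click_log:
        search_word, clicked_title = search_word.strip(), clicked_title.strip()
        if not search_word or not clicked_title:
            continue
        sample = TrainingSample(clicked_title, search_word)
        if sample not in seen:
            seen.add(sample)
            samples.append(sample)
    return samples
```

Each resulting sample pairs a title (input) with the search word a user typed before clicking it (expected output), which matches the supervision described in the training steps.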
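Step 403: based on the generated result prompt word, generate candidate prompt words for prompting the user to search.
In this embodiment, the execution body may use various methods to generate, based on the result prompt word generated in step 402, candidate prompt words for prompting the user to search. For example, the execution body may directly determine the generated result prompt word as a candidate prompt word.
In some optional implementations of this embodiment, the execution body may generate the candidate prompt words from the generated result prompt word through the following steps:
First, the execution body may acquire the historical search words corresponding to the target title text within a preset historical time period. The historical search words corresponding to the target title text are the search words that users entered, within the preset historical time period, before clicking the target title text.
Then, for each historical search word among the acquired historical search words, the execution body may determine the similarity between the historical search word and the generated result prompt word, where the similarity is a numerical value that characterizes how similar the historical search word and the result prompt word are.
Finally, the execution body may extract the historical search words whose similarity is greater than or equal to a preset threshold as the candidate prompt words for prompting the user to search.
In this implementation, determining the candidate prompt words from historical search words entered by users can improve the language fluency of the candidate prompt words, since these words were actually typed by users.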
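The three steps above reduce to a similarity computation followed by a threshold test. In the sketch below, `difflib.SequenceMatcher` stands in for the unspecified similarity measure, and the function name and the default threshold of 0.6 are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def select_history_terms(history_terms: Iterable[str], result_prompt: str,
                         threshold: float = 0.6) -> List[str]:
    """Keep the historical search words whose similarity to the model's
    result prompt word is greater than or equal to the threshold."""
    return [
        term for term in history_terms
        if SequenceMatcher(None, term, result_prompt).ratio() >= threshold
    ]
```

Because the returned strings are search words that users actually entered, the selection itself never produces new wording; it only decides which existing search words are close enough to the model output.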
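Step 404: select, from the generated candidate prompt words, a target prompt word to be presented to the user.
In this embodiment, based on the candidate prompt words obtained in step 403, the execution body may select from them a target prompt word to be presented to the user.
Here, the execution body may use various methods to make this selection, for example, random selection.
Steps 401 and 404 are consistent with steps 201 and 203 of the foregoing embodiment, respectively; the descriptions of steps 201 and 203 above also apply to steps 401 and 404 and are not repeated here.
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for processing information in this embodiment highlights the step of using the prompt word generation model to generate the candidate prompt words corresponding to a target title text. This embodiment thus provides a further scheme for generating candidate prompt words, which increases the diversity of information processing, and generating candidate prompt words with the prompt word generation model may improve the accuracy of information processing.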
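Referring further to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for processing information. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 5, the apparatus 500 for processing information of this embodiment includes an acquisition unit 501, a generation unit 502, and a selection unit 503. The acquisition unit 501 is configured to acquire a target title text set, where a target title text corresponds to body information and is for the user to click after entering a search word, so that the body information corresponding to the clicked target title text is presented to the user. The generation unit 502 is configured to generate, for each target title text in the target title text set and based on that target title text, candidate prompt words for prompting the user to search. The selection unit 503 is configured to select, from the generated candidate prompt words, a target prompt word to be presented to the user.
In this embodiment, the acquisition unit 501 of the apparatus 500 for processing information may acquire the target title text set, through a wired or wireless connection, either locally or from an electronic device communicatively connected to it (for example, the terminal device shown in FIG. 1). A target title text is a title text that is to be processed to obtain a target prompt word. A target prompt word is a word, phrase, or sentence used to prompt the user to search. A target title text corresponds to body information and is for the user to click after entering a search word, so that the body information corresponding to the clicked target title text is presented to the user. The target title text describes the content of its corresponding body information. A search word is a word, phrase, or sentence that the user enters in order to search.
In this embodiment, for each target title text in the target title text set obtained by the acquisition unit 501, the generation unit 502 may use various methods to generate, based on that target title text, candidate prompt words for prompting the user to search. A candidate prompt word may be used to generate a target prompt word and may be a word, phrase, or sentence, for example the phrase "今日天气" ("today's weather").
In this embodiment, based on the candidate prompt words obtained by the generation unit 502, the selection unit 503 may select from them a target prompt word to be presented to the user.
Here, the selection unit 503 may use various methods to make this selection, for example, random selection.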
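The division of the apparatus 500 into three units can be illustrated with a small class in which each unit is a pluggable callable. The class name `InformationProcessor`, the callable signatures, and the use of `random.choice` as the default selection are assumptions made for illustration; the application only states what each unit is configured to do.

```python
import random
from typing import Callable, List, Sequence


class InformationProcessor:
    """Sketch of an apparatus with acquisition, generation, and selection units."""

    def __init__(self,
                 acquire: Callable[[], Sequence[str]],
                 generate: Callable[[str], List[str]],
                 select: Callable[[Sequence[str]], str] = random.choice) -> None:
        self.acquire = acquire    # acquisition unit: returns the target title text set
        self.generate = generate  # generation unit: title text -> candidate prompt words
        self.select = select      # selection unit: candidates -> target prompt word

    def run(self) -> str:
        candidates: List[str] = []
        for title in self.acquire():
            candidates.extend(self.generate(title))
        if not candidates:
            raise ValueError("no candidate prompt words were generated")
        return self.select(candidates)
```

Swapping the `generate` or `select` callable corresponds to choosing one of the optional implementations of the generation unit or the selection unit.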
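In some optional implementations of this embodiment, the generation unit 502 may include: a first generation module (not shown in the figure), configured to input the target title text into a pre-trained prompt word generation model to generate a result prompt word; and a second generation module (not shown in the figure), configured to generate, based on the generated result prompt word, candidate prompt words for prompting the user to search.
In some optional implementations of this embodiment, the generation unit 502 may include: an acquisition module (not shown in the figure), configured to acquire the historical search words corresponding to the target title text within a preset historical time period; a first determination module (not shown in the figure), configured to determine, for each historical search word among the acquired historical search words, the similarity between the historical search word and the generated result prompt word, where the similarity is a numerical value that characterizes how similar the historical search word and the result prompt word are; and an extraction module (not shown in the figure), configured to extract the historical search words whose similarity is greater than or equal to a preset threshold as the candidate prompt words for prompting the user to search.
In some optional implementations of this embodiment, the generation unit 502 may include: a word segmentation module (not shown in the figure), configured to segment the target title text into words to obtain a word segmentation result; and a third generation module (not shown in the figure), configured to generate, based on the obtained word segmentation result, candidate prompt words for prompting the user to search.
In some optional implementations of this embodiment, the third generation module may be further configured to: determine, for each word in the obtained word segmentation result, the part of speech of the word; and generate, based on the obtained word segmentation result and the determined parts of speech, candidate prompt words for prompting the user to search.
In some optional implementations of this embodiment, the third generation module may be further configured to: determine, for each word in the obtained word segmentation result, the importance of the word within the word segmentation result, where the importance is a numerical value that characterizes how important the word is; and generate, based on the obtained word segmentation result and the determined importance, candidate prompt words for prompting the user to search.
In some optional implementations of this embodiment, the generation unit 502 may include: a fourth generation module (not shown in the figure), configured to generate, based on the target title text, initial candidate prompt words for prompting the user to search; a filtering module (not shown in the figure), configured to filter the generated initial candidate prompt words to remove the words that meet a preset condition; and a second determination module (not shown in the figure), configured to determine the filtered initial candidate prompt words as the candidate prompt words.
In some optional implementations of this embodiment, the selection unit 503 may include: a sorting module (not shown in the figure), configured to sort the generated candidate prompt words to obtain a candidate prompt word sequence; and a selection module (not shown in the figure), configured to select, from the obtained candidate prompt word sequence, the target prompt word to be presented to the user.
In some optional implementations of this embodiment, the sorting module may be further configured to: perform, for each candidate prompt word among the generated candidate prompt words, the following scoring steps: determining the relevance between the candidate prompt word and the target title text corresponding to the candidate prompt word, where the relevance is a numerical value that characterizes how closely the candidate prompt word is related to the target title text; and determining, based on the determined relevance, a score that characterizes the quality of the candidate prompt word; and sort the obtained candidate prompt words based on the determined scores to obtain the candidate prompt word sequence.
In some optional implementations of this embodiment, before the score that characterizes the quality of the candidate prompt word is determined based on the determined relevance, the scoring steps may further include: determining the language fluency of the candidate prompt word, where the language fluency is a numerical value that characterizes how fluent the candidate prompt word is; and determining the score based on the determined relevance includes: determining, based on the determined relevance and language fluency, the score that characterizes the quality of the candidate prompt word.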
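It can be understood that the units described in the apparatus 500 correspond to the steps of the method described with reference to FIG. 2. Therefore, the operations, features, and beneficial effects described above for the method also apply to the apparatus 500 and the units it contains, and are not repeated here.
The apparatus 500 provided by the above embodiment of the present application uses the target title text set to generate target prompt words to be presented to the user. In this way, before the user enters a search word, the user can be prompted to search for the content indicated by a target prompt word, which enriches the ways of searching for information and increases the diversity of information processing.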
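Referring now to FIG. 6, a schematic structural diagram is shown of a computer system 600 of an electronic device (for example, the terminal device or server shown in FIG. 1) suitable for implementing the embodiments of the present application. The electronic device shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), or the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.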
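In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that computer-readable medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.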
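The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor including an acquisition unit, a generation unit, and a selection unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a target title text set".
In another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist alone without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a target title text set, where a target title text corresponds to body information and is for the user to click after entering a search word, so that the body information corresponding to the clicked target title text is presented to the user; for each target title text in the target title text set, generate, based on that target title text, candidate prompt words for prompting the user to search; and select, from the generated candidate prompt words, a target prompt word to be presented to the user.
The above description is only of the preferred embodiments of the present application and of the technical principles applied. Those skilled in the art should understand that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.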

Claims (22)

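  1. A method for processing information, comprising:
    acquiring a target title text set, wherein a target title text corresponds to body information, and the target title text is for the user to click after entering a search word, so that the body information corresponding to the clicked target title text is presented to the user;
    for a target title text in the target title text set, generating, based on the target title text, candidate prompt words for prompting the user to search;
    selecting, from the generated candidate prompt words, a target prompt word to be presented to the user.
  2. The method according to claim 1, wherein the generating, based on the target title text, candidate prompt words for prompting the user to search comprises:
    inputting the target title text into a pre-trained prompt word generation model to generate a result prompt word;
    generating, based on the generated result prompt word, candidate prompt words for prompting the user to search.
  3. The method according to claim 2, wherein the generating, based on the generated result prompt word, candidate prompt words for prompting the user to search comprises:
    acquiring historical search words corresponding to the target title text within a preset historical time period;
    for a historical search word among the acquired historical search words, determining a similarity between the historical search word and the generated result prompt word, wherein the similarity is a numerical value that characterizes the degree of similarity between the historical search word and the result prompt word;
    extracting the historical search words whose similarity is greater than or equal to a preset threshold as candidate prompt words for prompting the user to search.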
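  4. The method according to claim 1, wherein the generating, based on the target title text, candidate prompt words for prompting the user to search comprises:
    performing word segmentation on the target title text to obtain a word segmentation result;
    generating, based on the obtained word segmentation result, candidate prompt words for prompting the user to search.
  5. The method according to claim 4, wherein the generating, based on the obtained word segmentation result, candidate prompt words for prompting the user to search comprises:
    for a word in the obtained word segmentation result, determining the part of speech of the word;
    generating, based on the obtained word segmentation result and the determined part of speech, candidate prompt words for prompting the user to search.
  6. The method according to claim 4, wherein the generating, based on the obtained word segmentation result, candidate prompt words for prompting the user to search comprises:
    for a word in the obtained word segmentation result, determining the importance of the word within the obtained word segmentation result, wherein the importance is a numerical value that characterizes the degree of importance of the word;
    generating, based on the obtained word segmentation result and the determined importance, candidate prompt words for prompting the user to search.
  7. The method according to claim 1, wherein the generating, based on the target title text, candidate prompt words for prompting the user to search comprises:
    generating, based on the target title text, initial candidate prompt words for prompting the user to search;
    filtering the generated initial candidate prompt words to remove the words in the initial candidate prompt words that meet a preset condition;
    determining the filtered initial candidate prompt words as candidate search words.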
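  8. The method according to any one of claims 1-7, wherein the selecting, from the generated candidate prompt words, a target prompt word to be presented to the user comprises:
    sorting the generated candidate prompt words to obtain a candidate prompt word sequence;
    selecting, from the obtained candidate prompt word sequence, the target prompt word to be presented to the user.
  9. The method according to claim 8, wherein the sorting the generated candidate prompt words to obtain a candidate prompt word sequence comprises:
    for a candidate prompt word among the generated candidate prompt words, performing the following scoring steps: determining a relevance between the candidate prompt word and the target title text corresponding to the candidate prompt word, wherein the relevance is a numerical value that characterizes the degree of relevance between the candidate prompt word and the target title text; and determining, based on the determined relevance, a score that characterizes the quality of the candidate prompt word;
    sorting the obtained candidate prompt words based on the determined scores to obtain the candidate prompt word sequence.
  10. The method according to claim 9, wherein, before the determining, based on the determined relevance, a score that characterizes the quality of the candidate prompt word, the scoring steps further comprise:
    determining a language fluency of the candidate prompt word, wherein the language fluency is a numerical value that characterizes the degree of fluency of the candidate prompt word; and
    the determining, based on the determined relevance, a score that characterizes the quality of the candidate prompt word comprises:
    determining, based on the determined relevance and language fluency, the score that characterizes the quality of the candidate prompt word.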
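  11. An apparatus for processing information, comprising:
    an acquisition unit, configured to acquire a target title text set, wherein a target title text corresponds to body information, and the target title text is for the user to click after entering a search word, so that the body information corresponding to the clicked target title text is presented to the user;
    a generation unit, configured to generate, for a target title text in the target title text set and based on the target title text, candidate prompt words for prompting the user to search;
    a selection unit, configured to select, from the generated candidate prompt words, a target prompt word to be presented to the user.
  12. The apparatus according to claim 11, wherein the generation unit comprises:
    a first generation module, configured to input the target title text into a pre-trained prompt word generation model to generate a result prompt word;
    a second generation module, configured to generate, based on the generated result prompt word, candidate prompt words for prompting the user to search.
  13. The apparatus according to claim 12, wherein the generation unit comprises:
    an acquisition module, configured to acquire historical search words corresponding to the target title text within a preset historical time period;
    a first determination module, configured to determine, for a historical search word among the acquired historical search words, a similarity between the historical search word and the generated result prompt word, wherein the similarity is a numerical value that characterizes the degree of similarity between the historical search word and the result prompt word;
    an extraction module, configured to extract the historical search words whose similarity is greater than or equal to a preset threshold as candidate prompt words for prompting the user to search.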
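  14. The apparatus according to claim 11, wherein the generation unit comprises:
    a word segmentation module, configured to perform word segmentation on the target title text to obtain a word segmentation result;
    a third generation module, configured to generate, based on the obtained word segmentation result, candidate prompt words for prompting the user to search.
  15. The apparatus according to claim 14, wherein the third generation module is further configured to:
    determine, for a word in the obtained word segmentation result, the part of speech of the word;
    generate, based on the obtained word segmentation result and the determined part of speech, candidate prompt words for prompting the user to search.
  16. The apparatus according to claim 14, wherein the third generation module is further configured to:
    determine, for a word in the obtained word segmentation result, the importance of the word within the obtained word segmentation result, wherein the importance is a numerical value that characterizes the degree of importance of the word;
    generate, based on the obtained word segmentation result and the determined importance, candidate prompt words for prompting the user to search.
  17. The apparatus according to claim 11, wherein the generation unit comprises:
    a fourth generation module, configured to generate, based on the target title text, initial candidate prompt words for prompting the user to search;
    a filtering module, configured to filter the generated initial candidate prompt words to remove the words in the initial candidate prompt words that meet a preset condition;
    a second determination module, configured to determine the filtered initial candidate prompt words as candidate search words.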
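  18. The apparatus according to any one of claims 11-17, wherein the selection unit comprises:
    a sorting module, configured to sort the generated candidate prompt words to obtain a candidate prompt word sequence;
    a selection module, configured to select, from the obtained candidate prompt word sequence, the target prompt word to be presented to the user.
  19. The apparatus according to claim 18, wherein the sorting module is further configured to:
    perform, for a candidate prompt word among the generated candidate prompt words, the following scoring steps: determining a relevance between the candidate prompt word and the target title text corresponding to the candidate prompt word, wherein the relevance is a numerical value that characterizes the degree of relevance between the candidate prompt word and the target title text; and determining, based on the determined relevance, a score that characterizes the quality of the candidate prompt word;
    sort the obtained candidate prompt words based on the determined scores to obtain the candidate prompt word sequence.
  20. The apparatus according to claim 19, wherein, before the determining, based on the determined relevance, a score that characterizes the quality of the candidate prompt word, the scoring steps further comprise:
    determining a language fluency of the candidate prompt word, wherein the language fluency is a numerical value that characterizes the degree of fluency of the candidate prompt word; and
    the determining, based on the determined relevance, a score that characterizes the quality of the candidate prompt word comprises:
    determining, based on the determined relevance and language fluency, the score that characterizes the quality of the candidate prompt word.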
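  21. An electronic device, comprising:
    one or more processors;
    a storage device on which one or more programs are stored,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-10.
  22. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-10.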
PCT/CN2018/115954 2018-09-14 2018-11-16 用于处理信息的方法和装置 WO2020052061A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811075460.5 2018-09-14
CN201811075460.5A CN109325178A (zh) 2018-09-14 2018-09-14 用于处理信息的方法和装置

Publications (1)

Publication Number Publication Date
WO2020052061A1 true WO2020052061A1 (zh) 2020-03-19

Family

ID=65265345

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/115954 WO2020052061A1 (zh) 2018-09-14 2018-11-16 用于处理信息的方法和装置

Country Status (2)

Country Link
CN (1) CN109325178A (zh)
WO (1) WO2020052061A1 (zh)

Also Published As

Publication number Publication date
CN109325178A (zh) 2019-02-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18933304

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 24/06/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18933304

Country of ref document: EP

Kind code of ref document: A1