CN110580276B - Method and apparatus for processing information


Info

Publication number
CN110580276B
Authority
CN
China
Legal status
Active
Application number
CN201810585420.9A
Other languages
Chinese (zh)
Other versions
CN110580276A
Inventor
吴石磊
王斐
彭锋
杨维
孙敏琪
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Application filed by Baidu Online Network Technology Beijing Co Ltd
Priority to CN201810585420.9A
Publication of CN110580276A
Application granted
Publication of CN110580276B
Legal status: Active

Landscapes

  • Machine Translation (AREA)

Abstract

The embodiments of the present application disclose a method and an apparatus for processing information. One embodiment of the method includes: acquiring a search word input by a user and a preset vocabulary set to be matched, where each vocabulary to be matched in the set has a preset index set that includes character indexes; performing word segmentation on the acquired search word to obtain a target text set; and matching the target text set against the index sets based on the character indexes in them, to determine a target vocabulary to be matched, where the index set corresponding to the target vocabulary to be matched includes character indexes that match target texts in the target text set. This embodiment improves the diversity and flexibility of information processing.

Description

Method and apparatus for processing information
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for processing information.
Background
Generally, word segmentation here refers to Chinese word segmentation, through which a sequence of Chinese characters is split into one or more words.
Word segmentation is the basis of text mining: once a sentence has been segmented, a computer can automatically recognize its meaning. A common approach, known as mechanical word segmentation, works mainly by matching the Chinese character string to be analyzed against the entries of a preset machine dictionary according to a certain strategy, so as to determine the target entries that correspond to that string.
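The text leaves the matching strategy open. Forward maximum matching is one widely used strategy for dictionary-based segmentation, and the sketch below assumes it. The function name `forward_max_match` and the toy dictionary are illustrative only and are not taken from the patent.

```python
from __future__ import annotations


def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Segment `text` left to right, always taking the longest dictionary entry.

    Characters not covered by any dictionary entry are emitted one at a time.
    """
    # Longest entry length bounds the matching window (at least 1 character).
    max_len = max(1, max((len(word) for word in dictionary), default=1))
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until it is a dictionary
        # entry or a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


toy_dictionary = {"世界", "最高", "山脉", "海沟"}  # hypothetical machine dictionary
print(forward_max_match("世界最高山脉", toy_dictionary))  # ['世界', '最高', '山脉']
```

Greedy longest matching is simple and fast but can mis-segment ambiguous strings. Any segmenter that yields a set of target texts would fit the later matching steps equally well.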
Disclosure of Invention
The embodiments of the present application provide a method and an apparatus for processing information.
In a first aspect, an embodiment of the present application provides a method for processing information. The method includes: acquiring a search word input by a user and a preset vocabulary set to be matched, where each vocabulary to be matched in the set has a preset index set that includes character indexes; performing word segmentation on the acquired search word to obtain a target text set; and matching the target text set against the index sets based on the character indexes in them, to determine a target vocabulary to be matched, where the index set corresponding to the target vocabulary to be matched includes character indexes that match target texts in the target text set.
In some embodiments, the target texts in the target text set are target characters, and matching the target text set against the index set based on the character indexes includes: matching the target characters in the target text set with the character indexes in the index set.
In some embodiments, the index set further includes vocabulary indexes, and the target texts in the target text set are target vocabularies. Matching the target text set against the index set based on the character indexes then includes: matching the target vocabularies in the target text set with the vocabulary indexes in the index set; in response to determining that the target text set includes target vocabularies that were not successfully matched, performing word segmentation on those unmatched target vocabularies to obtain target characters; and matching the obtained target characters with the character indexes in the index sets matched by the successfully matched target vocabularies.
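A minimal sketch of this two-stage matching, under assumptions the text does not fix: indexes are compared by exact string equality, and after the character fallback a vocabulary is kept only if every character of every unmatched target vocabulary appears among its character indexes. `IndexSet` and `two_stage_match` are hypothetical names, and the sample data are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexSet:
    vocab_indexes: set[str] = field(default_factory=set)  # word/phrase-level indexes
    char_indexes: set[str] = field(default_factory=set)   # single-character indexes


def two_stage_match(target_vocabs: list[str], index_sets: dict[str, IndexSet]) -> set[str]:
    # Stage 1: match target vocabularies against vocabulary indexes.
    matched = {name for name, idx in index_sets.items()
               if any(v in idx.vocab_indexes for v in target_vocabs)}
    # Target vocabularies that matched no vocabulary index in any index set.
    unmatched = [v for v in target_vocabs
                 if not any(v in idx.vocab_indexes for idx in index_sets.values())]
    if not unmatched:
        return matched
    # Stage 2: split the unmatched target vocabularies into target characters.
    target_chars = {ch for v in unmatched for ch in v}
    # Stage 3: match those characters only within the index sets hit in stage 1
    # (assumption: every character must be found).
    return {name for name in matched
            if target_chars <= index_sets[name].char_indexes}


index_sets = {
    "世界最高山脉": IndexSet({"世界", "最高", "山脉"}, set("世界最高山脉")),
    "海沟": IndexSet({"海沟"}, set("海沟")),
}
print(two_stage_match(["最高", "高山"], index_sets))  # {'世界最高山脉'}
```

Here "最高" hits a vocabulary index directly, while "高山" does not; its characters are then looked up only among the character indexes of the vocabulary already matched in stage 1.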
In some embodiments, the index set corresponding to a vocabulary to be matched is obtained as follows: performing word segmentation on the vocabulary to be matched to obtain a processing result containing words; for each word in the processing result, acquiring candidate words for it, where the candidate words include, but are not limited to, at least one of the following: synonyms and near-synonyms; and generating the index set corresponding to the vocabulary to be matched based on its processing result and the candidate words.
In some embodiments, a search result is preset for each vocabulary to be matched in the vocabulary set to be matched, and after the target vocabulary to be matched is determined, the method further includes: determining the search result corresponding to the target vocabulary to be matched as a target search result, and outputting the target search result.
In some embodiments, matching the target text set against the index sets corresponding to the vocabularies to be matched in the vocabulary set to be matched, based on the character indexes in the index sets, to determine the target vocabulary to be matched includes: selecting a target text from the target text set as the target text to be matched, and performing the following determination step based on the target text to be matched and the vocabulary set to be matched: matching the target text to be matched with the indexes in the index sets corresponding to the vocabularies to be matched in the vocabulary set to be matched, to determine a target index; determining whether the target text set includes unselected target texts; and, in response to determining that it does not, determining the vocabulary to be matched whose index set includes the target index as the target vocabulary to be matched. In response to determining that the target text set includes unselected target texts, a target text is selected from the unselected target texts as the target text to be matched, a new vocabulary set to be matched is generated from the vocabularies to be matched whose index sets include the target index, and the determination step is performed again based on the most recently selected target text to be matched and the most recently generated vocabulary set to be matched.
In a second aspect, an embodiment of the present application provides an apparatus for processing information. The apparatus includes an acquisition unit, a word segmentation unit and a matching unit. The acquisition unit is configured to acquire a search word input by a user and a preset vocabulary set to be matched, where each vocabulary to be matched in the set has a preset index set that includes character indexes. The word segmentation unit is configured to perform word segmentation on the acquired search word to obtain a target text set. The matching unit is configured to match the target text set against the index sets based on the character indexes in them, to determine a target vocabulary to be matched, where the index set corresponding to the target vocabulary to be matched includes character indexes that match target texts in the target text set.
In some embodiments, the target texts in the target text set are target characters, and the matching unit is further configured to match the target characters in the target text set with the character indexes in the index set.
In some embodiments, the index set further includes vocabulary indexes, and the target texts in the target text set are target vocabularies. The matching unit includes: a vocabulary matching module configured to match the target vocabularies in the target text set with the vocabulary indexes in the index set; a word segmentation module configured to, in response to determining that the target text set includes target vocabularies that were not successfully matched, perform word segmentation on those target vocabularies to obtain target characters; and a character matching module configured to match the obtained target characters with the character indexes in the index sets matched by the successfully matched target vocabularies.
In some embodiments, the index set corresponding to a vocabulary to be matched is obtained through the following steps: performing word segmentation on the vocabulary to be matched to obtain a processing result containing words; for each word in the processing result, acquiring candidate words for it, where the candidate words include, but are not limited to, at least one of the following: synonyms and near-synonyms; and generating the index set corresponding to the vocabulary to be matched based on its processing result and the candidate words.
In some embodiments, a search result is preset for each vocabulary to be matched in the vocabulary set to be matched, and the apparatus further includes a determining unit configured to determine the search result corresponding to the target vocabulary to be matched as a target search result and to output the target search result.
In some embodiments, the matching unit further includes a first execution module and a second execution module. The first execution module is configured to select a target text from the target text set as the target text to be matched and to perform the following determination step based on the target text to be matched and the vocabulary set to be matched: matching the target text to be matched with the indexes in the index sets corresponding to the vocabularies to be matched in the vocabulary set to be matched, to determine a target index; determining whether the target text set includes unselected target texts; and, in response to determining that it does not, determining the vocabulary to be matched whose index set includes the target index as the target vocabulary to be matched. The second execution module is configured to, in response to determining that the target text set includes unselected target texts, select a target text from the unselected target texts as the target text to be matched, generate a new vocabulary set to be matched from the vocabularies to be matched whose index sets include the target index, and continue performing the determination step based on the most recently selected target text to be matched and the most recently generated vocabulary set to be matched.
In a third aspect, an embodiment of the present application provides a server, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the method for processing information described above.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program that, when executed by a processor, implements the method of any embodiment of the method for processing information described above.
According to the method and apparatus for processing information provided by the embodiments of the present application, a search word input by a user and a preset vocabulary set to be matched are acquired, where each vocabulary to be matched in the set has a preset index set that includes character indexes; word segmentation is performed on the acquired search word to obtain a target text set; and the target text set is matched against the index sets based on the character indexes to determine a target vocabulary to be matched, whose index set includes character indexes that match target texts in the target text set. Search words can therefore be matched with vocabularies to be matched by using the character indexes preset for those vocabularies, which improves the diversity of information processing. In addition, when the search word input by the user is an abbreviation of a word or phrase, the target vocabulary to be matched can still be found through the character indexes, which improves the flexibility of information processing.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for processing information according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for processing information according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for processing information according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for processing information according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein merely explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the relevant invention.
It should be noted that the embodiments of the present application, and the features in those embodiments, may be combined with each other where they do not conflict. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for processing information or the apparatus for processing information of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. Various communication client applications, such as a web browser application, a map-type application, a search-type application, an instant messaging tool, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices that have a display screen and support information search, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server that provides various services, such as an information processing server that processes search words transmitted by the terminal apparatuses 101, 102, 103. The information processing server may perform processing such as matching on the received data such as the search word, and obtain a processing result (e.g., a target vocabulary to be matched).
It should be noted that the method for processing information provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for processing information is generally disposed in the server 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When the server is software, it may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for processing information in accordance with the present application is shown. The method for processing information comprises the following steps:
Step 201, obtaining a search word input by a user and a preset vocabulary set to be matched.
In this embodiment, the execution subject of the method for processing information (for example, the server shown in FIG. 1) may acquire a search word input by a user and a preset vocabulary set to be matched through a wired or wireless connection. The search word may be a single word, a phrase, a sentence, or the like that the user inputs for searching. For example, the search word may be "mountain", "highest mountain", or "where is the highest mountain".
In this embodiment, the vocabulary set to be matched may be a set of vocabularies preset by a technician for matching against the search word input by the user. Note that a vocabulary to be matched may be a single word, a phrase, or a sentence. In addition, each vocabulary to be matched in the set has a preset index set. An index is information preset by a technician for a vocabulary to be matched and used to look up that vocabulary. Here, the index set may include character indexes, where each character index is a single character. For example, if the vocabulary to be matched is "世界最高山脉" ("the world's highest mountain range"), its index set may include the following six character indexes: "世", "界", "最", "高", "山" and "脉". If the vocabulary to be matched is "海沟" ("ocean trench"), its index set may include the following two character indexes: "海" and "沟".
It should be noted that the execution subject may acquire the search word input by the user and the preset vocabulary set to be matched from a terminal communicatively connected to it (for example, a terminal device shown in FIG. 1), or may acquire the preset vocabulary set to be matched locally.
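Using the "world's highest mountain range" (世界最高山脉) and "ocean trench" (海沟) example, the character index sets can be derived mechanically from each vocabulary. The sketch below does that and, as an added assumption not stated in the text, also builds an inverted map from each character to the vocabularies whose index set contains it, which is one common way to make the lookup fast. The function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def char_index_sets(vocabularies: list[str]) -> dict[str, set[str]]:
    # Each vocabulary's character indexes are the distinct characters it contains.
    return {vocab: set(vocab) for vocab in vocabularies}


def invert(index_sets: dict[str, set[str]]) -> dict[str, set[str]]:
    # Map each character index to the vocabularies whose index set contains it.
    inverted: defaultdict[str, set[str]] = defaultdict(set)
    for vocab, chars in index_sets.items():
        for ch in chars:
            inverted[ch].add(vocab)
    return dict(inverted)


sets = char_index_sets(["世界最高山脉", "海沟"])
print(len(sets["世界最高山脉"]))  # 6
print(invert(sets)["山"])         # {'世界最高山脉'}
```

In practice a technician may also curate the index sets by hand, as the text allows; deriving them from the characters is only the simplest case.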
Step 202, performing word segmentation processing on the obtained search words to obtain a target text set.
In this embodiment, the execution subject may perform word segmentation on the acquired search word to obtain a target text set. The target texts in the target text set may be target characters or target vocabularies. A target character is a single character; a target vocabulary may be a word or a phrase. For example, if the search word is "highest mountain", the target text set obtained after word segmentation may include the single-character target texts meaning "most", "high" and "mountain", or the target vocabularies "highest" and "mountain".
It can be understood that whether word segmentation yields target characters or target vocabularies can be controlled by the segmentation granularity, which may be preset by a technician.
Step 203, matching the target text set and the index set based on the character indexes in the index set to determine the target vocabulary to be matched.
In this embodiment, the execution subject may match the target text set against the index sets based on the character indexes in them, to determine the target vocabulary to be matched. The index set corresponding to the target vocabulary to be matched may include character indexes that match target texts in the target text set; a matched character index is one that was compared with a target text and matched it successfully. Note that the target vocabulary to be matched may be determined either when the indexes in its index set successfully match some of the target texts in the target text set, or only when they successfully match all of the target texts in the target text set.
In this embodiment, the executing body may directly match the target text in the target text set with the character indexes in the index set, or may process the target text and match the processed target text with the character indexes in the index set.
In some optional implementations of this embodiment, the target texts in the target text set may be target characters, and the execution subject may directly match the target characters in the target text set with the character indexes in the index sets to determine the target vocabulary to be matched.
As an example, the target texts in the target text set are the target characters "最", "高", "山" and "脉", and the vocabulary set to be matched includes the vocabularies to be matched "世界最高山脉" ("the world's highest mountain range") and "海沟" ("ocean trench"). For "世界最高山脉", the corresponding index set includes the character indexes "世", "界", "最", "高", "山" and "脉". The execution subject matches the target characters "最", "高", "山" and "脉" with these character indexes and finds that the index set contains character indexes matching all four target characters, so it determines "世界最高山脉" as the target vocabulary to be matched. For "海沟", the corresponding index set includes the character indexes "海" and "沟"; matching the target characters "最", "高", "山" and "脉" against them yields no match, so "海沟" is not a target vocabulary to be matched. In this implementation, directly matching target characters with character indexes makes it more convenient to determine the target vocabulary to be matched, and matching on character indexes can also improve matching accuracy.
Optionally, the target texts in the target text set may be target vocabularies, and the execution subject may match each target vocabulary with the character indexes in the index sets. Specifically, for each target vocabulary in the target text set, the execution subject may perform word segmentation on it to obtain its target characters, match the obtained target characters with the character indexes in the index sets to determine the character indexes that match them, and then take those character indexes as the character indexes matching the target vocabulary. This implementation provides another scheme for matching the target text set against the index sets, which improves the diversity of information processing; matching on character indexes can also improve matching accuracy.
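The "world's highest mountain range" (世界最高山脉) versus "ocean trench" (海沟) example can be reproduced with plain set operations. This sketch assumes exact character equality and exposes the two criteria the text allows (a partial match versus a match on every target text) through a hypothetical `require_all` flag; `match_by_chars` is likewise a made-up name.

```python
from __future__ import annotations


def match_by_chars(target_chars: list[str], index_sets: dict[str, set[str]],
                   require_all: bool = False) -> list[str]:
    targets = set(target_chars)
    if not targets:
        return []
    matched = []
    for vocab, char_indexes in index_sets.items():
        hits = targets & char_indexes  # character indexes matched by some target character
        ok = (hits == targets) if require_all else bool(hits)
        if ok:
            matched.append(vocab)
    return matched


index_sets = {"世界最高山脉": set("世界最高山脉"), "海沟": set("海沟")}
print(match_by_chars(["最", "高", "山", "脉"], index_sets))          # ['世界最高山脉']
print(match_by_chars(["高", "海"], index_sets))                      # ['世界最高山脉', '海沟']
print(match_by_chars(["高", "海"], index_sets, require_all=True))    # []
```

The partial criterion favours recall (any shared character counts), while `require_all=True` favours precision; which one suits a deployment is a design choice the text leaves open.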
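Sketched under the same assumptions (exact character comparison, each character of a target vocabulary treated as one target character, a hypothetical function name), matching target vocabularies through character indexes reduces to splitting each vocabulary and crediting the character hits back to it:

```python
from __future__ import annotations


def char_matches_per_vocab(target_vocabs: list[str],
                           char_indexes: set[str]) -> dict[str, set[str]]:
    # Split each target vocabulary into characters, keep those that hit a
    # character index, and record the hits as matching the whole vocabulary.
    return {vocab: {ch for ch in vocab if ch in char_indexes} for vocab in target_vocabs}


index = set("世界最高山脉")  # character indexes of one vocabulary to be matched
print(char_matches_per_vocab(["最高", "山峰"], index))
```

Note that "山峰" still earns a partial hit through "山", even though the vocabulary itself never appears in the index set.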
In some optional implementation manners of this embodiment, the index set corresponding to the vocabulary to be matched may be obtained by the execution main body or other electronic device through the following steps:
First, perform word segmentation on the vocabulary to be matched to obtain a processing result containing words.
Here, the words in the processing result may be words or phrases, and the processing result may also include single characters.
Then, for each word in the obtained processing result, acquire candidate words for that word.
The candidate words may include, but are not limited to, at least one of the following: synonyms and near-synonyms. Here, the electronic device that generates the index set (the execution subject or another electronic device) may acquire the candidate words from another electronic device communicatively connected to it, or may look them up in a pre-established correspondence table between words and candidate words.
And finally, generating an index set corresponding to the vocabulary to be matched based on the processing result corresponding to the vocabulary to be matched and the candidate vocabulary.
Specifically, the processing result and the candidate vocabulary can be used as indexes in an index set, so as to generate an index set; alternatively, a preset number of candidate vocabularies may be extracted from the acquired candidate vocabularies, and the processing result and the extracted candidate vocabularies may be used as indexes in the index set to generate the index set.
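A sketch of the two generation options above, assuming the vocabulary has already been segmented and that candidate words come from a pre-built correspondence table. The table contents, the function name and the `max_candidates` parameter are hypothetical; the text only says a "preset number" of candidates may be extracted.

```python
from __future__ import annotations


def build_index_set(segmented: list[str], candidate_table: dict[str, list[str]],
                    max_candidates: int | None = None) -> set[str]:
    # The words of the processing result become indexes directly.
    index_set = set(segmented)
    # Collect candidate words (synonyms, near-synonyms) in table order, without duplicates.
    candidates: list[str] = []
    for word in segmented:
        for cand in candidate_table.get(word, []):
            if cand not in index_set and cand not in candidates:
                candidates.append(cand)
    # Second option: keep only a preset number of the acquired candidates.
    if max_candidates is not None:
        candidates = candidates[:max_candidates]
    return index_set | set(candidates)


table = {"最高": ["至高"], "山脉": ["山岭", "山系"]}  # hypothetical correspondence table
print(sorted(build_index_set(["世界", "最高", "山脉"], table, max_candidates=2)))
```

Truncating the candidate list keeps index sets small at the cost of recall; the order in which candidates are kept here (table order) is an assumption.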
In some optional implementations of this embodiment, the execution subject may also match the target text set against the index sets corresponding to the vocabularies to be matched in the vocabulary set to be matched, to determine the target vocabulary to be matched, through the following steps:
First, the execution subject may select a target text from the target text set as the target text to be matched and, based on the target text to be matched and the vocabulary set to be matched, perform the following determination step: matching the target text to be matched with the indexes in the index sets corresponding to the vocabularies to be matched in the vocabulary set to be matched, to determine a target index, that is, an index that matches successfully; determining whether the target text set includes unselected target texts; and, in response to determining that it does not, determining the vocabulary to be matched whose index set includes the target index as the target vocabulary to be matched.
Second, in response to determining that the target text set includes unselected target texts, the execution subject may select a target text from the unselected target texts as the target text to be matched, generate a new vocabulary set to be matched from the vocabularies to be matched whose index sets include the target index, and continue performing the determination step based on the most recently selected target text to be matched and the most recently generated vocabulary set to be matched.
It should be noted that the execution subject may select the target text to be matched from the target text set, or from the unselected target texts, in a manner predetermined by a technician, for example at random.
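The determination loop can be sketched as follows, assuming a target text matches an index by exact equality and that the next target text is chosen at random, the example selection rule given above. Because each round only narrows the vocabulary set, the final result here does not depend on the selection order. `iterative_match` is a hypothetical name and the sample data are illustrative.

```python
from __future__ import annotations

import random


def iterative_match(target_texts: list[str], index_sets: dict[str, set[str]],
                    rng: random.Random | None = None) -> set[str]:
    rng = rng if rng is not None else random.Random()
    unselected = list(target_texts)
    if not unselected:
        return set()
    vocab_set = dict(index_sets)  # current vocabulary set to be matched
    while unselected and vocab_set:
        # Select the next target text to be matched (random selection).
        text = unselected.pop(rng.randrange(len(unselected)))
        # Keep only vocabularies whose index set contains a matching (target) index;
        # they form the new vocabulary set to be matched.
        vocab_set = {vocab: idx for vocab, idx in vocab_set.items() if text in idx}
    # Either every target text has been used or no candidate is left.
    return set(vocab_set)


index_sets = {"信息科技大学": set("信息科技大学"), "技术学院": set("技术学院")}
print(iterative_match(["信", "息", "科", "大"], index_sets, random.Random(0)))  # {'信息科技大学'}
```

Stopping as soon as the vocabulary set becomes empty is an added shortcut; the described steps would simply keep matching against an empty set.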
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for processing information according to this embodiment. In the application scenario of FIG. 3, a user 301 first enters a search word 303 ("信息科大", an abbreviation) on a terminal 302. The server 304 then acquires the search word 303 that the user 301 entered and the terminal 302 sent, and acquires a preset vocabulary set to be matched 305 ("信息科技大学", "university of information technology"; "技术学院", "technical college"). For the vocabulary to be matched "信息科技大学" in the vocabulary set 305, a preset index set includes the character indexes "信", "息", "科", "技", "大" and "学"; for the vocabulary to be matched "技术学院", a preset index set includes the character indexes "技", "术", "学" and "院". The server 304 may then perform word segmentation on the acquired search word to obtain a target text set 306 ("信", "息", "科", "大"). Next, based on the character indexes in the two index sets, the server 304 may match the target text set 306 against the index set of "信息科技大学" and against the index set of "技术学院", to determine the target vocabulary to be matched 307 ("信息科技大学"), whose index set includes character indexes matching the target texts "信", "息", "科" and "大" in the target text set.
In addition, when the search word input by the user is an abbreviation of a word or phrase, the target vocabulary to be matched can still be found based on the character indexes, which improves the flexibility of information processing.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for processing information is illustrated. The flow 400 of the method for processing information comprises the steps of:
Step 401, obtaining a search term input by a user and a preset vocabulary set to be matched.
In this embodiment, an execution subject of the method for processing information (for example, the server shown in fig. 1) may acquire, through a wired or wireless connection, a search term input by a user and a preset vocabulary set to be matched. The search term may be a single word, a phrase, a sentence, or the like, input by the user for searching.
In this embodiment, the vocabulary set to be matched may be a vocabulary set preset by a technician for matching against the search term input by the user. It should be noted that the vocabularies to be matched here may be single words, phrases, or sentences. In addition, for each vocabulary to be matched in the vocabulary set to be matched, a corresponding index set is preset. An index may be information that a technician presets for the vocabulary to be matched and that is used to look up that vocabulary. Here, the index set may include character indexes, and a character index may be a single character.
Step 402, performing word segmentation processing on the obtained search word to obtain a target text set.
In this embodiment, the execution subject may perform word segmentation on the acquired search term to obtain a target text set. The target texts in the target text set may be target characters or target vocabularies. A target character may be a single character; a target vocabulary may be a word or a phrase.
The steps 401 and 402 are implemented in a manner similar to the steps 201 and 202 in the foregoing embodiment, respectively. Accordingly, the above description regarding step 201 and step 202 also applies to step 401 and step 402 of this embodiment, and is not repeated here.
Step 403, matching the target vocabularies in the target text set with the vocabulary indexes in the index set.
In this embodiment, the index set may further include vocabulary indexes, each of which may be a word or a phrase, and the target texts in the target text set may be target vocabularies. The execution subject can then match the target vocabularies in the target text set with the vocabulary indexes in the index set.
It should be noted that, by matching the target vocabularies against the vocabulary indexes, the execution subject can determine which target vocabularies in the target text set are successfully matched, as well as the index sets corresponding to those successfully matched target vocabularies.
Step 404, in response to determining that the target text set includes a target vocabulary that was not successfully matched, performing word segmentation on that unmatched target vocabulary to obtain target characters.
Step 405, matching the obtained target characters with the character indexes in the index sets corresponding to the successfully matched target vocabularies in the target text set, to determine the target vocabulary to be matched.
Here, it can be understood that when a character index in an index set successfully matches an obtained target character, that index set contains both a character index matching the unmatched target vocabulary from which the target character was derived and a vocabulary index matching a successfully matched target vocabulary. The vocabulary to be matched corresponding to that index set may then be determined as the target vocabulary to be matched.
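Steps 403 to 405 can be sketched as follows. This is a hypothetical illustration, not the embodiment's code: the `IndexSet` type, the function name `match_with_fallback`, the pre-segmented query, and the rule that a candidate's index set must contain every successfully matched target vocabulary are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class IndexSet:
    vocab_indexes: Set[str]  # words or phrases (vocabulary indexes)
    char_indexes: Set[str]   # single characters (character indexes)


def match_with_fallback(target_vocabs: List[str], index_sets: Dict[str, IndexSet]) -> List[str]:
    # Step 403: match target vocabularies against the vocabulary indexes.
    matched = [t for t in target_vocabs
               if any(t in s.vocab_indexes for s in index_sets.values())]
    unmatched = [t for t in target_vocabs if t not in matched]
    # Assumption: candidates are index sets containing every matched target vocabulary.
    candidates = {name: s for name, s in index_sets.items()
                  if matched and all(t in s.vocab_indexes for t in matched)}
    # Step 404: segment each unmatched target vocabulary into target characters.
    target_chars = [c for t in unmatched for c in t]
    # Step 405: keep candidates whose character indexes cover every target character.
    return [name for name, s in candidates.items()
            if all(c in s.char_indexes for c in target_chars)]


index_sets = {
    "信息科技大学": IndexSet({"信息", "科技", "大学"}, set("信息科技大学")),
    "信息学院": IndexSet({"信息", "学院"}, set("信息学院")),
    "技术学院": IndexSet({"技术", "学院"}, set("技术学院")),
}
print(match_with_fallback(["信息", "科大"], index_sets))  # ['信息科技大学']
```

Here "信息" matches a vocabulary index directly, while the unmatched "科大" is split into characters and resolved through the character indexes, which is the fallback the embodiment describes.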
In some optional implementations of this embodiment, a search result is preset for each vocabulary to be matched in the vocabulary set to be matched. After a target vocabulary to be matched is determined, the execution subject may determine the search result corresponding to the target vocabulary to be matched as the target search result and output the target search result. The search results may include, but are not limited to, at least one of the following: text, numbers, symbols, audio, video, pictures, and web pages. The target search result is thus the search result corresponding to the search term input by the user.
With this implementation, the search result corresponding to the search term input by the user can be obtained and fed back to the user, so that more comprehensive information processing is achieved.
Optionally, when the execution subject determines at least two target vocabularies to be matched, it may sort them according to a predetermined sorting rule to obtain a sequence of target vocabularies to be matched, and then output the determined target search results in the order of their corresponding target vocabularies in that sequence. Sorting the target vocabularies to be matched thus yields an ordered output of the corresponding target search results.
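The optional ordered output can be sketched as follows. The embodiment leaves the predetermined sorting rule open, so this sketch simply takes it as a caller-supplied key function; the function name `output_search_results`, the preset weights, and the result strings are hypothetical.

```python
from typing import Callable, Dict, List


def output_search_results(
    target_vocabs: List[str],
    preset_results: Dict[str, str],
    sort_key: Callable[[str], float],
) -> List[str]:
    # Sort the target vocabularies with the (caller-supplied) predetermined rule...
    ordered = sorted(target_vocabs, key=sort_key)
    # ...then emit each preset search result in that order.
    return [preset_results[vocab] for vocab in ordered]


# Hypothetical preset weights: a higher weight sorts earlier.
weights = {"信息科技大学": 0.9, "技术学院": 0.4}
results = {"信息科技大学": "result-1", "技术学院": "result-2"}
print(output_search_results(["技术学院", "信息科技大学"], results, lambda v: -weights[v]))
# ['result-1', 'result-2']
```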
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for processing information in this embodiment highlights the following steps: first matching the target vocabularies in the target text set with the vocabulary indexes in the index set; then, in response to determining that the target text set includes target vocabularies that were not successfully matched, performing word segmentation on them to obtain target characters; and finally matching the target characters with the character indexes to determine the target vocabulary to be matched. The scheme described in this embodiment can therefore use conventional vocabulary-index matching first and fall back to character-index matching for any vocabulary that fails to match, thereby achieving flexible information processing.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for processing information, which corresponds to the method embodiment shown in fig. 2 and can be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for processing information of the present embodiment includes an acquisition unit 501, a word segmentation unit 502, and a matching unit 503. The acquisition unit 501 is configured to acquire a search term input by a user and a preset vocabulary set to be matched, where, for each vocabulary to be matched in the vocabulary set to be matched, a corresponding index set including character indexes is preset; the word segmentation unit 502 is configured to perform word segmentation on the acquired search term to obtain a target text set; and the matching unit 503 is configured to match the target text set and the index set based on the character indexes in the index set to determine a target vocabulary to be matched, where the index set corresponding to the target vocabulary to be matched includes character indexes that match the target texts in the target text set.
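One way to picture how units 501 to 503 compose is the following Python sketch. It is an assumption-laden illustration rather than the apparatus itself: the class name `InformationProcessingApparatus` is hypothetical, the acquisition unit merely trims whitespace, the word segmentation unit defaults to splitting into single characters, and the matching unit requires every target text to appear among a vocabulary's character indexes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class InformationProcessingApparatus:
    # Hypothetical index sets: vocabulary to be matched -> its character indexes.
    index_sets: Dict[str, Set[str]]
    # Word segmentation unit (assumed): defaults to splitting into single characters.
    segment: Callable[[str], List[str]] = list

    def acquire(self, search_term: str) -> str:
        # Acquisition unit (assumed): only trims surrounding whitespace here.
        return search_term.strip()

    def match(self, target_texts: List[str]) -> List[str]:
        # Matching unit: keep vocabularies whose character indexes cover every target text.
        return [vocab for vocab, chars in self.index_sets.items()
                if all(text in chars for text in target_texts)]

    def process(self, search_term: str) -> List[str]:
        # Acquire -> segment -> match, mirroring units 501, 502 and 503.
        return self.match(self.segment(self.acquire(search_term)))


app = InformationProcessingApparatus({"信息科技大学": set("信息科技大学"), "技术学院": set("技术学院")})
print(app.process(" 信息科大 "))  # ['信息科技大学']
```

Keeping the segmenter injectable reflects that the units are described functionally; any segmentation method could be passed in without touching the matching unit.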
In this embodiment, the acquisition unit 501 of the apparatus 500 for processing information may acquire the search term input by the user and the preset vocabulary set to be matched through a wired or wireless connection. The search term may be a single word, a phrase, a sentence, or the like, input by the user for searching.
In this embodiment, the vocabulary set to be matched may be a vocabulary set preset by a technician for matching against the search term input by the user. It should be noted that the vocabularies to be matched here may be single words, phrases, or sentences. In addition, for each vocabulary to be matched in the vocabulary set to be matched, a corresponding index set is preset. An index may be information that a technician presets for the vocabulary to be matched and that is used to look up that vocabulary. Here, the index set may include character indexes, and a character index may be a single character.
In this embodiment, the word segmentation unit 502 may perform word segmentation on the acquired search term to obtain a target text set. The target texts in the target text set may be target characters or target vocabularies. A target character may be a single character; a target vocabulary may be a word or a phrase.
In this embodiment, the matching unit 503 may match the target text set and the index set based on the character indexes in the index set to determine the target vocabulary to be matched. The index set corresponding to the target vocabulary to be matched may include character indexes that match the target texts in the target text set.
In some optional implementations of this embodiment, the target texts in the target text set may be target characters, and the matching unit 503 may be further configured to match the target characters in the target text set with the character indexes in the index set.
In some optional implementations of this embodiment, the index set may further include vocabulary indexes, and the target texts in the target text set may be target vocabularies. The matching unit 503 may then include: a vocabulary matching module configured to match the target vocabularies in the target text set with the vocabulary indexes in the index set; a word segmentation processing module configured to, in response to determining that the target text set includes target vocabularies that were not successfully matched, perform word segmentation on those target vocabularies to obtain target characters; and a character matching module configured to match the obtained target characters with the character indexes in the index sets corresponding to the successfully matched target vocabularies in the target text set.
In some optional implementations of this embodiment, the index set corresponding to a vocabulary to be matched may be obtained through the following steps: performing word segmentation on the vocabulary to be matched to obtain a processing result comprising words; for each word in the processing result, obtaining candidate vocabularies of that word, where the candidate vocabularies include, but are not limited to, at least one of synonyms and near-synonyms; and generating the index set corresponding to the vocabulary to be matched based on the processing result and the candidate vocabularies.
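The index-set construction steps can be sketched as follows. The embodiment does not name a segmentation algorithm, so this sketch swaps in simple forward maximum matching over a small dictionary; the dictionary, the thesaurus of synonyms and near-synonyms, and the helper names `fmm_segment` and `build_index_set` are all hypothetical.

```python
from typing import Dict, List, Set


def fmm_segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    # Forward maximum matching: at each position take the longest dictionary
    # word, falling back to a single character when nothing longer matches.
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def build_index_set(vocab: str, dictionary: Set[str], thesaurus: Dict[str, Set[str]]) -> Set[str]:
    # Processing result: the words of the vocabulary to be matched.
    words = fmm_segment(vocab, dictionary)
    index = set(words)
    # Add candidate vocabularies (synonyms / near-synonyms) for each word.
    for word in words:
        index |= thesaurus.get(word, set())
    return index


dictionary = {"信息", "科技", "大学", "技术", "学院"}
thesaurus = {"大学": {"高校"}}  # hypothetical near-synonym entry
print(sorted(build_index_set("信息科技大学", dictionary, thesaurus)))
```

Character indexes, where needed, could be added to the same index set separately, for example as the individual characters of the vocabulary.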
In some optional implementations of this embodiment, for each vocabulary to be matched in the vocabulary set to be matched, a corresponding search result is preset, and the apparatus 500 may further include a determining unit (not shown in the figure) configured to determine the search result corresponding to the target vocabulary to be matched as the target search result and to output the target search result.
In some optional implementations of this embodiment, the matching unit 503 may further include a first execution module and a second execution module. The first execution module is configured to select a target text from the target text set as the target text to be matched and, based on the target text to be matched and the vocabulary set to be matched, execute the following determination step: matching the target text to be matched with the indexes in the index sets corresponding to the vocabularies to be matched in the vocabulary set to be matched, so as to determine a target index; determining whether the target text set includes unselected target texts; and, in response to determining that it does not, determining the vocabularies to be matched whose index sets include the target index as the target vocabularies to be matched. The second execution module is configured to, in response to determining that the target text set includes unselected target texts, select a target text from the unselected target texts as the target text to be matched, generate a new vocabulary set to be matched from the vocabularies to be matched whose index sets include the target index, and continue executing the determination step based on the most recently selected target text to be matched and the most recently generated vocabulary set to be matched.
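The loop formed by the first and second execution modules amounts to progressively narrowing the vocabulary set to be matched, one target text at a time. A minimal sketch, assuming that a target index is an index exactly equal to the target text to be matched and that target texts are selected in their original order (both assumptions; `iterative_determine` is a hypothetical name):

```python
from typing import Dict, List, Set


def iterative_determine(target_texts: List[str], index_sets: Dict[str, Set[str]]) -> List[str]:
    # Current vocabulary set to be matched, with each item's index set.
    candidates = dict(index_sets)
    for text in target_texts:  # select each unselected target text in turn
        # Determination step: keep vocabularies whose index set contains a
        # target index equal to this text; this is the new vocabulary set.
        candidates = {v: idx for v, idx in candidates.items() if text in idx}
        if not candidates:  # added optimization: nothing left to narrow
            break
    # No unselected target texts remain: the survivors are the targets.
    return list(candidates)


index_sets = {"信息科技大学": set("信息科技大学"), "技术学院": set("技术学院")}
print(iterative_determine(list("信息科大"), index_sets))  # ['信息科技大学']
```

The early `break` is not part of the described steps; it only stops the loop once the vocabulary set is empty, which does not change the result.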
In the apparatus 500 provided in the foregoing embodiment of the present application, the acquisition unit 501 acquires the search term input by the user and the preset vocabulary set to be matched; the word segmentation unit 502 then performs word segmentation on the acquired search term to obtain the target text set; and the matching unit 503 then matches the target text set and the index set based on the character indexes in the index set to determine the target vocabulary to be matched. In this way, the search term can be matched against the vocabularies to be matched using the character indexes preset for them, which improves the diversity of information processing.
Referring now to fig. 6, a block diagram of a computer system 600 suitable for implementing the server of the embodiments of the present application is shown. The server shown in fig. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit, a word segmentation unit, and a matching unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit for acquiring a search term input by a user and a preset vocabulary set to be matched".
As another aspect, the present application also provides a computer-readable medium, which may be included in the server described in the above embodiments, or may exist separately without being assembled into the server. The computer-readable medium carries one or more programs which, when executed by the server, cause the server to: acquire a search term input by a user and a preset vocabulary set to be matched, where, for each vocabulary to be matched in the vocabulary set to be matched, a corresponding index set including character indexes is preset; perform word segmentation on the acquired search term to obtain a target text set; and match the target text set and the index set based on the character indexes in the index set to determine the target vocabulary to be matched, where the index set corresponding to the target vocabulary to be matched includes character indexes that match the target texts in the target text set.
The foregoing description is only exemplary of the preferred embodiments of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for processing information, comprising:
acquiring a search word input by a user and a preset vocabulary set to be matched, wherein an index set corresponding to the vocabulary to be matched is preset for the vocabulary to be matched in the vocabulary set to be matched, and the index set comprises a character index;
performing word segmentation processing on the acquired search words to acquire a target text set;
matching the target text set and the index set based on character indexes in the index set to determine a target vocabulary to be matched, wherein the index set corresponding to the target vocabulary to be matched comprises character indexes matched with the target text in the target text set;
for the vocabulary to be matched in the vocabulary set to be matched, a search result corresponding to the vocabulary to be matched is preset; and
the method further comprises:
determining the search result corresponding to the target vocabulary to be matched as a target search result, and outputting the target search result.
2. The method of claim 1, wherein the target texts in the target text set are target characters; and
matching the target text set and the index set based on the character indexes in the index set comprises:
matching the target characters in the target text set with the character indexes in the index set.
3. The method of claim 1, wherein the index set further comprises vocabulary indexes, and the target texts in the target text set are target vocabularies; and
matching the target text set and the index set based on the character indexes in the index set comprises:
matching the target vocabularies in the target text set with the vocabulary indexes in the index set;
in response to determining that the target text set comprises target vocabularies that are not successfully matched, performing word segmentation on the target vocabularies that are not successfully matched to obtain target characters; and
matching the obtained target characters with character indexes in the index sets corresponding to the target vocabularies successfully matched in the target text set.
4. The method as claimed in claim 3, wherein the index set corresponding to the vocabulary to be matched is obtained by the following steps:
performing word segmentation on the vocabulary to be matched to obtain a processing result comprising words;
for each word in the obtained processing result, obtaining candidate vocabularies of the word, wherein the candidate vocabularies comprise, but are not limited to, at least one of the following: synonyms and near-synonyms; and
generating the index set corresponding to the vocabulary to be matched based on the processing result corresponding to the vocabulary to be matched and the candidate vocabularies.
5. The method according to any one of claims 1 to 4, wherein matching the target text set and the index set corresponding to the vocabulary to be matched in the vocabulary set to be matched based on the character indexes in the index set to determine the target vocabulary to be matched comprises:
selecting a target text from the target text set as a target text to be matched, and executing the following determination step based on the target text to be matched and the vocabulary set to be matched: matching the target text to be matched with the indexes in the index sets corresponding to the vocabularies to be matched in the vocabulary set to be matched so as to determine a target index; determining whether the target text set comprises unselected target texts; and in response to determining that the target text set does not comprise unselected target texts, determining the vocabularies to be matched corresponding to the index sets comprising the target index as target vocabularies to be matched; and
in response to determining that the target text set comprises unselected target texts, selecting a target text from the unselected target texts as the target text to be matched, generating a new vocabulary set to be matched based on the vocabularies to be matched corresponding to the index sets comprising the target index, and continuing to execute the determination step based on the most recently selected target text to be matched and the most recently generated vocabulary set to be matched.
6. An apparatus for processing information, comprising:
an acquisition unit configured to acquire a search term input by a user and a preset vocabulary set to be matched, wherein, for the vocabulary to be matched in the vocabulary set to be matched, an index set corresponding to the vocabulary to be matched is preset and comprises a character index;
a word segmentation unit configured to perform word segmentation processing on the acquired search term to obtain a target text set; and
a matching unit configured to match the target text set and the index set based on character indexes in the index set to determine a target vocabulary to be matched, wherein the index set corresponding to the target vocabulary to be matched comprises character indexes matched with the target text in the target text set;
for the vocabulary to be matched in the vocabulary set to be matched, a search result corresponding to the vocabulary to be matched is preset; and
the apparatus further comprises:
a determining unit configured to determine the search result corresponding to the target vocabulary to be matched as a target search result and to output the target search result.
7. The apparatus of claim 6, wherein the target texts in the target text set are target characters; and
the matching unit is further configured to:
match the target characters in the target text set with the character indexes in the index set.
8. The apparatus of claim 6, wherein the set of indices further includes a vocabulary index, the target text in the set of target texts being target vocabulary; and
the matching unit includes:
a vocabulary matching module configured to match target vocabularies in the target text set with vocabulary indexes in the index set;
a word segmentation processing module configured to, in response to determining that the target text set comprises target vocabularies that are not successfully matched, perform word segmentation on the target vocabularies that are not successfully matched to obtain target characters; and
a character matching module configured to match the obtained target characters with character indexes in the index sets corresponding to the target vocabularies successfully matched in the target text set.
9. The apparatus of claim 8, wherein the index set corresponding to the vocabulary to be matched is obtained by:
performing word segmentation on the vocabulary to be matched to obtain a processing result comprising words;
for each word in the obtained processing result, obtaining candidate vocabularies of the word, wherein the candidate vocabularies comprise, but are not limited to, at least one of the following: synonyms and near-synonyms; and
generating the index set corresponding to the vocabulary to be matched based on the processing result corresponding to the vocabulary to be matched and the candidate vocabularies.
10. The apparatus according to any one of claims 6-9, wherein the matching unit further comprises:
a first execution module configured to select a target text from the target text set as a target text to be matched and, based on the target text to be matched and the vocabulary set to be matched, execute the following determination step: matching the target text to be matched with the indexes in the index sets corresponding to the vocabularies to be matched in the vocabulary set to be matched so as to determine a target index; determining whether the target text set comprises unselected target texts; and in response to determining that the target text set does not comprise unselected target texts, determining the vocabularies to be matched corresponding to the index sets comprising the target index as target vocabularies to be matched; and
a second execution module configured to, in response to determining that the target text set comprises unselected target texts, select a target text from the unselected target texts as the target text to be matched, generate a new vocabulary set to be matched based on the vocabularies to be matched corresponding to the index sets comprising the target index, and continue to execute the determination step based on the most recently selected target text to be matched and the most recently generated vocabulary set to be matched.
11. A server, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-5.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201810585420.9A 2018-06-08 2018-06-08 Method and apparatus for processing information Active CN110580276B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810585420.9A CN110580276B (en) 2018-06-08 2018-06-08 Method and apparatus for processing information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810585420.9A CN110580276B (en) 2018-06-08 2018-06-08 Method and apparatus for processing information

Publications (2)

Publication Number Publication Date
CN110580276A CN110580276A (en) 2019-12-17
CN110580276B true CN110580276B (en) 2022-06-28

Family

ID=68808961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810585420.9A Active CN110580276B (en) 2018-06-08 2018-06-08 Method and apparatus for processing information

Country Status (1)

Country Link
CN (1) CN110580276B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158971A (en) * 2007-11-15 2008-04-09 深圳市迅雷网络技术有限公司 Search result ordering method and device based on search engine
KR20100072997A (en) * 2008-12-22 2010-07-01 한국전자통신연구원 System for string matching based on tokenization and method thereof
CN101819578A (en) * 2010-01-25 2010-09-01 青岛普加智能信息有限公司 Retrieval method, method and device for establishing index and retrieval system
CN103678560A (en) * 2013-12-06 2014-03-26 乐视网信息技术(北京)股份有限公司 Multimedia resource error correction searching method and system and multimedia resource server
CN103886034A (en) * 2014-03-05 2014-06-25 北京百度网讯科技有限公司 Method and equipment for building indexes and matching inquiry input information of user
CN105138511A (en) * 2015-08-10 2015-12-09 北京思特奇信息技术股份有限公司 Method and system for semantically analyzing search keyword
CN105183733A (en) * 2014-06-05 2015-12-23 阿里巴巴集团控股有限公司 Methods for matching text information and pushing business object, and devices for matching text information and pushing business object
CN105760399A (en) * 2014-12-19 2016-07-13 华为软件技术有限公司 Data retrieval method and device
CN106815195A (en) * 2015-11-27 2017-06-09 方正国际软件(北京)有限公司 A kind of segmenting method and device, search method and device
CN106844370A (en) * 2015-12-03 2017-06-13 小米科技有限责任公司 Set up information index, the method and device of search information

Also Published As

Publication number Publication date
CN110580276A (en) 2019-12-17

Similar Documents

Publication Publication Date Title
US10795939B2 (en) Query method and apparatus
CN107066449B (en) Information pushing method and device
CN108052613B (en) Method and device for generating page
CN107241260B (en) News pushing method and device based on artificial intelligence
CN109543058B (en) Method, electronic device, and computer-readable medium for detecting image
CN107301170B (en) Method and device for segmenting sentences based on artificial intelligence
CN111428010B (en) Man-machine intelligent question-answering method and device
US11758088B2 (en) Method and apparatus for aligning paragraph and video
CN111104482A (en) Data processing method and device
CN113470619B (en) Speech recognition method, device, medium and equipment
CN109858045B (en) Machine translation method and device
CN107203504B (en) Character string replacing method and device
CN109190123B (en) Method and apparatus for outputting information
CN113657113A (en) Text processing method and device and electronic equipment
CN112988753B (en) Data searching method and device
CN110992938A (en) Voice data processing method and device, electronic equipment and computer readable medium
CN110738056B (en) Method and device for generating information
CN110232920B (en) Voice processing method and device
CN110634050B (en) Method, device, electronic equipment and storage medium for identifying house source type
CN110245334B (en) Method and device for outputting information
CN110059172B (en) Method and device for recommending answers based on natural language understanding
CN112182255A (en) Method and apparatus for storing media files and for retrieving media files
CN112148841B (en) Object classification and classification model construction method and device
CN106896936B (en) Vocabulary pushing method and device
CN113590756A (en) Information sequence generation method and device, terminal equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant