WO2008022581A1 - Method and device for obtaining new words, and input method system and method - Google Patents

Method and device for obtaining new words, and input method system and method

Info

Publication number
WO2008022581A1
WO2008022581A1 PCT/CN2007/070419 CN2007070419W WO2008022581A1 WO 2008022581 A1 WO2008022581 A1 WO 2008022581A1 CN 2007070419 W CN2007070419 W CN 2007070419W WO 2008022581 A1 WO2008022581 A1 WO 2008022581A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
user
words
new
frequency
Prior art date
Application number
PCT/CN2007/070419
Other languages
English (en)
Chinese (zh)
Inventor
Qi Guo
Zijian Tong
Lei Yang
Original Assignee
Beijing Sogou Technology Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=37817498&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=WO2008022581(A1) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Beijing Sogou Technology Development Co., Ltd.
Publication of WO2008022581A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries

Definitions

  • The present invention relates to the field of Internet information processing, and in particular to a method for acquiring new words, a new word acquisition system, a new word acquisition device, and an input method system.
  • Because words are the most basic unit of analysis in many language processing technologies, new and emerging words must be obtained in a timely and effective manner to ensure the accuracy of those technologies. For example, vocabularies with different attributes are the basis for natural language understanding, machine translation, automatic abstract generation, and so on.
  • In search, words are used as search units to reduce the redundancy of search results.
  • In speech recognition, words are usually used as the lowest level of linguistic information, and language models are built on words to resolve acoustic uncertainty at the word level.
  • The prior art generally collects new words manually and adds them to an existing vocabulary.
  • For example, new words are manually collected by the administrator of a search site and then added to the custom vocabulary used by the site; or they are manually collected by the lexicon developer and then included in the next version of the system dictionary (usually for use in fields such as input methods); or a public vocabulary is set up (for example, Violet), and netizens or other members of the public manually collect new words and add them to it, pooling a large amount of manual effort.
  • These methods are time consuming, labor intensive, and inefficient. Therefore, there is an urgent need for a method that can acquire new words in a timely and efficient manner from complex language use.
  • Summary of the invention
  • The technical problem to be solved by the present invention is to provide a method and system for acquiring new words that can acquire new words frequently used by users in a simple, convenient, timely and effective manner, and that can effectively remove interfering words and provide relatively accurate new word output.
  • Another object of the present invention is to provide an input method system that can automatically acquire the personalized words of a user in a timely, convenient and effective manner, so that new words can be acquired by collecting the personalized words of a plurality of users.
  • Another object of the present invention is to provide a new word acquisition apparatus which can provide a relatively accurate new word output with high efficiency.
  • Another object of the present invention is to provide a vocabulary generating method and a vocabulary generating apparatus which can provide a relatively accurate vocabulary or a new vocabulary with high efficiency.
  • The present invention provides a method for acquiring a new word, comprising the steps of: acquiring the words selected by a user during the user input process; comparing the words selected by the user with existing words, and obtaining the user's personalized words according to the comparison result; collecting the personalized words of each user; and obtaining new words according to the personalized words.
  • The user word frequency may also be recorded during the user input process; the user word frequency is frequency information on how often the user inputs the word.
  • The comparison may be performed by recording the words selected by the user in a user vocabulary, storing the existing words in the input method system vocabulary, and comparing the user vocabulary with the input method system vocabulary; or by directly comparing each word with the existing words at the time the user selects it.
  • The user's personalized words may be obtained by the following steps: determining whether the word selected by the user exists among the existing words; if not, determining that the word is a personalized word of the user.
  • The user's personalized words may also be obtained by the following steps: determining whether the word selected by the user exists among the existing words; if not, further examining the user word frequency of the word; if that user word frequency is greater than or equal to a predetermined threshold, determining that the word is a personalized word.
  • The user's personalized words may also be obtained by the following steps: determining whether the word selected by the user exists among the existing words; if not, determining that the word is a personalized word of the user; if it exists, further comparing the user word frequency of the word with its system word frequency, the system word frequency being the word frequency information preset in the input method system vocabulary for the existing word; if the ratio of the user word frequency to the system word frequency is greater than or equal to a predetermined threshold, determining that the word is a personalized word.
  • The user's personalized words may also be obtained by the following steps: determining whether the word selected by the user exists among the existing words; if not, further examining the user word frequency of the word, and determining that the word is a personalized word if that user word frequency is greater than or equal to a predetermined threshold; if it exists, further comparing the user word frequency of the word with its system word frequency, the system word frequency being the word frequency information preset in the input method system vocabulary for the existing word, and determining that the word is a personalized word if the ratio of the user word frequency to the system word frequency is greater than or equal to a predetermined threshold.
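  • As an illustration only, the last of the rules above (frequency threshold for unknown words, frequency ratio for known words) might be sketched as follows. The names `Thresholds` and `is_personalized` and both default threshold values are assumptions made for this sketch; the source only speaks of a "predetermined threshold".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Both values are illustrative assumptions, not taken from the source.
    min_user_freq: int = 3        # applies to words absent from the system vocabulary
    min_freq_ratio: float = 5.0   # user frequency / system frequency, for existing words


def is_personalized(word, user_freq, system_vocab, t=Thresholds()):
    """Return True if `word` counts as a personalized word for this user.

    user_freq    -- how often this user has entered the word
    system_vocab -- dict mapping existing words to their preset system word frequency
    """
    if word not in system_vocab:
        # Unknown word: require a minimum user frequency to filter out typos.
        return user_freq >= t.min_user_freq
    system_freq = system_vocab[word]
    if system_freq <= 0:
        # Guard against division by zero; treat the word as not comparable.
        return False
    # Known word: personalized only if the user uses it far more than expected.
    return user_freq / system_freq >= t.min_freq_ratio
```

  • Setting `min_user_freq` to 1 reduces the first branch to the simplest rule (any unknown word is personalized), so the same function can stand in for the simpler variants.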
  • The method for acquiring a new word further includes: counting the number of times the personalized word appears in a preset Internet page database; if the number of occurrences is greater than or equal to a preset threshold, the word is output as a new word.
  • The preset Internet page database is obtained by the following steps: assigning a weight to each Internet page; storing the Internet pages whose weight is greater than or equal to a preset threshold in the Internet page database.
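  • A minimal sketch of these two steps, assuming pages arrive as (text, weight) pairs and that occurrences are counted by plain substring matching. The function names and the way weights are assigned are hypothetical; the source does not specify either.

```python
def build_page_database(pages, min_weight):
    """Keep only the pages whose weight meets the threshold.

    pages -- iterable of (text, weight) pairs; how weights are assigned
             is not specified in the source and is left to the caller.
    """
    return [text for text, weight in pages if weight >= min_weight]


def filter_new_words(personalized_words, page_db, min_occurrences):
    """Return the personalized words that occur often enough in the page database."""
    new_words = []
    for word in personalized_words:
        # str.count counts non-overlapping occurrences of the substring.
        occurrences = sum(text.count(word) for text in page_db)
        if occurrences >= min_occurrences:
            new_words.append(word)
    return new_words
```

  • Filtering pages by weight first means that low-quality pages cannot inflate the occurrence count of a mistyped or meaningless word.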
  • The collecting in the method for acquiring new words may be: the input method user's computing device sends the user's personalized words to the word collecting computing device in real time or periodically.
  • The method for acquiring a new word further includes: generating a new thesaurus from the output new words, or adding the obtained new words to the original thesaurus, to obtain a new thesaurus or a new version of the full thesaurus.
  • The invention also discloses a method for acquiring a new word, comprising: acquiring the words selected by a user during the user input process; collecting the selected words of each user; comparing the words selected by the users with the existing words, and obtaining the users' personalized words according to the comparison result; and obtaining new words according to the personalized words.
  • The invention also discloses a new word acquisition system based on an input method, comprising: a word extraction unit, connected to the input method system, for acquiring the words selected by a user during the user input process; a word comparison unit, connected to the word extraction unit, for comparing the selected words with the existing words and obtaining the user's personalized words according to the comparison result; a collecting unit, for collecting the personalized words of each user; and a new word acquiring unit, for obtaining new words according to the personalized words.
  • The invention also discloses another new word acquisition system based on an input method, comprising: a word extraction unit, connected to the input method system, for acquiring the words selected by a user during the user input process; a collecting unit, for collecting the selected words of each user; a word matching unit, connected to the collecting unit, for comparing the selected words with the existing words and obtaining the users' personalized words according to the comparison result; and a new word acquiring unit, for obtaining new words according to the personalized words.
  • The invention also discloses an input method system, comprising an input interface unit, a display unit and a system vocabulary, and further comprising: a word extraction unit, connected to the input method system, for acquiring the words selected by the user during the user input process; and a word matching unit, connected to the word extraction unit, for comparing the selected words with the existing words and obtaining the user's personalized words according to the comparison result.
  • The input interface unit, the display unit, and the system vocabulary of the input method system may be located in the same computing device; or the input interface unit and the display unit may be located in a first computing device and the system vocabulary in a second computing device, in which case the input method system acquires the corresponding information from the second computing device according to the information input by the user and displays the corresponding characters on the first computing device.
  • The input method system may further include: a communication unit, configured to send the personalized words.
  • The input method system may further include: a user vocabulary for storing the words selected by the user.
  • The input method system may further include: a word frequency recording unit, connected to the input method system, configured to record the user word frequency during the user input process, the user word frequency being frequency information on how often the user inputs the word.
  • The word comparison unit may include: a first comparison subunit, configured to determine whether a word selected by the user exists among the existing words, and to output the word to the third comparison subunit if it exists or to the second comparison subunit if it does not; a second comparison subunit, configured to further examine the user word frequency of the word when the selected word does not exist among the existing words, and to determine that the word is a personalized word if that user word frequency is greater than or equal to the predetermined threshold; and a third comparison subunit, configured to further compare the user word frequency and the system word frequency of the word when the selected word exists among the existing words, the system word frequency being the word frequency information preset in the input method system vocabulary for the existing word, and to determine that the word is a personalized word if the ratio of the user word frequency to the system word frequency is greater than or equal to the predetermined threshold.
  • The invention also discloses a new word obtaining device, comprising: a personalized word collecting unit, configured to collect the personalized words of each user; a statistical unit, configured to count the number of times each personalized word appears in a preset Internet page database; and a new word determining unit, connected to the statistical unit, for determining whether the number of occurrences of the personalized word is greater than or equal to a preset threshold and, if so, outputting the word as a new word.
  • The collecting is performed by the user computing device sending the user's personalized words to the personalized word collecting unit in real time or periodically.
  • The new word obtaining device further includes: a thesaurus generating unit, configured to generate a new thesaurus from the output new words, or to add the obtained new words to the original thesaurus, to obtain a new thesaurus or a new version of the full thesaurus.
  • The new word obtaining device further includes: an Internet page database generating unit, configured to assign a weight to each Internet page and to store the Internet pages whose weight is greater than or equal to a preset threshold in the Internet page database.
  • The invention also discloses a new word obtaining device, comprising: a word collecting unit, for collecting the selected words of each user; a word matching unit, connected to the word collecting unit, for comparing the words selected by the users with the existing words and obtaining the users' personalized words according to the comparison result; and a new word obtaining unit, configured to acquire new words according to the personalized words.
  • The new word obtaining unit includes: a statistical subunit, configured to count the number of times the personalized word appears in the preset Internet page database; and a new word determining subunit, connected to the statistical subunit, for determining whether the number of occurrences of the personalized word is greater than or equal to a preset threshold and, if so, outputting the word as a new word.
  • The word collecting unit may be further configured to collect the user word frequency corresponding to each word selected by the user; in that case the new word obtaining device further includes: a statistical subunit, configured to count the number of times the personalized word appears in the preset Internet page database, to obtain the Internet word frequency; a weighted word frequency determining subunit, configured to weight the user word frequency and the Internet word frequency of the word to obtain the weighted word frequency of the word; and a new word determining subunit, configured to determine whether the weighted word frequency of the personalized word is greater than or equal to a preset threshold and, if so, to output the word as a new word.
  • The invention also discloses a vocabulary generating method, comprising: collecting the input behavior information of each user, the input behavior information including the words selected during the user input process and the corresponding user word frequency of each word; weighting each user word frequency corresponding to a word, and calculating the user cumulative word frequency of each word; and generating a vocabulary that includes the words and their corresponding user cumulative word frequencies.
  • The vocabulary generating method further includes: removing the words whose user cumulative word frequency is less than or equal to a certain threshold.
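  • The aggregation and pruning just described could look like the sketch below. The per-user weights, the default weight of 1.0, and the function name `build_vocabulary` are assumptions for illustration; the source does not say how the weights are chosen.

```python
from collections import defaultdict


def build_vocabulary(user_records, user_weights=None, min_cumulative=0.0):
    """Aggregate per-user word frequencies into a vocabulary.

    user_records   -- dict: user id -> {word: user word frequency}
    user_weights   -- dict: user id -> weight (assumed; defaults to 1.0 per user)
    min_cumulative -- words whose cumulative frequency is <= this value are removed
    """
    user_weights = user_weights or {}
    cumulative = defaultdict(float)
    for user_id, freqs in user_records.items():
        weight = user_weights.get(user_id, 1.0)
        for word, freq in freqs.items():
            cumulative[word] += weight * freq
    # Prune words at or below the threshold, as in the removal step above.
    return {w: f for w, f in cumulative.items() if f > min_cumulative}
```

  • One plausible use of the per-user weight is to damp a single very active user so that one person's habits do not dominate the cumulative frequency.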
  • The method for generating a thesaurus further includes: comparing the generated thesaurus with the existing thesaurus, removing the words that do not conform to the preset rules according to the comparison result, and outputting the user personalized words; and generating a personalized thesaurus from the user personalized words.
  • The method for generating a thesaurus further includes: comparing the generated thesaurus with the existing thesaurus, removing the words that do not conform to the preset rules according to the comparison result, and outputting the user personalized words; counting the number of times each personalized word appears in the preset Internet page database, to obtain the Internet word frequency; weighting and summing the user cumulative word frequency and the Internet word frequency of the personalized word, to obtain the weighted word frequency of the word; outputting the word as a new word if its weighted word frequency is greater than or equal to the preset threshold; and generating a new vocabulary from the output new words, the new vocabulary including each new word and its corresponding weighted word frequency.
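  • The weighted sum can be written as $f_w = \alpha f_u + \beta f_i$, where $f_u$ is the user cumulative word frequency and $f_i$ the Internet word frequency. The sketch below assumes a simple linear combination; the coefficient names `alpha` and `beta`, their default values, and the default threshold are illustrative and do not come from the source.

```python
def weighted_new_words(user_cumulative, internet_freq, alpha=0.5, beta=0.5, threshold=10.0):
    """Return {new word: weighted word frequency}.

    user_cumulative -- {personalized word: user cumulative word frequency}
    internet_freq   -- {personalized word: occurrences in the page database}
    alpha, beta     -- illustrative weights for the two frequencies
    """
    new_vocab = {}
    for word, user_f in user_cumulative.items():
        # Words never seen in the page database contribute an Internet frequency of 0.
        weighted = alpha * user_f + beta * internet_freq.get(word, 0)
        if weighted >= threshold:
            new_vocab[word] = weighted
    return new_vocab
```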
  • The invention also discloses a vocabulary generating device, comprising: a collecting unit, configured to collect the input behavior information of each user, the input behavior information including the words selected during the user input process and the corresponding user word frequency of each word; a word frequency calculation unit, configured to weight each user word frequency corresponding to a word and calculate the cumulative word frequency of each word; and a thesaurus generating unit, configured to generate a thesaurus that includes the words and their corresponding cumulative word frequencies.
  • The thesaurus generating device further includes: a personalized word determining unit, configured to compare the generated thesaurus with the existing thesaurus, remove the words that do not conform to the preset rules according to the comparison result, and output the user personalized words.
  • The thesaurus generating device may alternatively further include: a personalized word determining unit, configured to compare the generated thesaurus with the existing thesaurus, remove the words that do not conform to the preset rules according to the comparison result, and output the user personalized words; a statistical unit, configured to count the number of occurrences of each personalized word in the preset Internet page database, to obtain the Internet word frequency; a weighted word frequency determining unit, configured to weight and sum the user cumulative word frequency and the Internet word frequency of the personalized word, to obtain the weighted word frequency of the word; a new word determining unit, which outputs the word as a new word if the weighted word frequency of the personalized word is greater than or equal to the preset threshold; and a generating unit, which generates a new vocabulary from the output new words, the new vocabulary including each new word and its corresponding weighted word frequency.
  • The present invention has the following advantages:
  • The present invention proposes a distributed architecture, including multiple user ends and a collection end, and derives new words with universal meaning from individual users' personalized words by collecting the input behavior information of multiple users.
  • New words in Internet content or corpora also originate from the usage behavior of individual users, so the present invention approaches the problem from the perspective of user input, and can thereby obtain more accurate, widely used words simply and conveniently.
  • The present invention further collects the user word frequency information in the user input behavior, thereby removing some interfering vocabulary, such as user input errors; it can also find some new words with sociological significance, for example, obtaining some words through the user word frequency.
  • The invention can further look up the collected user input behavior information in a selected Internet page database, count the occurrences, and remove the vocabulary with lower frequency, thereby obtaining more accurate new words; that is, it finds words that are truly new in the linguistic sense and removes vocabulary that is erroneous or has no universal meaning.
  • The invention can also organize the obtained new words into a new vocabulary or a new version of the full vocabulary and provide it to the input method, which improves the hit rate of the preferred word and the input speed, and makes the ordering of the candidate words more reasonable. Users can input new words faster and more accurately, obtaining the desired word as the first candidate or on the first candidate page without a cumbersome candidate selection process.
  • The new thesaurus or the new version of the full thesaurus can also be provided to a search engine. When the user's query keyword string includes new words, the accuracy and coverage of the search results can be improved.
  • Figure 1 is a flow chart showing the steps of Embodiment 1 of the present invention.
  • Figure 2 is a flow chart showing the steps of Embodiment 2 of the present invention.
  • Figure 3 is a flow chart showing the steps of Embodiment 3 of the present invention.
  • Figure 4 is a flow chart showing the steps of obtaining new words from the collected user personalized words.
  • Figure 5 is a block diagram showing the structure of an embodiment of an input method system of the present invention.
  • Figure 6 is a structural block diagram of a new word acquisition apparatus of the present invention.
  • Figure 7 is a structural block diagram of another new word acquisition apparatus of the present invention.
  • Figure 8 is a flow chart showing the steps of a method for generating a thesaurus according to the present invention.
  • Referring to Figure 1, a flow chart of the steps of Embodiment 1 of the present invention is shown, including the following steps:
  • Step 101: Obtain the words selected by the user during the user input process.
  • Step 101 records the user's input behavior information, namely the words selected by the user.
  • The encoded character string may be a pinyin code or a shape-based (font) code; that is, the present invention can be applied to various input methods.
  • The words selected by the user will include some of the user's personalized words.
  • For example, the user needs to input words such as "broad value", "nine ceremonies" or a certain name, but the original vocabulary of the input method does not contain them. Such words cannot be shown to the user directly among the candidate words, and the user must select each character individually to obtain the desired personalized word.
  • The user can also use the manual word-creation function provided by the input method to create words that the user needs but that are not in the original thesaurus, so that the desired personalized word can be selected during input.
  • The present invention is capable of picking out the user's personalized words from the words selected by the user.
  • Step 102: Compare the selected words with the existing words, and obtain the user's personalized words according to the comparison result.
  • The comparison may be performed each time the user confirms a selected word: the selected word is compared with the existing words and, if it satisfies the preset judgment rule, it is determined to be a personalized word of the user and is recorded, either in the system vocabulary or in a user personalized vocabulary.
  • In this case the words selected by the user in step 101 need only be recorded in a cache.
  • Alternatively, step 101 first records the user-selected words in a user vocabulary, the input method system vocabulary stores the existing words, and the comparison in step 102 compares the user vocabulary with the input method system vocabulary at regular intervals, recording the determined user personalized words in the user personalized vocabulary or marking them in the user vocabulary. This approach reduces the amount of computation during user input, so that extracting the user's input behavior does not interfere with the input itself.
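  • The periodic comparison might be sketched as a batch job that marks entries in the user vocabulary. The entry layout (a dict with `freq` and `personalized` keys) and the function name are assumptions for this sketch; in practice the function would be invoked on a schedule rather than on every keystroke.

```python
def batch_compare(user_vocab, system_vocab):
    """Mark entries of the user vocabulary that are absent from the system vocabulary.

    user_vocab   -- {word: {"freq": int, "personalized": bool}} (assumed layout)
    system_vocab -- set of existing words
    Returns the list of words newly marked as personalized.
    """
    newly_marked = []
    for word, entry in user_vocab.items():
        if word not in system_vocab and not entry.get("personalized", False):
            entry["personalized"] = True
            newly_marked.append(word)
    return newly_marked
```

  • Because already-marked entries are skipped, running the job repeatedly only reports words added since the previous run.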
  • The preset rule for determining the user's personalized words can be set by a person skilled in the art as needed.
  • For example, the user's personalized words may be obtained by the following steps: determining whether the word selected by the user exists among the existing words; if not, determining that the word is a personalized word of the user.
  • Step 103: Collect the personalized words of each user.
  • The collecting may be automatic: the input method user's computing device sends the user's personalized words to the word collection computing device in real time or periodically; that is, the input method computing device has an automatic sending module.
  • The collection computing device exists in the form of a server.
  • The collecting may also be initiated manually by the input method user, who sends the personalized words to the collecting end at regular or irregular intervals; for example, each user sends his or her personalized words to a common email address, or uploads them to a common server.
  • Alternatively, the vocabulary storing the user's personalized words may be sent to the collecting computing device in real time or periodically; for example, collection can be achieved by each user uploading the vocabulary to the server at regular or irregular intervals.
  • Collection of user personalized words is simpler when the input method system used by the user is itself a server, which can be used by multiple users and can collect the input behavior information of each user during use.
  • Any approach that enables the information to be collected is feasible for the present invention, and the options are not enumerated one by one here.
  • Step 104: Obtain new words according to the personalized words.
  • This step obtains new words by removing duplicate words from all collected user personalized words. It can also apply additional filtering or simplification to obtain new words.
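  • A minimal sketch of the de-duplication, assuming each submission arrives as a (user id, word list) pair. Returning the number of distinct submitting users per word is an addition made for this sketch (it is not described in the source) that would make later filtering easier; the function name is hypothetical.

```python
def merge_personalized_words(submissions):
    """Merge per-user submissions into a de-duplicated candidate set.

    submissions -- iterable of (user id, list of personalized words)
    Returns {word: number of distinct users who submitted it}.
    """
    users_per_word = {}
    for user_id, words in submissions:
        for word in set(words):  # ignore repeats within one submission
            users_per_word.setdefault(word, set()).add(user_id)
    return {word: len(users) for word, users in users_per_word.items()}
```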
  • The present invention can obtain new words from the collected user personalized words as follows: count the number of occurrences of each personalized word in the preset Internet page database; if the number of occurrences is greater than or equal to the preset threshold, output the word as a new word.
  • Referring to Figure 2, a flow chart of the steps of Embodiment 2 of the present invention is shown, which includes the following steps:
  • Step 201: Obtain the words selected by the user during the user input process.
  • Step 202: Collect the selected words of each user. Step 202 collects each user's user vocabulary, or the user-selected words recorded in the system vocabulary (when the system vocabulary is not read-only).
  • The collection may be performed in any of the ways described above, which are not repeated here.
  • Step 203: Compare the words selected by the users with the existing words, and obtain the users' personalized words according to the comparison result.
  • Step 204: Obtain new words according to the personalized words.
  • Embodiment 2 is basically similar in concept to Embodiment 1.
  • The main difference is that the selected words of a plurality of users are collected first and then compared in a single pass, and the users' personalized words are obtained according to the comparison result. This reduces the number of comparison operations and lightens the load on the local input method system, but because a large number of user-selected words are compared centrally, it increases the load on the server.
  • A person skilled in the art can choose between the two embodiments as needed.
  • Referring to Figure 3, a flowchart of the steps of Embodiment 3 of the present invention is shown. Embodiment 3 is a further optimization of Embodiment 1 and includes the following steps:
  • Step 301: During the user input process, record the words selected by the user and their user word frequencies in the user vocabulary.
  • A user vocabulary is created on the user side to record the words selected by the user and their user word frequencies; the user word frequency is frequency information on how often the user inputs the word. This step records the user's input behavior completely, regardless of whether a word is a new word.
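  • Such a user vocabulary could be as simple as a counter keyed by word. The class and method names below are hypothetical and the sketch ignores persistence.

```python
from collections import Counter


class UserVocabulary:
    """Records every word the user commits, together with its user word frequency."""

    def __init__(self):
        self._freq = Counter()

    def record_selection(self, word):
        # Called each time the user confirms a candidate; no new-word check here.
        self._freq[word] += 1

    def frequency(self, word):
        # Counter returns 0 for words that were never recorded.
        return self._freq[word]

    def items(self):
        return dict(self._freq)
```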
  • the input system vocabulary can be set to the modifiable mode, and the user-selected words and their user words can be directly recorded to the system vocabulary.
  • Step 302 Compare the user vocabulary and the system vocabulary, and obtain the user's personalized words according to the comparison result; the following methods.
  • the first type determines whether the word selected by the user exists in the existing word; if not, determines that the word is a user's individual word.
  • the second type determining whether the word selected by the user exists in the existing word; if not, further determining the corresponding word frequency of the word; if the corresponding word frequency of the word is greater than or equal to the predetermined threshold, then determining This word is a personal word. If it exists, it can be determined as a non-personal word.
  • the third type determining whether the word selected by the user exists in the existing word; if not, determining that the word is a user's individual word; if present, further comparing the user's word frequency and system word of the word Frequency, the word frequency of the system is word frequency information corresponding to an existing word preset in the input method system vocabulary; if the ratio of the user word frequency to the system word frequency is greater than or equal to a predetermined threshold, determining the word as a personalized word .
  • the user word frequency is used to further judge the individual words, and some words that are not commonly used, but are very commonly used nowadays, that is, new words whose application scope or application environment has changed, can be obtained.
  • the ratio parameter used in the above method is a preferred example, and of course, other feasible parameters can also be used for evaluation.
  • the fourth type determines whether the word selected by the user exists in the existing word; if not, further determines the corresponding word frequency of the word; if the corresponding word frequency of the word is greater than or equal to the predetermined threshold, then determining The word is a personalized word; if present, the user's word frequency and the system word frequency of the word are further compared, and the system word frequency is the word frequency information corresponding to the existing word preset in the input system system vocabulary; If the ratio of the word frequency to the system word frequency is greater than or equal to the predetermined threshold, then the word is determined to be a personalized word.
  • This mode is a preferred example of the present invention, and a more accurate user personality word can be obtained.
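  • The fourth type of determination can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name, the dictionary representation of the system vocabulary, and both threshold values are assumptions chosen only for the example.

```python
def is_personalized_word(word, user_freq, system_freqs,
                         new_word_min_freq=3, ratio_threshold=5.0):
    """Fourth-type check: return True if `word` is a personalized word.

    word          -- the word the user selected
    user_freq     -- how often this user has entered the word
    system_freqs  -- dict mapping existing words to their preset system
                     word frequency (hypothetical representation)
    Both thresholds are illustrative values, not taken from the patent.
    """
    if word not in system_freqs:
        # Not an existing word: require a minimum user word frequency.
        return user_freq >= new_word_min_freq

    system_freq = system_freqs[word]
    if system_freq <= 0:
        # Assumption: an existing word without a usable system frequency
        # is treated like a word that is not in the vocabulary.
        return user_freq >= new_word_min_freq

    # Existing word: compare the ratio of user to system word frequency.
    return user_freq / system_freq >= ratio_threshold


system_vocabulary = {"pingguo": 10, "hello": 2}
print(is_personalized_word("geili", 3, system_vocabulary))
print(is_personalized_word("pingguo", 20, system_vocabulary))
```

  • The first call treats an unknown word used three times as personalized; the second rejects an existing word whose user frequency is only twice its system frequency.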
  • Step 303 Collect individual words of each user.
  • Step 304 Obtain a new word according to the personality word.
  • This step obtains new words by removing duplicates from all the collected user personalized words. It may also obtain new words through further filtering and pruning, as detailed later with reference to Figure 4.
  • Step 305 Generate a new thesaurus according to the outputted new words or add the obtained new words to the original thesaurus to obtain a new thesaurus or a new version of the full thesaurus.
  • This step is used to organize the new words obtained in step 304 into a vocabulary, which can be used in the input method system or the search field.
  • The second computing device that stores the new thesaurus or the new version of the full thesaurus may exist in the network in the form of a server and provide a thesaurus update service to any input method client program that needs the new word information.
  • It need not take the form of a fixed server, however; it may also exist in a local computing device and provide the thesaurus update service, through P2P (peer-to-peer) technology, to input method client programs on other terminals that need the new word information.
  • The update may be performed in any of the following ways: the system vocabulary is updated together with the input method system when the latter is updated; the server actively pushes an online update of the system vocabulary; or the user initiates a request and the server returns data according to the request to update the system vocabulary.
  • various data update methods may be used, and the present invention is not limited thereto, and those skilled in the art may select them according to needs.
  • The unit of the input method system that receives the information input by the user and displays the corresponding characters is set to be located in a first computing device; the obtained new thesaurus or new version of the full thesaurus serves as the system vocabulary of the input method system and is located in a second computing device; the input method system obtains the corresponding information from the system vocabulary in the second computing device according to the information input by the user and displays the corresponding characters on the first computing device, completing the text input.
  • The new thesaurus or the new version of the full thesaurus obtained by the new word extraction method of the present invention can be used directly as the system vocabulary of the input method system, so that the online thesaurus can be used without any update operation.
  • The input method system is thus divided into two parts: the receiving and displaying unit is located in the first computing device and the thesaurus information is located in the second computing device, which allows an online input method to be implemented. The code matching process required by the input method system can of course be placed in either computing device as needed.
  • the present invention is also applicable to the field of search.
  • The query keyword string entered by the user can be segmented accurately according to the thesaurus obtained by the new word extraction method of the present invention. Based on the segmentation result, the accuracy and coverage of the search results can then be improved.
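  • To illustrate how an updated thesaurus changes the segmentation of a query string, the sketch below uses forward maximum matching. The patent does not name a segmentation algorithm; forward maximum matching is swapped in here as a common, simple choice, and the function name and sample lexicons are hypothetical.

```python
def segment(query, lexicon):
    """Segment `query` by forward maximum matching against `lexicon`.

    At each position the longest lexicon entry that matches is taken;
    if nothing matches, a single character is emitted as its own token.
    """
    # Longest entry bounds how far ahead we need to look.
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(query):
        longest = min(max_len, len(query) - i)
        for size in range(longest, 0, -1):
            piece = query[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


old_lexicon = {"ab", "cd"}
new_lexicon = old_lexicon | {"abc"}   # "abc" stands in for a new word
print(segment("abcd", old_lexicon))
print(segment("abcd", new_lexicon))
```

  • With the old lexicon the string is split into "ab" and "cd"; once the new word "abc" is added, the same string is split into "abc" and "d", which is the kind of change that affects search recall and precision.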
  • the present invention can obtain new words from collected user personal words by the following steps:
  • Step 401 Remove duplicate user personality words
  • Step 402 Assign weights to Internet pages; store the Internet pages whose weight value is greater than or equal to a preset threshold in the Internet page database, thereby obtaining the preset Internet page database;
  • Step 403 Count the number of occurrences of the personalized word in the preset Internet page database; if the number of occurrences of the personalized word is greater than or equal to a preset threshold, output the word as a new word.
  • Step 402 is an optional step, and the purpose is to obtain a selected internet page database, so as to ensure the accuracy of the new word screening.
  • other methods can be used to form a pre-built Internet page database.
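  • Steps 401 and 403 can be sketched as follows, assuming the preset Internet page database is simply a list of page texts. The function name, the data layout and the threshold value are assumptions for illustration; occurrences are counted with a naive substring count.

```python
from collections import Counter


def select_new_words(user_words, pages, min_count=2):
    """Return the personalized words that occur often enough on the pages.

    user_words -- dict mapping a user id to that user's personalized words
    pages      -- list of page texts standing in for the page database
    min_count  -- illustrative occurrence threshold (not from the patent)
    """
    # Step 401: merge all users' words and drop duplicates.
    candidates = set()
    for words in user_words.values():
        candidates.update(words)

    # Step 403: count occurrences of each candidate in the page database.
    counts = Counter()
    for page in pages:
        for word in candidates:
            counts[word] += page.count(word)

    # Keep only the words that reach the threshold.
    return sorted(word for word in candidates if counts[word] >= min_count)


words_by_user = {"u1": ["geili", "foo"], "u2": ["geili", "bar"]}
page_texts = ["geili geili foo", "bar geili"]
print(select_new_words(words_by_user, page_texts))
```

  • Here only "geili" is returned, because it appears three times across the pages while the other candidates appear once each.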
  • In the weight assignment of step 402, an important case is to assign the weight value according to the time at which the web page was created and the type of the web page. The time of the web page matters greatly for word frequency statistics, so it has the larger influence on the weight value: the further the page's time is from the time of the word frequency statistics, the lower the weight value. If the time difference exceeds a certain value, the page can be given a lower weight value or even be excluded from the word frequency statistics. The type of the web page also has a considerable influence on the word frequency statistics.
  • The web page type generally refers to portal websites, forums or certain other designated web pages. These web pages receive higher weight values because they have more participants and carry more information.
  • A rule base can be set up that stores the URL addresses of certain web pages; the web pages at these URLs are treated as more important for word frequency statistics, words appearing on them are preferred, and these web pages are given greater weight values.
  • The present invention can further exclude duplicate web pages, pornographic web pages and spam web pages by giving them lower weight values, further ensuring the accuracy of new word verification.
  • Before the words are counted, page redundancy information should be removed from the pages as far as possible. Page redundancy information is generally invalid information; if it is not removed, it increases the amount of computation for new word extraction, makes the counted word frequencies less objective, and makes the results less accurate.
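  • The weight assignment described above can be sketched as a function of page age, page type and a URL rule base. All numeric values, the linear age decay, the type table and the field names are assumptions made for the example; the patent only states the qualitative rules.

```python
from datetime import date

# Hypothetical weights per page type; portals and forums rank higher.
TYPE_WEIGHTS = {"portal": 1.0, "forum": 0.8, "other": 0.5}


def page_weight(page_date, page_type, url, today,
                preferred_urls=frozenset(), max_age_days=365):
    """Weight a page by its age, its type and the URL rule base."""
    age = max((today - page_date).days, 0)
    if age > max_age_days:
        # Too old: excluded from the word frequency statistics.
        return 0.0
    time_factor = 1.0 - age / max_age_days      # older pages weigh less
    weight = time_factor * TYPE_WEIGHTS.get(page_type, TYPE_WEIGHTS["other"])
    if url in preferred_urls:
        weight *= 1.5                           # boost for rule-base URLs
    return weight


def build_page_database(pages, today, threshold=0.3,
                        preferred_urls=frozenset()):
    """Keep the pages whose weight reaches the threshold (step 402)."""
    return [
        page for page in pages
        if page_weight(page["date"], page["type"], page["url"],
                       today, preferred_urls) >= threshold
    ]


today = date(2007, 8, 6)
pages = [
    {"url": "fresh", "date": today, "type": "forum"},
    {"url": "old", "date": date(2005, 1, 1), "type": "portal"},
]
print([page["url"] for page in build_page_database(pages, today)])
```

  • Duplicate or spam pages could be handled in the same scheme by assigning them a type with a very low weight so that they fall below the threshold.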
  • the present invention also proposes two new word acquisition systems based on the input method. Since the system is used to complete the foregoing method, only a brief introduction will be made below. For details, refer to the related parts.
  • a new word acquisition system based on input method including:
  • a word extraction unit connected to the input method system, for acquiring a word selected by the user in the user input process; a word comparison unit, connected to the word extraction unit, for comparing the selected word with the existing a word, obtaining a user's personality word according to the comparison result; a collecting unit, configured to collect individual words of each user; and a new word obtaining unit, configured to acquire a new word according to the personality word.
  • a new word acquisition system based on input method, including:
  • a word extraction unit, connected to the input method system, for acquiring a word selected by the user during the user input process; a collecting unit, for collecting the selected words of each user; a word matching unit, connected to the collecting unit, for comparing a user-selected word with the existing words and obtaining the user's personalized words according to the comparison result; and a new word obtaining unit, configured to acquire new words according to the personalized words.
  • the present invention also claims an input method system, including an input interface unit 501, a display unit 502, and a system vocabulary 503, and further includes:
  • a word extraction unit 504 connected to the input method system, for acquiring a word selected by the user during the user input process;
  • the word matching unit 505 is connected to the word extracting unit 504 for comparing the selected word with the existing word and obtaining the user's individual word according to the comparison result.
  • The user's personalized words may be stored in the user vocabulary 506, stored in the system vocabulary 503 and marked there, or stored in a dedicated vocabulary.
  • the input method system can be used to extract the user's personality words in addition to the ordinary word input.
  • the input method system may be a common input method system.
  • In that case the input interface unit, the display unit and the system vocabulary of the input method system are located in the same computing device; the input method system performs a local matching query according to the coding information input by the user and displays the corresponding characters locally.
  • the input method system may also be a network input method system.
  • In that case the input interface unit and the display unit of the input method system are located in the first computing device and the system vocabulary is located in the second computing device; the input method system obtains the corresponding information from the second computing device according to the information input by the user and displays the corresponding characters on the first computing device.
  • the input method system may further include:
  • a user vocabulary 506 for storing the words selected by the user;
  • a communication unit 507, configured to send the personalized words.
  • Each user's input method system can send the user's personality words to a unified collection computing device, so as to collect a large amount of user input behavior information, and then analyze new words that meet the needs of the public and conform to the linguistic meaning.
  • the input method system may further include:
  • the word frequency recording unit 508 is connected to the input method system for recording a user word frequency during the user input process, and the user word frequency is frequency information of the user inputting the word.
  • the communication unit 507 can also be used to send user word frequency information related to personal words.
  • the word comparison unit 505 may further include:
  • a first comparison subunit 5051 configured to determine whether a word selected by the user exists in an existing word; if yes, output the word to a third comparison subunit, if not, output the word To the second comparison subunit;
  • the second comparison sub-unit 5052 is configured to further determine a user word frequency corresponding to the word when the selected word does not exist in the existing word; if the corresponding word frequency of the word is greater than or equal to a predetermined threshold , then determine that the word is a personal word.
  • the third comparison sub-unit 5053 is configured to further compare the user word frequency of the word with its system word frequency when the word selected by the user exists among the existing words, the system word frequency being the word frequency information preset in the system vocabulary of the input method system for the existing word; if the ratio of the user word frequency to the system word frequency is greater than or equal to the predetermined threshold, the word is determined to be a personalized word.
  • The above-described word matching unit 505 is a preferred embodiment of the present invention. Other matching rules may of course also be used, and the word matching unit 505 may then include other sub-units; the present invention does not enumerate them one by one.
  • The input interface unit 501 in the above input method system is mainly used to let the user input information and select words. It can also be used to switch between various modes, for example: input language switching (such as switching between Simplified and Traditional Chinese, or between Chinese and English), input mode switching (such as switching between single-character input, word input and sentence input), and input state switching (such as switching between text, punctuation and special symbols).
  • Display unit 502 and system vocabulary 503 are well known to those skilled in the art and will not be described in detail herein.
  • the present invention also provides a new word acquiring apparatus, including:
  • the personalized word collecting unit 601 is configured to collect the personalized words of each user. The personalized words of a user may be obtained by the input method and sent automatically to the personalized word collecting unit; they may be set or organized by the user and sent to the personalized word collecting unit; or each user may place a personalized word vocabulary in a fixed network space, from which the personalized word collecting unit obtains the personalized words of each user. That is, the user's personalized words in this embodiment are not necessarily obtained through user input behavior; they may also be set or organized by the user.
  • a statistical unit 602, configured to count the number of times the personalized word appears in a preset Internet page database;
  • the new word determining unit 603 is connected to the statistical unit 602, and is configured to determine whether the number of occurrences of the personalized word is greater than or equal to a preset threshold, and if so, output the word as a new word.
  • the new word acquisition device can obtain a relatively accurate new word output according to the collected personal words of each user by using the verification in the Internet information.
  • the individual words of each user may be automatically obtained by the user's input behavior, or may be set or organized by the user.
  • the new word obtaining means may further include: a thesaurus generating unit 604, configured to generate a new thesaurus according to the outputted new words or add the obtained new words to the original thesaurus to obtain a new thesaurus or a new version of the whole thesaurus.
  • The new thesaurus or the new version of the full thesaurus can be used to update the system vocabulary of the input method system or for word segmentation in search, thereby improving the user's input accuracy and the accuracy of the search results.
  • the new word obtaining apparatus may further include: an Internet page database generating unit 605, configured to assign weights to Internet pages and to store the Internet pages whose weight value is greater than or equal to a preset threshold in the Internet page database.
  • the present invention also discloses another new word acquiring apparatus, including:
  • a word collecting unit 701 configured to collect selected words of each user
  • the word collecting unit 701 can be directly connected to an existing input method system to collect selected words of each user in real time, for example, a network input method.
  • the word collecting unit 701 can also receive the user-selected words sent in real time or periodically by each user's input method system, the user-selected words having been extracted by that input method system.
  • the word collecting unit 701 can also collect the user-selected words by receiving the user vocabulary or system vocabulary sent by each user's input method system, the user-selected words having been extracted by the user's input method and stored in the user vocabulary or system vocabulary.
  • the word matching unit 702 is connected to the word collecting unit, and is configured to compare the selected word with the existing word, and obtain the user's personalized word according to the comparison result;
  • the new word obtaining unit 703 is configured to obtain a new word according to the personalized word.
  • the word comparison unit 702 can further include:
  • a first comparison sub-unit 7021 configured to determine whether a word selected by the user exists in an existing word; if yes, output the word to a third comparison sub-unit, if not, output the word To the second comparison subunit;
  • the second comparison sub-unit 7022 is configured to further determine the user word frequency corresponding to the word when the selected word does not exist among the existing words; if the user word frequency of the word is greater than or equal to a predetermined threshold, the word is determined to be a personalized word.
  • the third comparison sub-unit 7023 is configured to further compare the user word frequency of the word with its system word frequency when the word selected by the user exists among the existing words, the system word frequency being the word frequency information preset in the system vocabulary of the input method system for the existing word; if the ratio of the user word frequency to the system word frequency is greater than or equal to the predetermined threshold, the word is determined to be a personalized word.
  • the new word obtaining unit 703 may further include:
  • a statistical subunit 7031 configured to count the number of occurrences of the personalized word in a preset Internet page database, thereby obtaining an Internet word frequency of the word;
  • the new word determining subunit 7032 is connected to the statistical subunit for determining whether the internet word frequency is greater than or equal to a preset threshold, and if so, outputting the word as a new word.
  • the new word acquiring device may further include:
  • the thesaurus generating unit 704 is configured to generate a new thesaurus according to the outputted new words or add the obtained new words to the original thesaurus to obtain a new thesaurus or a new version of the full thesaurus.
  • the Internet page database generating unit 705 is configured to perform weighting on the Internet page; and store the Internet page whose weight value is greater than or equal to the preset threshold to the Internet page database.
  • the vocabulary generated by the vocabulary generating unit 704 can further include the user word frequency corresponding to the word.
  • The user word frequency and the Internet word frequency may be combined in a weighted sum so that each personalized word is given a weighted word frequency. Filtering is then performed on the weighted word frequency: for example, it is determined whether the weighted word frequency of the personalized word is greater than or equal to a preset threshold, and if so, the word is output as a new word.
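  • The weighted combination of user word frequency and Internet word frequency can be sketched as a linear weighted sum followed by a threshold test. The weights 0.6 and 0.4, the threshold and the function names are illustrative assumptions; the patent does not specify the weighting.

```python
def weighted_word_frequency(user_freq, internet_freq,
                            user_weight=0.6, internet_weight=0.4):
    """Combine the two frequencies in a weighted sum (weights assumed)."""
    return user_weight * user_freq + internet_weight * internet_freq


def filter_new_words(candidates, threshold):
    """Return {word: weighted frequency} for words reaching `threshold`.

    candidates -- dict mapping a personalized word to a tuple of
                  (user word frequency, Internet word frequency)
    """
    new_words = {}
    for word, (user_freq, internet_freq) in candidates.items():
        score = weighted_word_frequency(user_freq, internet_freq)
        if score >= threshold:
            new_words[word] = score
    return new_words


candidates = {"geili": (10, 20), "typo": (1, 0)}
print(filter_new_words(candidates, threshold=5))
```

  • Keeping the weighted frequency alongside each new word lets a thesaurus generator store it as the word's frequency, as the new thesaurus of the later embodiments does.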
  • The invention also discloses a thesaurus generating method. Two embodiments of the thesaurus generating method are described below with reference to FIG. 8, FIG. 8a and FIG. 8b:
  • the thesaurus generation method shown in Figure 8a includes the following steps:
  • Step 801a Collect input behavior information of each user, where the input behavior information includes a selected word in the user input process and a corresponding user word frequency of the word; the collection may be various manners mentioned in the foregoing.
  • Step 802a Perform weight correction on the user word frequencies corresponding to each word and calculate the user cumulative word frequency of each word. The weight correction may be performed by analyzing the word frequencies of the individual users for a given word: for example, the per-user word frequencies of the word are first analyzed to find their distribution trend, and a word frequency value is then corrected according to its probability of occurrence or according to the average range of the word frequencies.
  • the user accumulated word frequency calculated after the above correction can remove some users' accidental behavior or malicious behavior, and obtain a more objective and accurate user cumulative word frequency, thereby ensuring the accuracy of the thesaurus.
  • Step 803a Remove words whose user cumulative word frequency is less than or equal to a certain threshold. This is a preferred step that further improves the generality of the words in the resulting thesaurus.
  • Step 804a Generate a thesaurus that includes the words and their corresponding user cumulative word frequencies. Because input methods have a large number of users, a highly general thesaurus can be obtained by collecting the input behavior information of a large number of input method users.
  • the thesaurus can be directly provided to the input method system as a system vocabulary; it can also be imported as a user vocabulary by the user and used in conjunction with the system vocabulary.
  • The thesaurus generating method shown in FIG. 8a may further include the following steps:
  • Step 805a Compare the generated thesaurus with an existing thesaurus, remove the words that do not conform to a preset rule according to the comparison result, and output the user personalized words. The preset rule can be set by a person skilled in the art as needed, for example as one of the four ways of obtaining user personalized words from the comparison result described in step 302 above.
  • Step 806a Generate a personalized word dictionary according to the user personality word.
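  • Steps 801a to 804a can be sketched as follows. The weight correction is only described qualitatively in the text, so this example swaps in one simple, assumed technique: each user's count for a word is capped at a multiple of the median count across users, which damps accidental or malicious outliers. The function names, the cap factor and the threshold are hypothetical.

```python
from statistics import median


def cumulative_word_frequency(per_user_counts, cap_factor=3.0):
    """Sum per-user counts after capping outliers (assumed correction)."""
    if not per_user_counts:
        return 0.0
    cap = cap_factor * median(per_user_counts)
    return sum(min(count, cap) for count in per_user_counts)


def build_lexicon(behavior, min_cumulative=5, cap_factor=3.0):
    """Build {word: user cumulative word frequency} from input behavior.

    behavior -- dict mapping a user id to {word: that user's word frequency}
    """
    # Step 801a: gather the per-user frequencies of every selected word.
    per_word = {}
    for counts in behavior.values():
        for word, count in counts.items():
            per_word.setdefault(word, []).append(count)

    lexicon = {}
    for word, counts in per_word.items():
        # Step 802a: corrected cumulative frequency.
        total = cumulative_word_frequency(counts, cap_factor)
        # Step 803a: drop words at or below the threshold.
        if total > min_cumulative:
            lexicon[word] = total      # Step 804a: word plus frequency
    return lexicon


behavior = {
    "u1": {"geili": 2, "rare": 1},
    "u2": {"geili": 3},
    "u3": {"geili": 2},
    "u4": {"geili": 1000},   # outlier, capped by the correction
}
print(build_lexicon(behavior))
```

  • A median-based cap has no effect when a word is used by a single user, so a real system would likely need an additional rule for that case, such as requiring a minimum number of distinct users.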
  • the thesaurus generation method shown in Figure 8b includes the following steps:
  • Step 801b Collect input behavior information of each user, where the input behavior information includes a selected word in the user input process and a corresponding word frequency of the word.
  • Step 802b performing weight correction on the word frequency of each user corresponding to the word, and calculating the cumulative word frequency of the user of each word.
  • Step 803b removing words whose user cumulative word frequency is less than or equal to a certain threshold.
  • Step 804b generating a vocabulary, the vocabulary including words and their corresponding user cumulative word frequency.
  • Step 805b Compare the generated thesaurus with the existing thesaurus, and remove the words that do not conform to the preset rules according to the comparison result, and output the user personalized words;
  • Step 806b Count the number of occurrences of the personalized words in the preset Internet page database to obtain an Internet word frequency;
  • Step 807b Weight the user cumulative word frequency of the personalized word and its Internet word frequency to obtain a weighted word frequency of the personalized word; if the weighted word frequency of the personalized word is greater than or equal to a preset threshold, output the word as a new word;
  • Step 808b Generate a new vocabulary according to the outputted new word, the new vocabulary including the new word and its corresponding weight word frequency.
  • the invention also discloses a thesaurus generating device, comprising the following components:
  • a collecting unit configured to collect input behavior information of each user, where the input behavior information includes a selected word in a user input process and a corresponding word frequency of the word;
  • a word frequency calculation unit, configured to perform weight correction on the user word frequencies corresponding to each word and to calculate the user cumulative word frequency of each word;
  • the thesaurus generating unit is configured to generate a thesaurus, the thesaurus including the words and their corresponding cumulative word frequencies.
  • the thesaurus generating device may further include: a personalized word determining unit, configured to compare the generated thesaurus with an existing thesaurus, remove the words that do not conform to the preset rule according to the comparison result, and output the user personalized words.
  • Alternatively, the thesaurus generating device may further include:
  • a personalized word determining unit configured to compare the generated thesaurus with the existing thesaurus, and remove the words that do not conform to the preset rules according to the comparison result, and output the user personalized words;
  • a statistical unit configured to count the number of occurrences of the personalized words in a preset Internet page database, and obtain an Internet word frequency
  • a weight word frequency determining unit configured to perform weighting on the cumulative word frequency of the user of the personalized word and the Internet word frequency, and obtain a weighted word frequency of the word;
  • a new word determining unit, configured to output the word as a new word if the weighted word frequency of the personalized word is greater than or equal to a preset threshold;
  • the thesaurus generating unit generates a new thesaurus according to the output new words, the new thesaurus including the new words and their corresponding weighted word frequencies.
  • Because the present invention uses word frequency statistics based on Internet information and takes user input behavior information as the source of new words, a large number of new words frequently used by users can be obtained conveniently and quickly. These new words are filtered collectively and continuously provided to input method users, so that the users can follow changes in Internet information at all times and enter new words without going through a tedious word selection process each time. New words can thus also become the user's preferred words, which improves the preferred word hit rate when users enter new words and improves the ordering of candidate words.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for obtaining new words, comprising: obtaining the words selected by a user during user input; comparing the word selected by the user with existing words and obtaining the user's personalized words according to the comparison result; collecting the personalized words of each user; and obtaining the new words according to the personalized words. The invention also relates to a method for generating a thesaurus, comprising: collecting input behavior information of each user, which includes the words selected during user input and the user word frequency corresponding to the selected words; weighting and correcting each user word frequency corresponding to the selected words and calculating the user cumulative word frequency for each word; and generating the thesaurus containing the words and the corresponding user cumulative word frequencies. The invention provides a distributed architecture, analyzes the personalized words of each user to obtain new words with common meaning, and offers a solution from the perspective of user input, so that accurate new words with common meaning can be obtained in a simple way.
PCT/CN2007/070419 2006-08-09 2007-08-06 Method and device for obtaining new words, and input system and method WO2008022581A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200610109732.X 2006-08-09
CN200610109732A CN1924858B (zh) 2006-08-09 2006-08-09 一种获取新词的方法、装置以及一种输入法系统

Publications (1)

Publication Number Publication Date
WO2008022581A1 true WO2008022581A1 (fr) 2008-02-28

Family

ID=37817498

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2007/070419 WO2008022581A1 (fr) 2006-08-09 2007-08-06 Method and device for obtaining new words, and input system and method

Country Status (2)

Country Link
CN (1) CN1924858B (fr)
WO (1) WO2008022581A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109254972A (zh) * 2018-07-23 2019-01-22 努比亚技术有限公司 一种离线命令词库更新方法、终端及计算机可读存储介质
CN109472022A (zh) * 2018-10-15 2019-03-15 平安科技(深圳)有限公司 基于机器学习的新词识别方法及终端设备

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398834B (zh) * 2007-09-29 2010-08-11 北京搜狗科技发展有限公司 一种针对输入信息的处理方法和装置及一种输入法系统
CN101470732B (zh) * 2007-12-26 2012-04-18 北京搜狗科技发展有限公司 一种辅助词库的生成方法和装置
CN101290632B (zh) * 2008-05-30 2011-09-14 北京搜狗科技发展有限公司 一种用户词参与智能组词输入的方法及一种输入法系统
CN101533310A (zh) * 2009-04-02 2009-09-16 孙强国 一种拼音文字单词的输入和选择方法
CN102163198B (zh) * 2010-02-24 2014-10-22 北京搜狗科技发展有限公司 提供新词或热词的方法及系统
CN102193920B (zh) * 2010-03-04 2016-01-20 深圳市世纪光速信息技术有限公司 一种人名词库生成方法、装置及文字输入系统
CN102270048B (zh) * 2010-06-03 2016-04-20 北京搜狗科技发展有限公司 一种名词输入的方法及系统
CN102298581B (zh) * 2010-06-23 2015-11-25 深圳市腾讯计算机系统有限公司 一种输入法词库的处理方法和装置
CN102508554A (zh) * 2011-10-02 2012-06-20 上海量明科技发展有限公司 一种通信关联的输入方法、个性语库及系统
CN103324627A (zh) * 2012-03-21 2013-09-25 宇龙计算机通信科技(深圳)有限公司 终端和输入处理方法
CN102982070A (zh) * 2012-10-26 2013-03-20 北京百度网讯科技有限公司 用于输入法应用程序的词库更新方法、系统和云端服务器
CN108170294B (zh) * 2013-08-08 2021-04-16 阿里巴巴集团控股有限公司 词汇显示、字段转换方法及客户端、电子设备和计算机存储介质
CN106462579B (zh) 2014-10-15 2019-09-27 微软技术许可有限责任公司 为选定上下文构造词典
CN105069064B (zh) * 2015-07-29 2019-04-30 百度在线网络技术(北京)有限公司 词汇的获取方法及装置、推送方法及装置
KR102462365B1 (ko) * 2016-02-29 2022-11-04 삼성전자주식회사 사용자 데모그래픽 정보 및 콘텍스트 정보에 기초한 텍스트 입력 예측 방법 및 장치
CN105956158B (zh) * 2016-05-17 2019-08-09 清华大学 基于海量微博文本和用户信息的网络新词自动提取的方法
CN107544685A (zh) * 2016-06-29 2018-01-05 百度在线网络技术(北京)有限公司 信息推送方法和装置
CN106294650B (zh) * 2016-08-03 2019-08-20 北京金和网络股份有限公司 基于搜索埋点的新词挖掘方法
CN109426356B (zh) * 2017-09-01 2022-07-15 百度在线网络技术(北京)有限公司 信息输入方法和装置
CN108733650B (zh) * 2018-05-14 2022-06-07 科大讯飞股份有限公司 个性化词获取方法及装置

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1570901A (zh) * 2003-07-23 2005-01-26 台达电子工业股份有限公司 手持交互式字典查询装置及其方法
CN1629836A (zh) * 2003-12-17 2005-06-22 北京大学 学习中文新词的方法与装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7478033B2 (en) * 2004-03-16 2009-01-13 Google Inc. Systems and methods for translating Chinese pinyin to Chinese characters

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109254972A (zh) * 2018-07-23 2019-01-22 努比亚技术有限公司 一种离线命令词库更新方法、终端及计算机可读存储介质
CN109254972B (zh) * 2018-07-23 2022-09-13 上海法本信息技术有限公司 一种离线命令词库更新方法、终端及计算机可读存储介质
CN109472022A (zh) * 2018-10-15 2019-03-15 平安科技(深圳)有限公司 基于机器学习的新词识别方法及终端设备

Also Published As

Publication number Publication date
CN1924858A (zh) 2007-03-07
CN1924858B (zh) 2010-05-12

Similar Documents

Publication Publication Date Title
WO2008022581A1 (fr) Method and device for obtaining new words, and input system and method
CN108304375B (zh) 一种信息识别方法及其设备、存储介质、终端
JP5647508B2 (ja) ショートテキスト通信のトピックを識別するためのシステムおよび方法
WO2008014702A1 (fr) Procédé et système d'extraction de mots nouveaux
CN109726274B (zh) 问题生成方法、装置及存储介质
CN111831802B (zh) 一种基于lda主题模型的城市领域知识检测系统及方法
KR102170206B1 (ko) 키워드와 관계 정보를 이용한 정보 검색 시스템 및 방법
US8239349B2 (en) Extracting data
CN109783631B (zh) 社区问答数据的校验方法、装置、计算机设备和存储介质
WO2007143914A1 (fr) Procédé, dispositif et système de saisie pour la création d'une base de données de fréquence de mots basée sur des informations issues du web
KR20080068825A (ko) 디스플레이를 위한 고품질 리뷰 선택
US8793120B1 (en) Behavior-driven multilingual stemming
WO2008028421A1 (fr) Procédés permettant d'obtenir une nouvelle chaîne de caractères codés, système et procédé de saisie et dispositif de génération de base de mots
CN107688616A (zh) 使实体的独特事实显现
Bykau et al. Fine-grained controversy detection in Wikipedia
CN113204953A (zh) 基于语义识别的文本匹配方法、设备及设备可读存储介质
JP5302614B2 (ja) 施設関連情報の検索データベース形成方法および施設関連情報検索システム
CN103226601A (zh) 一种图片搜索的方法和装置
CN112597768B (zh) 文本审核方法、装置、电子设备、存储介质及程序产品
CN111488453A (zh) 资源分级方法、装置、设备及存储介质
CN103064967B (zh) 一种用于建立用户二元关系库的方法与设备
JP6942759B2 (ja) 情報処理装置、プログラム及び情報処理方法
JP5179564B2 (ja) クエリセグメント位置決定装置
CN113934910A (zh) 一种自动优化、更新的主题库构建方法,及热点事件实时更新方法
CN113535883A (zh) 商业场所实体链接方法、系统、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07800906

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: RU

122 Ep: pct application non-entry in european phase

Ref document number: 07800906

Country of ref document: EP

Kind code of ref document: A1