WO2018041036A1 - Keyword search method, apparatus, and terminal - Google Patents
Keyword search method, apparatus, and terminal
- Publication number
- WO2018041036A1 (application PCT/CN2017/099044)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- keyword
- character
- text
- length
- string
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
Definitions
- the present disclosure relates to the field of communications, for example, to a method, apparatus, and terminal for searching for a keyword.
- the recognition technology of text information content includes partial matching and whole word matching.
- the partial matching method matches every text that contains the string; it misses nothing, but it may match too much, and a large amount of interference information may appear in the matching result.
- take the search text "Method and apparatus for longest prefix matching based (on) a tree"
- to match the English word "on" with plain string matching, both "longest", which contains the string "on", and "(on)" are recognized.
- the more characters the search text contains and the fewer characters the string to be matched has, the more interference information there is.
- in whole word matching, a space can be used as the word segmentation criterion, but many symbols are used to separate words, so this method easily misses matches; whole word matching may fail to recognize "(on)" in the above example.
- a method for finding a keyword including:
- the one or more specified character strings are the target keywords.
- before the one or more specified character strings having the same text as the keyword are acquired from the one or more character strings having the same length as the keyword, the method further includes:
- the search text is divided into the one or more character strings having the same length as the keyword according to the length of the keyword and the length of the search text.
- the text of the one or more character strings having the same length as the keyword is calculated.
- the searched text is divided into the one or more character strings having the same length as the keyword, including:
- the string is discarded and the interception is ended.
- determining whether the adjacent characters of the one or more specified strings belong to the value range of the keyword including:
- the adjacent character is a character adjacent to a tail of the specified character string
- the adjacent character is a character adjacent to the head of the specified character string.
- the method further includes:
- the one or more specified character strings are interference keywords.
- the text includes a mapped value of a string.
- the mapped value of the string includes: a hash value of the string or a character encoded value in the string.
- the specified character string is a character string having the same text as the keyword.
- a keyword finding device comprising:
- the judging module is configured to judge whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword, wherein the value range of the keyword is all characters contained in the character categories to which the characters in the keyword belong;
- the determining module is configured to determine that the one or more specified character strings are target keywords if the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword.
- the device further includes:
- a processing module configured to determine the value range of the keyword, and to calculate the length of the keyword and the length of the search text in which the keyword is located, before the acquiring module acquires, from one or more character strings having the same length as the keyword, one or more specified character strings that are the same as the text of the keyword;
- a segmentation module configured to divide the search text, starting from the first character of the search text and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword;
- a calculation module configured to calculate the text of the one or more character strings having the same length as the keyword.
- the segmentation module is further configured to: starting from the first character of the search text, successively extract character strings having the same length as the keyword at a predetermined step size; and when the length of an extracted character string is less than the length of the keyword, discard that character string and end the extraction.
- a terminal comprising:
- a processor configured to obtain, from one or more character strings having the same length as the keyword, one or more specified character strings identical to the string text of the keyword; judge whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword, wherein the value range of the keyword is all characters contained in the character categories to which the characters in the keyword belong; and, if the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, determine that the one or more specified character strings are target keywords;
- an output device configured to display or output the target keyword.
- the terminal further includes:
- the input device is configured to receive input parameters, determine the value range of the keyword, and calculate the length of the keyword and the length of the search text in which the keyword is located;
- the processor is further configured to divide the search text, starting from the first character of the search text and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword; and to calculate the text of the one or more character strings having the same length as the keyword.
- a computer readable storage medium is set to store program code set to perform the following steps:
- the one or more specified character strings are the target keywords.
- the storage medium is further arranged to store program code arranged to perform the following steps:
- the search text is divided into the one or more character strings having the same length as the keyword according to the length of the keyword and the length of the search text;
- the text of the one or more character strings having the same length as the keyword is calculated.
- FIG. 1 is a hardware structural diagram of a terminal of a method for searching for a keyword according to an embodiment
- FIG. 2 is a flow chart of a method for searching for a keyword according to an embodiment
- FIG. 3 is a flow chart of another method for searching for a keyword according to an embodiment
- FIG. 4 is a structural diagram of a keyword search device according to an embodiment
- FIG. 5 is a structural diagram of another keyword search device according to an embodiment
- FIG. 6 is a structural diagram of a terminal according to an embodiment
- Fig. 7 is a structural diagram of another terminal of an embodiment.
- the symbols used to separate words (separators) can be replaced with spaces, but separators come in many kinds and are not limited to punctuation marks.
- content recognition can be evaded when digits, characters of other languages (such as English letters), or invisible characters are used as separators.
- a recognition technique that replaces separators with spaces may therefore miss matches.
- FIG. 1 is a hardware configuration diagram of a terminal that performs a search method of a keyword.
- the terminal may be a mobile terminal.
- terminal 10 may include one or more (only one is shown) processors 102 (a processor 102 may include a processing device such as a Microcontroller Unit (MCU) or a Field Programmable Gate Array (FPGA)), a memory 104 configured to store data, and a transmission device 106 having a communication function.
- MCU Microcontroller Unit
- FPGA Field Programmable Gate Array
- Terminal 10 may also include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
- the memory 104 may be configured to store software programs and modules of application software, such as the program instructions or modules corresponding to the keyword search method in the following embodiments; the processor 102 runs the software programs or modules stored in the memory 104 to execute a variety of functional applications and data processing, thereby implementing the methods in the following embodiments.
- Memory 104 may include high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
- memory 104 can include memory remotely located relative to processor 102, which can be connected to terminal 10 over a network.
- the above network may include the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
- Transmission device 106 can be arranged to receive or transmit data via a network.
- the above network may include a wireless network provided by the communication provider of the terminal 10.
- the transmission device 106 includes a network interface controller (NIC), and the NIC can be connected to other network devices through the base station to implement communication between the transmission device 106 and the Internet.
- the transmission device 106 can be a radio frequency (RF) module, and the RF module can communicate with the Internet wirelessly.
- RF radio frequency
- FIG. 2 is a flowchart of a method for searching for a keyword according to the embodiment. As shown in FIG. 2, the process includes the following steps.
- in step 202, one or more specified character strings identical to the text of the keyword are obtained from one or more character strings having the same length as the keyword.
- the length of the keyword may be the number of characters included in the keyword.
- the text includes a mapped value of a string.
- the mapped value of the string includes a hash value of the string or a character encoding value of each character in the string.
- the character encoding value may be the Universal two-byte coded Character Set (UCS2) encoding obtained by character conversion of the character string, or the American Standard Code for Information Interchange (ASCII) code.
- UCS2 Universal two byte coded character set
- ASCII American Standard Code for Information Interchange
- the specified string is a string that is the same as the text of the keyword.
- in the search text "Method and apparatus for longest prefix matching based on a tree", the character strings whose text is the same as that of the searched keyword "on" include the "on" in "longest" and the "on" in "based on". Therefore, both of these "on" strings can be specified character strings.
- in step 204, it is judged whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword.
- the adjacent character is one or more characters preceding the first character in the specified string and one or more characters after the last character in the specified string.
- the keyword is "key”
- the adjacent characters of the second "key” may include: “4", "b”, “24”, “bo”, "551024", and "board1".
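- when the keyword is located at the head of the search text, the adjacent character is the character adjacent to the tail of the specified character string.
- when the keyword is located at the tail of the search text, the adjacent character is the character adjacent to the head of the specified character string.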
- the value range of the keyword refers to all characters included in the character category to which the character in the keyword belongs.
- when the keyword to be found is "key", the value range of the keyword is the 26 English letters in "a-z"
- when the keyword to be found is "120", the value range of the keyword is "0,1,2,3,4,5,6,7,8,9"
- when the keyword to be found is "m2", the value range of the keyword includes the 26 English letters in "a-z" and "0,1,2,3,4,5,6,7,8,9".
- in step 206, if the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, the one or more specified character strings are determined to be target keywords.
- FIG. 3 is a flowchart of a method for searching for a keyword according to the embodiment. As shown in FIG. 3, on the basis of the foregoing embodiment, the method further includes the following steps before the one or more specified character strings identical to the text of the keyword are obtained from the one or more character strings having the same length as the keyword.
- in step 302, the value range of the keyword is determined, and the length of the keyword and the length of the search text in which the keyword is located are calculated.
- the text of the keyword is also obtained.
- in step 304, starting from the first character of the search text, the search text is divided into one or more character strings having the same length as the keyword according to the length of the keyword and the length of the search text.
- the above predetermined step size can be set to 1 by default.
- in step 306, the text of the one or more character strings having the same length as the keyword is calculated.
- in scenario 1, the search text "app4apple" uses the digit 4 as a separator, and the English word "app" is to be found in the search text.
- the length of the searched text and the length of the keyword are respectively calculated.
- the length of the searched text in this scene (the number of characters in the searched text) is 9, and the length of the keyword (the number of characters in the keyword) is 3.
- the above searched text is segmented and divided into “app”, “pp4", “p4a”, “4ap”, “app”, “ppl”, “ple”.
- the hash value of the cut string is calculated and compared with the hash value of the keyword. If the hash value of the cut string is equal to the hash value of the keyword, note the position at which the string begins.
- Table 1 lists the characters of the search text and their positions.
- the keyword is found at position 1 and position 5. Position 1 is the start of the search text, so its only adjacent character is the one following the tail of the specified character string, "4".
- for the keyword at position 5, the character adjacent to its start is "4", and the character adjacent to its end is "l".
- the keyword at position 1 is the target keyword.
- the keyword at position 5 is the interference keyword.
- the above method avoids the interference of non-space separators and finds the target keyword precisely.
- Table 2 is a code table corresponding to the characters in the search text and the search text.
- the target string is found at position 2 and position 8.
- the UCS2 codes of the characters adjacent to the strings at position 2 and position 8 are 0x6211 and 0x82F9, and 0x679C and 0x0072, respectively. These codes are compared with the UCS2 code range "0x0041-0x007A" corresponding to the keyword value range "a-z, A-Z": 0x6211 and 0x82F9 are outside the range, so the string at position 2 is the target string; 0x0072 is within the range, so the string at position 8 is considered an interference string. Here, "0x" denotes hexadecimal.
- the above method addresses the problem that recognition is complicated and misses many matches when searching for a keyword specified by the user; it reduces the miss rate of keyword search and avoids the influence of non-target keywords.
- the method of the foregoing embodiment may be implemented by means of software plus a general hardware platform, or may be implemented by hardware.
- the above technical solution may be embodied in the form of a software product stored in a storage medium (such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc) and including one or more instructions for causing a terminal device (which may be a cell phone, a computer, a server, or a network device) to perform the method of any of the above embodiments.
- This embodiment provides a keyword search device, which can perform the method in any of the above embodiments.
- the term "module" may refer to software, hardware, or a combination of software and hardware that implements a predetermined function.
- the apparatus described in the following embodiments may be implemented in software, or in hardware, or a combination of software and hardware.
- FIG. 4 is a structural diagram of a keyword search device according to the embodiment. As shown in FIG. 4, the device includes an acquiring module 42, a judging module 44, and a determining module 46.
- the acquiring module 42 is configured to obtain, from one or more character strings having the same length as the keyword, one or more specified character strings identical to the text of the keyword.
- the judging module 44 is configured to judge whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword, wherein the value range of the keyword is all characters contained in the character categories to which the characters in the keyword belong.
- the determining module 46 is configured to determine that the one or more specified character strings are target keywords if the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword.
- FIG. 5 is a structural diagram of a keyword search device according to the embodiment. As shown in FIG. 5, in addition to all the modules shown in FIG. 4, the device includes a processing module 52, a segmentation module 54, and a calculation module 56.
- the processing module 52 is configured to determine the value range of the keyword, and to calculate the length of the keyword and the length of the search text in which the keyword is located, before the acquiring module 42 obtains, from one or more character strings having the same length as the keyword, one or more specified character strings identical to the text of the keyword.
- the segmentation module 54 is configured to divide the search text, starting from the first character of the search text and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword.
- the calculation module 56 is configured to calculate the text of the one or more character strings having the same length as the keyword.
- Each of the above modules can be implemented by software or hardware.
- when the modules are implemented in hardware, the above modules are all located in the same processor; or the above modules are located, in any combination, in different processors.
- FIG. 6 is a structural diagram of a terminal provided in this embodiment. As shown in FIG. 6, the terminal includes a processor 62 and an output device 64.
- the processor 62 is configured to obtain, from one or more character strings having the same length as the keyword, one or more specified character strings identical to the text of the keyword; judge whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword, wherein the value range of the keyword is all characters contained in the character categories to which the characters in the keyword belong; and, in a case where the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, determine the one or more specified character strings to be target keywords.
- the output device 64 is arranged to display or output the target keyword.
- the output device 64 can include a display screen and an interface disposed on the terminal.
- FIG. 7 is a structural diagram of a terminal according to this embodiment. As shown in FIG. 7, the terminal includes an input device 72 in addition to all of the modules shown in FIG. 6.
- the input device 72 is arranged to receive a parameter input by the user, determine a range of values of the keyword, and calculate a length of the keyword and a length of the search text in which the keyword is located.
- the input device 72 can include a display screen of a user interface (UI) and input buttons.
- UI user interface
- the processor 62 may be further configured to divide the search text, starting from the first character of the search text and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword; and to calculate the text of the one or more character strings having the same length as the keyword.
- the computer readable storage medium may be configured to store program code configured to perform the following steps:
- the one or more specified character strings are determined as the target keywords.
- the storage medium is further arranged to store program code arranged to perform the following steps:
- the searched text is divided into the one or more character strings having the same length as the keyword according to the length of the keyword and the length of the retrieved text;
- the one or more texts of the character string having the same length as the keyword are calculated.
- the computer readable storage medium may include a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or another medium that can store program code.
- the plurality of modules or steps described above may be implemented by a general purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
- multiple modules or multiple steps may be implemented with program code executable by a computing device, and multiple modules or multiple steps may be stored in the storage device for execution by the computing device.
- the steps shown or described may be performed in a different order than in the above-described embodiments, or multiple modules or multiple steps may be separately fabricated into multiple integrated circuit modules, or multiple modules or steps may be fabricated into a single integrated circuit module.
- the keyword search method, apparatus, and device address the problem that recognition is complicated and misses many matches when searching for a specified keyword, which reduces the miss rate of keyword search and avoids the influence of non-target keywords.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
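A keyword search method, apparatus, and terminal. The keyword search method includes: obtaining, from one or more character strings having the same length as a keyword, one or more specified character strings whose text is the same as that of the keyword (202); judging whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword (204), where the value range of the keyword is all characters contained in the character categories to which the characters of the keyword belong; and, when the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, determining that the one or more specified character strings are target keywords (206).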
Description
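The present disclosure relates to the field of communications, for example, to a keyword search method, apparatus, and terminal.
In the related art, techniques for recognizing text content include partial matching and whole-word matching. Partial matching matches every text that contains the character string; it misses nothing, but it may match too much, so the results can contain a large amount of interference information. Take the following search text: "Method and apparatus for longest prefix matching based(on)a tree". To match the English word "on" with plain string matching, both "longest", which contains the string "on", and "(on)" are recognized. The more characters the search text contains and the fewer characters the string to be matched has, the more interference information appears. Whole-word matching can use the space as the word-segmentation criterion, but many symbols are used to separate words, so this method easily misses matches; for instance, it may fail to recognize "(on)" in the example above.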
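The contrast between the two approaches can be reproduced in a few lines. The following is a minimal sketch in Python and is not part of the original disclosure; the variable names are illustrative, and ASCII parentheses are assumed for the parentheses in the example text.

```python
# Search text and keyword taken from the example above.
text = "Method and apparatus for longest prefix matching based(on)a tree"
keyword = "on"

# Partial matching: report every offset at which the substring occurs.
partial_hits = [i for i in range(len(text)) if text.startswith(keyword, i)]

# Whole-word matching with the space as the only separator.
whole_word_hits = [word for word in text.split(" ") if word == keyword]

print(partial_hits)      # offsets inside "longest" and inside "(on)"
print(whole_word_hits)   # empty: "based(on)a" is treated as a single word
```

Partial matching reports the unwanted occurrence inside "longest", while whitespace-based whole-word matching reports nothing at all, which is the gap the disclosed method aims to close.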
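Summary
A keyword search method, including:
obtaining, from one or more character strings having the same length as the keyword, one or more specified character strings whose text is the same as that of the keyword;
judging whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword, where the value range of the keyword is all characters contained in the character categories to which the characters of the keyword belong; and
when the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, determining that the one or more specified character strings are target keywords.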
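Optionally, before obtaining, from the one or more character strings having the same length as the keyword, the one or more specified character strings whose text is the same as that of the keyword, the method further includes:
determining the value range of the keyword, and calculating the length of the keyword and the length of the search text in which the keyword is located;
starting from the first character of the search text, dividing the search text, according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword; and
calculating the text of the one or more character strings having the same length as the keyword.
Optionally, dividing the search text into the one or more character strings having the same length as the keyword includes:
starting from the first character of the search text, successively extracting, at a predetermined step size, character strings having the same length as the keyword; and
when the length of an extracted character string is less than the length of the keyword, discarding that character string and ending the extraction.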
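Optionally, judging whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword includes:
when the keyword is located at the head of the search text containing the one or more character strings having the same length as the keyword, the adjacent character is the character adjacent to the tail of the specified character string; and
when the keyword is located at the tail of the search text containing the one or more character strings having the same length as the keyword, the adjacent character is the character adjacent to the head of the specified character string.
Optionally, the method further includes:
when the adjacent characters of the one or more specified character strings belong to the value range of the keyword, determining that the one or more specified character strings are interference keywords.
Optionally, the text includes a mapped value of a character string.
Optionally, the mapped value of the character string includes a hash value of the character string or the character encoding values of the characters in the character string.
Optionally, the specified character string is a character string having the same text as the keyword.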
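A keyword search apparatus, including:
an acquiring module, configured to obtain, from one or more character strings having the same length as the keyword, one or more specified character strings whose text is the same as that of the keyword;
a judging module, configured to judge whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword, where the value range of the keyword is all characters contained in the character categories to which the characters of the keyword belong; and
a determining module, configured to determine, when the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, that the one or more specified character strings are target keywords.
Optionally, the apparatus further includes:
a processing module, configured to determine the value range of the keyword, and to calculate the length of the keyword and the length of the search text in which the keyword is located, before the acquiring module obtains, from the one or more character strings having the same length as the keyword, the one or more specified character strings whose text is the same as that of the keyword;
a segmentation module, configured to divide the search text, starting from the first character of the search text and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword; and
a calculation module, configured to calculate the text of the one or more character strings having the same length as the keyword.
Optionally, the segmentation module is further configured to: starting from the first character of the search text, successively extract, at a predetermined step size, character strings having the same length as the keyword; and when the length of an extracted character string is less than the length of the keyword, discard that character string and end the extraction.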
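A terminal, including:
a processor, configured to obtain, from one or more character strings having the same length as a keyword, one or more specified character strings whose text is the same as the string text of the keyword; to judge whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword, where the value range of the keyword is all characters contained in the character categories to which the characters of the keyword belong; and to determine, when the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, that the one or more specified character strings are target keywords; and
an output device, configured to display or output the target keywords.
Optionally, the terminal further includes:
an input device, configured to receive input parameters, determine the value range of the keyword, and calculate the length of the keyword and the length of the search text in which the keyword is located;
the processor is further configured to divide the search text, starting from the first character of the search text and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword; and to calculate the text of the one or more character strings having the same length as the keyword.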
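A computer-readable storage medium. The storage medium is configured to store program code configured to perform the following steps:
obtaining, from one or more character strings having the same length as a keyword, one or more specified character strings whose text is the same as that of the keyword;
judging whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword, where the value range of the keyword is all characters contained in the character categories to which the characters of the keyword belong; and
when the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, determining that the one or more specified character strings are target keywords.
Optionally, the storage medium is further configured to store program code configured to perform the following steps:
determining the value range of the keyword, and calculating the length of the keyword and the length of the search text in which the keyword is located;
starting from the first character of the search text, dividing the search text, according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword; and
calculating the text of the one or more character strings having the same length as the keyword.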
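FIG. 1 is a hardware structure diagram of a terminal for a keyword search method according to an embodiment;
FIG. 2 is a flowchart of a keyword search method according to an embodiment;
FIG. 3 is a flowchart of another keyword search method according to an embodiment;
FIG. 4 is a structural diagram of a keyword search apparatus according to an embodiment;
FIG. 5 is a structural diagram of another keyword search apparatus according to an embodiment;
FIG. 6 is a structural diagram of a terminal according to an embodiment; and
FIG. 7 is a structural diagram of another terminal according to an embodiment.
The terms "first", "second", and the like in this specification, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.
In text content recognition, the symbols used to separate words (separators) can be replaced with spaces, but separators come in many kinds and are not limited to punctuation marks. When digits, characters of other languages (for example, English letters), or invisible characters are used as separators, content recognition can be evaded. A recognition technique that replaces separators with spaces may therefore miss matches.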
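Embodiment 1
The method provided in this embodiment may be executed in a computing device such as a mobile terminal or a computer terminal. Taking execution on a terminal as an example, FIG. 1 is a hardware structure diagram of a terminal that executes a keyword search method. The terminal may be a mobile terminal. As shown in FIG. 1, the terminal 10 may include one or more (only one is shown in the figure) processors 102 (a processor 102 may include a processing device such as a Microcontroller Unit (MCU) or a Field Programmable Gate Array (FPGA)), a memory 104 configured to store data, and a transmission device 106 having a communication function. The terminal 10 may also include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be configured to store software programs and modules of application software, such as the program instructions or modules corresponding to the keyword search method in the following embodiments. By running the software programs or modules stored in the memory 104, the processor 102 executes a variety of functional applications and data processing, thereby implementing the methods in the following embodiments. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may include memory located remotely from the processor 102, and such remote memory may be connected to the terminal 10 over a network. The network may include the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.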
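The transmission device 106 may be configured to receive or send data via a network. The network may include a wireless network provided by the communication provider of the terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station so that the transmission device 106 can communicate with the Internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which can communicate with the Internet wirelessly.
This embodiment provides a keyword search method that runs on the above terminal. FIG. 2 is a flowchart of the keyword search method provided in this embodiment. As shown in FIG. 2, the process includes the following steps.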
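In step 202, one or more specified character strings whose text is the same as that of the keyword are obtained from one or more character strings having the same length as the keyword.
The length of the keyword may be the number of characters contained in the keyword.
Optionally, the text includes a mapped value of a character string.
Optionally, the mapped value of the character string includes a hash value of the character string or the character encoding value of each character in the character string. The character encoding value may be the Universal two-byte coded Character Set (UCS2) encoding obtained by character conversion of the character string, or the American Standard Code for Information Interchange (ASCII) code.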
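Optionally, the specified character string is a character string whose text is the same as that of the keyword. For example, in the search text "Method and apparatus for longest prefix matching based on a tree", the character strings whose text is the same as that of the searched keyword "on" include the "on" in "longest" and the "on" in "based on". Therefore, both of these "on" strings can be specified character strings.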
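In step 204, it is judged whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword.
Optionally, the adjacent characters are one or more characters preceding the first character of the specified character string and one or more characters following the last character of the specified character string. For example, in the search text "233314key551024keyboard12keyword84123", if the keyword is "key", there are three specified character strings (the three occurrences of "key", underlined in the original). Taking the second "key" as an example, its adjacent characters may include "4" and "b"; "24" and "bo"; or "551024" and "board1".
Optionally, when the keyword is located at the head of the search text containing the one or more character strings having the same length as the keyword, the adjacent character is the character adjacent to the tail of the specified character string; and when the keyword is located at the tail of that search text, the adjacent character is the character adjacent to the head of the specified character string.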
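The notion of adjacent characters can be illustrated with a short sketch. The Python helper below is hypothetical (the name `adjacent_characters` and its `width` parameter are assumptions introduced here, not terms from the disclosure); it returns the characters on each side of a match and naturally yields an empty side when the match sits at the head or tail of the search text.

```python
def adjacent_characters(text, start, length, width=1):
    """Return (before, after): up to `width` characters on each side of the match."""
    before = text[max(0, start - width):start]
    after = text[start + length:start + length + width]
    return before, after

text = "233314key551024keyboard12keyword84123"
second_key = text.find("key", text.find("key") + 1)  # offset of the second "key"

# Neighbourhoods of one, two, and six characters around the second "key".
for width in (1, 2, 6):
    print(width, adjacent_characters(text, second_key, 3, width))

# A match at the head of the text has no preceding neighbour.
print(adjacent_characters("app4apple", 0, 3))
```

With widths 1, 2, and 6 the helper yields the pairs listed in the example for the second "key".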
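Optionally, the value range of the keyword refers to all characters contained in the character categories to which the characters of the keyword belong. For example, when the keyword to be found is "key", the value range of the keyword is the 26 English letters in "a-z"; when the keyword to be found is "120", the value range of the keyword is "0,1,2,3,4,5,6,7,8,9"; and when the keyword to be found is "m2", the value range of the keyword includes both the 26 English letters in "a-z" and "0,1,2,3,4,5,6,7,8,9".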
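One possible way to derive a value range from a keyword is sketched below in Python. The set of categories is an assumption made for illustration: the text names only the lowercase letters and the digits, and treating uppercase letters as a separate category is a choice made here, not something the disclosure specifies. The names `CATEGORIES` and `value_range` are likewise hypothetical.

```python
import string

# Assumed character categories; the text only names "a-z" and the digits.
CATEGORIES = (
    frozenset(string.ascii_lowercase),
    frozenset(string.ascii_uppercase),
    frozenset(string.digits),
)

def value_range(keyword):
    """Union of every category that contains at least one keyword character."""
    allowed = set()
    for category in CATEGORIES:
        if any(ch in category for ch in keyword):
            allowed |= category
    return allowed

for kw in ("key", "120", "m2"):
    print(kw, "".join(sorted(value_range(kw))))
```

A real implementation would likely need further categories (for example, other scripts), which is why the category table is kept separate from the function.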
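In step 206, when the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, the one or more specified character strings are determined to be target keywords.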
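Steps 202 to 206 can be put together as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: it compares the strings directly rather than through a mapped value, it looks at exactly one adjacent character on each side, and it treats a match as interference as soon as either neighbour falls within the value range. The function name `find_keywords` is hypothetical.

```python
import string

def find_keywords(text, keyword, allowed):
    """Split exact matches into targets and interference by their neighbours.

    Returns two lists of zero-based offsets: (targets, interference).
    """
    n = len(keyword)
    targets, interference = [], []
    for start in range(len(text) - n + 1):
        if text[start:start + n] != keyword:          # step 202
            continue
        # One neighbour per side; a side is empty at the text boundary.
        neighbours = text[max(0, start - 1):start] + text[start + n:start + n + 1]
        if any(ch in allowed for ch in neighbours):   # step 204
            interference.append(start)
        else:                                         # step 206
            targets.append(start)
    return targets, interference

text = "233314key551024keyboard12keyword84123"
print(find_keywords(text, "key", set(string.ascii_lowercase)))
```

For the example text, only the first "key" is surrounded by digits on both sides; the occurrences inside "keyboard" and "keyword" are followed by a letter and are therefore classified as interference.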
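FIG. 3 is a flowchart of a keyword search method according to this embodiment. As shown in FIG. 3, on the basis of the foregoing embodiment, the method further includes the following steps before the one or more specified character strings whose text is the same as that of the keyword are obtained from the one or more character strings having the same length as the keyword.
In step 302, the value range of the keyword is determined, and the length of the keyword and the length of the search text in which the keyword is located are calculated.
Optionally, the text of the keyword is also obtained when the length of the keyword is calculated.
In step 304, starting from the first character of the search text, the search text is divided, according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword.
Optionally, starting from the first character of the search text, character strings having the same length as the keyword are extracted successively at a predetermined step size; when the length of an extracted character string is less than the length of the keyword, that character string is discarded and the extraction ends. The predetermined step size may be set to 1 by default.
For example, for the search text "14key2keyboard", the keyword to be found, "key", has a length of 3 and the predetermined step size is 1. After the 12 character strings "14k", "4ke", "key", "ey2", "y2k", "2ke", "key", "eyb", "ybo", "boa", "oar", and "ard" have been extracted, the next extraction yields "rd"; because the length of "rd" is 2, which is less than the keyword length of 3, that character string is discarded and the extraction stops.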
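The extraction procedure of step 304 can be sketched as a sliding window. The Python function below is an illustrative assumption (the name `split_fixed_length` does not appear in the disclosure) and expects a step size of at least 1.

```python
def split_fixed_length(text, keyword_length, step=1):
    """Extract substrings of `keyword_length`, advancing `step` characters each time."""
    pieces = []
    start = 0
    while start < len(text):
        piece = text[start:start + keyword_length]
        if len(piece) < keyword_length:
            break  # too short: discard it and stop extracting
        pieces.append(piece)
        start += step
    return pieces

pieces = split_fixed_length("14key2keyboard", 3)
print(len(pieces), pieces)
```

For "14key2keyboard" and a keyword length of 3 this produces the 12 strings listed above and stops when the window would contain only "rd".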
In step 306, the text of the one or more character strings having the same length as the keyword is calculated.
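In scenario 1, the search text "app4apple" uses the digit 4 as a separator, and the English word "app" is to be found in the search text.
The length of the search text and the length of the keyword are calculated. In this scenario, the length of the search text (the number of characters in the search text) is 9, and the length of the keyword (the number of characters in the keyword) is 3.
The search text is divided into "app", "pp4", "p4a", "4ap", "app", "ppl", and "ple". The hash value of each resulting character string is calculated and compared with the hash value of the keyword. If the hash value of a resulting character string equals the hash value of the keyword, the position at which that character string starts is recorded. Table 1 lists the characters of the search text and their positions:
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Character | a | p | p | 4 | a | p | p | l | e |
The keyword is found at position 1 and position 5. Position 1 is the start of the search text, so its adjacent character is the character adjacent to the tail of the specified character string, "4". For the keyword at position 5, the character adjacent to its start is "4" and the character adjacent to its end is "l".
It is then judged whether the characters "4" and "l" fall within "a-z" (the value range of the keyword "app"). "4" is not within "a-z", whereas "l" is. Therefore, the keyword at position 1 is the target keyword, and the keyword at position 5 is an interference keyword.
The above method avoids the interference of non-space separators and finds the target keyword precisely.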
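Scenario 1 can be reproduced with a hash-based comparison. The Python sketch below makes several assumptions that the disclosure does not fix: SHA-256 from the standard `hashlib` module is used as the hash function, a direct string comparison confirms each hash match as a guard against collisions, and positions are reported starting from 1 to match Table 1. The names `digest` and `classify` are hypothetical.

```python
import hashlib
import string

def digest(s):
    """Mapped value of a string; SHA-256 is an arbitrary choice for this sketch."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

def classify(text, keyword, allowed):
    """Map the 1-based start position of each match to 'target' or 'interference'."""
    n = len(keyword)
    keyword_digest = digest(keyword)
    results = {}
    for start in range(len(text) - n + 1):
        piece = text[start:start + n]
        # Compare mapped values first, then confirm to rule out a collision.
        if digest(piece) != keyword_digest or piece != keyword:
            continue
        neighbours = text[max(0, start - 1):start] + text[start + n:start + n + 1]
        inside = any(ch in allowed for ch in neighbours)
        results[start + 1] = "interference" if inside else "target"
    return results

print(classify("app4apple", "app", set(string.ascii_lowercase)))
```

The occurrence at position 1 has only "4" as a neighbour and is reported as the target, while the occurrence at position 5 is followed by "l" and is reported as interference.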
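Scenario 2 covers mixed Chinese and English text, for example the search text "我love苹果lover". The target keyword is the English word "love".
The value range of the keyword is determined to be "a-z, A-Z". The keyword and the text are converted to UCS2 encoding, and the character lengths are calculated. Table 2 lists the characters of the search text and their codes.
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Character | 我 | l | o | v | e | 苹 | 果 | l | o | v | e | r |
UCS2 code | 6211 | 006C | 006F | 0076 | 0065 | 82F9 | 679C | 006C | 006F | 0076 | 0065 | 0072 |
The target character string is found at position 2 and position 8.
The UCS2 codes of the characters adjacent to the strings at position 2 and position 8 are 0x6211 and 0x82F9, and 0x679C and 0x0072, respectively. These codes are compared with the UCS2 code range "0x0041-0x007A" corresponding to the keyword value range "a-z, A-Z". 0x6211 and 0x82F9 are outside the range, so the character string at position 2 is the target character string; 0x0072 is within the range, so the character string at position 8 is considered an interference character string. Here, "0x" denotes hexadecimal.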
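Scenario 2 can be sketched on code values instead of characters. In the Python example below, UCS2 code units are obtained through the standard "utf-16-be" codec, which coincides with UCS2 for characters in the Basic Multilingual Plane (an assumption that holds for this example text). The range check uses the bounds 0x0041 and 0x007A exactly as stated above; note that this contiguous range also spans the six code points between "Z" and "a". The function names are hypothetical.

```python
LOW, HIGH = 0x0041, 0x007A  # code range stated for "a-z, A-Z"

def ucs2_codes(s):
    """Two-byte code units of `s` (equal to UCS2 for BMP characters)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def classify_ucs2(text, keyword):
    """Map the 1-based start position of each match to 'target' or 'interference'."""
    t, k = ucs2_codes(text), ucs2_codes(keyword)
    n = len(k)
    results = {}
    for start in range(len(t) - n + 1):
        if t[start:start + n] != k:
            continue
        neighbours = t[max(0, start - 1):start] + t[start + n:start + n + 1]
        inside = any(LOW <= code <= HIGH for code in neighbours)
        results[start + 1] = "interference" if inside else "target"
    return results

print([format(code, "04X") for code in ucs2_codes("我love苹果lover")])
print(classify_ucs2("我love苹果lover", "love"))
```

The first line of output corresponds to the code row of Table 2, and the classification marks position 2 as the target and position 8 as interference.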
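The above method addresses the problem that recognition is complicated and misses many matches when searching for a keyword specified by the user; it reduces the miss rate of keyword search and avoids the influence of non-target keywords.
The method of the foregoing embodiment may be implemented by software plus a general-purpose hardware platform, or by hardware. The above technical solution may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc) and includes one or more instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device) to perform the method of any of the above embodiments.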
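Embodiment 2
This embodiment provides a keyword search apparatus that can perform the method of any of the above embodiments. As used below, the term "module" may refer to software, hardware, or a combination of software and hardware that implements a predetermined function. The apparatus described in the following embodiments may be implemented in software, in hardware, or in a combination of software and hardware.
FIG. 4 is a structural diagram of a keyword search apparatus according to this embodiment. As shown in FIG. 4, the apparatus includes an acquiring module 42, a judging module 44, and a determining module 46.
The acquiring module 42 is configured to obtain, from one or more character strings having the same length as the keyword, one or more specified character strings whose text is the same as that of the keyword.
The judging module 44 is configured to judge whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword, where the value range of the keyword is all characters contained in the character categories to which the characters of the keyword belong.
The determining module 46 is configured to determine, when the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, that the one or more specified character strings are target keywords.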
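FIG. 5 is a structural diagram of a keyword search apparatus provided in this embodiment. As shown in FIG. 5, in addition to all the modules shown in FIG. 4, the apparatus includes a processing module 52, a segmentation module 54, and a calculation module 56.
The processing module 52 is configured to determine the value range of the keyword, and to calculate the length of the keyword and the length of the search text in which the keyword is located, before the acquiring module 42 obtains, from one or more character strings having the same length as the keyword, the one or more specified character strings whose text is the same as that of the keyword.
The segmentation module 54 is configured to divide the search text, starting from the first character of the search text and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword.
The calculation module 56 is configured to calculate the text of the one or more character strings having the same length as the keyword.
Each of the above modules can be implemented by software or hardware. When the modules are implemented in hardware, the above modules are all located in the same processor; or the above modules are located, in any combination, in different processors.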
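Embodiment 3
This embodiment provides a terminal. FIG. 6 is a structural diagram of a terminal provided in this embodiment. As shown in FIG. 6, the terminal includes a processor 62 and an output device 64.
The processor 62 is configured to obtain, from one or more character strings having the same length as the keyword, one or more specified character strings whose text is the same as that of the keyword; to judge whether the adjacent characters of the one or more specified character strings belong to the value range of the keyword, where the value range of the keyword is all characters contained in the character categories to which the characters of the keyword belong; and to determine, when the adjacent characters of the one or more specified character strings do not belong to the value range of the keyword, that the one or more specified character strings are target keywords.
The output device 64 is configured to display or output the target keywords.
The output device 64 may include a display screen and an interface provided on the terminal.
FIG. 7 is a structural diagram of a terminal provided in this embodiment. As shown in FIG. 7, in addition to all the modules shown in FIG. 6, the terminal includes an input device 72.
The input device 72 is configured to receive parameters input by the user, determine the value range of the keyword, and calculate the length of the keyword and the length of the search text in which the keyword is located.
The input device 72 may include a display screen presenting a User Interface (UI) and input keys.
The processor 62 may be further configured to divide the search text, starting from the first character of the search text and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword; and to calculate the text of the one or more character strings having the same length as the keyword.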
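Embodiment 4
This embodiment provides a computer-readable storage medium. Optionally, in this embodiment, the computer-readable storage medium may be configured to store program code for performing the following steps:
acquiring, from one or more character strings having the same length as the keyword, one or more specified character strings whose text is the same as the text of the keyword;
judging whether the characters adjacent to the one or more specified character strings belong to the value range of the keyword, where the value range of the keyword is all characters contained in the character categories to which the characters in the keyword belong; and
determining the one or more specified character strings to be target keywords when the characters adjacent to the one or more specified character strings do not belong to the value range of the keyword.
Optionally, the storage medium is further configured to store program code for performing the following steps:
determining the value range of the keyword, and calculating the length of the keyword and the length of the search text in which the keyword is located;
splitting the search text, starting from its first character and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword; and
calculating the text of the one or more character strings having the same length as the keyword.
Optionally, in this embodiment, the computer-readable storage medium may include a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing program code.
The above modules or steps may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, the modules or steps may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that in the above embodiments, or the modules or steps may each be fabricated as separate integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module.
The method, apparatus, and device for searching for a keyword can solve the problem that finding a specified keyword involves a complicated recognition process and many omissions; they lower the omission rate of keyword search and avoid the influence of non-target keywords.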
Claims (13)
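- A method for searching for a keyword, comprising: acquiring, from one or more character strings having the same length as the keyword, one or more specified character strings whose text is the same as the text of the keyword; judging whether the characters adjacent to the one or more specified character strings belong to the value range of the keyword, wherein the value range of the keyword is all characters contained in the character categories to which the characters in the keyword belong; and determining the one or more specified character strings to be target keywords when the characters adjacent to the one or more specified character strings do not belong to the value range of the keyword.
- The method according to claim 1, wherein before acquiring, from one or more character strings having the same length as the keyword, one or more specified character strings whose text is the same as the text of the keyword, the method further comprises: determining the value range of the keyword, and calculating the length of the keyword and the length of the search text in which the keyword is located; splitting the search text, starting from its first character and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword; and calculating the text of the one or more character strings having the same length as the keyword.
- The method according to claim 2, wherein splitting the search text into the one or more character strings having the same length as the keyword comprises: starting from the first character of the search text, successively extracting, at a predetermined step, character strings having the same length as the keyword; and when the length of an extracted character string is less than the length of the keyword, discarding that character string and ending the extraction.
- The method according to claim 1, wherein judging whether the characters adjacent to the one or more specified character strings belong to the value range of the keyword comprises: when the keyword is located at the head of the search text containing the one or more character strings having the same length as the keyword, the adjacent character is the character adjacent to the tail of the specified character string; and when the keyword is located at the tail of the search text containing the one or more character strings having the same length as the keyword, the adjacent character is the character adjacent to the head of the specified character string.
- The method according to claim 1, further comprising: determining the one or more specified character strings to be interference keywords when the characters adjacent to the one or more specified character strings belong to the value range of the keyword.
- The method according to claim 1, wherein the text comprises a mapping value of a character string.
- The method according to claim 7, wherein the mapping value of the character string comprises a hash value of the character string or the code values of the characters in the character string.
- An apparatus for searching for a keyword, comprising: an acquisition module configured to acquire, from one or more character strings having the same length as the keyword, one or more specified character strings whose text is the same as the text of the keyword; a judgment module configured to judge whether the characters adjacent to the one or more specified character strings belong to the value range of the keyword, wherein the value range of the keyword is all characters contained in the character categories to which the characters in the keyword belong; and a determination module configured to determine the one or more specified character strings to be target keywords when the characters adjacent to the one or more specified character strings do not belong to the value range of the keyword.
- The apparatus according to claim 8, further comprising: a processing module configured to determine the value range of the keyword, and to calculate the length of the keyword and the length of the search text in which the keyword is located, before the acquisition module acquires, from one or more character strings having the same length as the keyword, one or more specified character strings whose text is the same as the text of the keyword; a segmentation module configured to split the search text, starting from its first character and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword; and a calculation module configured to calculate the text of the one or more character strings having the same length as the keyword.
- The apparatus according to claim 9, wherein the segmentation module is further configured to: starting from the first character of the search text, successively extract, at a predetermined step, character strings having the same length as the keyword; and when the length of an extracted character string is less than the length of the keyword, discard that character string and end the extraction.
- A terminal, comprising: a processor configured to acquire, from one or more character strings having the same length as a keyword, one or more specified character strings whose text is the same as the text of the keyword; judge whether the characters adjacent to the one or more specified character strings belong to the value range of the keyword, wherein the value range of the keyword is all characters contained in the character categories to which the characters in the keyword belong; and determine the one or more specified character strings to be target keywords when the characters adjacent to the one or more specified character strings do not belong to the value range of the keyword; and an output device configured to display or output the target keyword.
- The terminal according to claim 11, further comprising: an input device configured to receive input parameters, determine the value range of the keyword, and calculate the length of the keyword and the length of the search text in which the keyword is located; wherein the processor is further configured to split the search text, starting from its first character and according to the length of the keyword and the length of the search text, into the one or more character strings having the same length as the keyword, and to calculate the text of the one or more character strings having the same length as the keyword.
- A computer-readable storage medium configured to store program code for performing the method of any one of claims 1-6.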
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610784659.X | 2016-08-29 | ||
CN201610784659.XA CN107798004B (zh) | 2016-08-29 | 2016-08-29 | 关键词查找方法、装置及终端 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018041036A1 true WO2018041036A1 (zh) | 2018-03-08 |
Family
ID=61300028
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/099044 WO2018041036A1 (zh) | 2016-08-29 | 2017-08-25 | 关键词的查找方法、装置及终端 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107798004B (zh) |
WO (1) | WO2018041036A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109783607A (zh) * | 2018-12-19 | 2019-05-21 | 南京莱斯信息技术股份有限公司 | 一种在任意文本中匹配识别海量关键词的方法 |
CN111369980A (zh) * | 2020-02-27 | 2020-07-03 | 网易有道信息技术(北京)有限公司江苏分公司 | 语音检测方法、装置、电子设备及存储介质 |
CN111753047A (zh) * | 2020-05-19 | 2020-10-09 | 北京捷通华声科技股份有限公司 | 一种文本处理方法及装置 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116092226A (zh) * | 2022-12-05 | 2023-05-09 | 北京声智科技有限公司 | 一种语音开锁方法、装置、设备及存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6470334B1 (en) * | 1999-01-07 | 2002-10-22 | Fuji Xerox Co., Ltd. | Document retrieval apparatus |
CN1403959A (zh) * | 2001-09-07 | 2003-03-19 | 联想(北京)有限公司 | 基于文本内容特征相似度和主题相关程度比较的内容过滤器 |
CN101149739A (zh) * | 2007-08-24 | 2008-03-26 | 中国科学院计算技术研究所 | 一种面向互联网的有意义串的挖掘方法和系统 |
CN102890690A (zh) * | 2011-07-22 | 2013-01-23 | 中兴通讯股份有限公司 | 目标信息搜索方法和装置 |
CN103336761A (zh) * | 2013-05-14 | 2013-10-02 | 成都网安科技发展有限公司 | 基于动态划分与语义加权的干扰过滤匹配算法 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102184245B (zh) * | 2011-05-18 | 2013-03-06 | 华北电力大学 | 一种海量文本数据关键词的快速查找方法 |
CN102799600B (zh) * | 2012-04-10 | 2017-04-05 | 成都网安科技发展有限公司 | 一种基于编码关联的多模式匹配算法及系统 |
CN104537116B (zh) * | 2015-01-23 | 2017-10-31 | 浙江大学 | 一种基于标签的图书搜索方法 |
WO2016187888A1 (zh) * | 2015-05-28 | 2016-12-01 | 北京旷视科技有限公司 | 基于字符识别的关键词通知方法及设备、计算机程序产品 |
-
2016
- 2016-08-29 CN CN201610784659.XA patent/CN107798004B/zh active Active
-
2017
- 2017-08-25 WO PCT/CN2017/099044 patent/WO2018041036A1/zh active Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109783607A (zh) * | 2018-12-19 | 2019-05-21 | 南京莱斯信息技术股份有限公司 | 一种在任意文本中匹配识别海量关键词的方法 |
CN109783607B (zh) * | 2018-12-19 | 2023-04-25 | 南京莱斯信息技术股份有限公司 | 一种在任意文本中匹配识别海量关键词的方法 |
CN111369980A (zh) * | 2020-02-27 | 2020-07-03 | 网易有道信息技术(北京)有限公司江苏分公司 | 语音检测方法、装置、电子设备及存储介质 |
CN111369980B (zh) * | 2020-02-27 | 2023-06-02 | 网易有道信息技术(江苏)有限公司 | 语音检测方法、装置、电子设备及存储介质 |
CN111753047A (zh) * | 2020-05-19 | 2020-10-09 | 北京捷通华声科技股份有限公司 | 一种文本处理方法及装置 |
CN111753047B (zh) * | 2020-05-19 | 2024-06-07 | 北京捷通华声科技股份有限公司 | 一种文本处理方法及装置 |
Also Published As
Publication number | Publication date |
---|---|
CN107798004A (zh) | 2018-03-13 |
CN107798004B (zh) | 2022-09-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17845338; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 17845338; Country of ref document: EP; Kind code of ref document: A1 |