CN106649427B - Information identification method and device - Google Patents

Information identification method and device

Info

Publication number
CN106649427B
Authority
CN
China
Prior art keywords
keyword
notification information
target
information
combined
Legal status
Active
Application number
CN201610643277.5A
Other languages
Chinese (zh)
Other versions
CN106649427A (en)
Inventor
熊胜
吴勤华
杨晶蕾
谢纪鹏
徐云恒
江为强
张仲琨
冯文仲
聂志锋
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Hubei Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Hubei Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Hubei Co Ltd
Priority to CN201610643277.5A
Publication of CN106649427A
Application granted
Publication of CN106649427B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses an information identification method, which comprises the following steps: receiving notification information; after determining that the notification information comprises a target single keyword in a keyword set, acquiring a target combined keyword comprising the target single keyword in the keyword set; matching the notification information with the target combined keyword to obtain a matching result; and when the matching result shows that the notification information comprises the target combined keyword, determining the notification information as preset type information. The embodiment of the invention also discloses an information identification device.

Description

Information identification method and device
Technical Field
The present invention relates to the field of information processing, and in particular, to a method and an apparatus for information identification.
Background
With the development of science and technology, information identification technology is attracting increasing attention. For example, a terminal can use information identification technology to filter junk information such as promotional and fraudulent messages, so that users are not disturbed.
Existing information identification technology first identifies a target single keyword included in the notification information. After the target single keyword is identified, in order to improve identification accuracy and prevent misjudgment, a target combined keyword must then be identified based on the identified target single keyword, where the target combined keyword is formed by combining target single keywords. When the notification information is found to contain the target combined keyword, the notification information can be determined to be preset type information.
However, identifying the target combined keyword requires traversing all the combined keywords. In practical applications, as the notification information to be identified becomes increasingly diverse, the number of combined keywords grows rapidly, which gives the information identification technology a high time complexity and reduces its efficiency.
Disclosure of Invention
In order to solve the foregoing technical problems, embodiments of the present invention are directed to providing an information identification method and apparatus, so as to reduce time complexity of information identification and improve efficiency of information identification.
The technical scheme of the invention is realized as follows:
In a first aspect, an embodiment of the present invention provides an information identification method, where the method includes: receiving notification information; after determining that the notification information includes a target single keyword in a keyword set, acquiring, from the keyword set, a target combined keyword that includes the target single keyword; matching the notification information with the target combined keyword to obtain a matching result; and when the matching result shows that the notification information includes the target combined keyword, determining the notification information as preset type information.
Further, acquiring, from the keyword set, the target combined keyword that includes the target single keyword includes: acquiring the target combined keyword from the keyword set according to position information corresponding to the target single keyword.
Further, before determining that the notification information includes the target single keyword in the keyword set, the method further includes: acquiring single keywords that have the same first character; and storing these single keywords into the keyword set in order of the magnitude of their characteristic values.
Further, determining that the notification information includes a target single keyword in a keyword set includes: obtaining feature words of the notification information by segmenting the notification information; finding, in the keyword set, the single keywords whose first character is the same as that of a feature word; sequentially comparing the characteristic value of the feature word with the characteristic values of those single keywords in their storage order; and when the target single keyword is determined among those single keywords, determining that the notification information includes the target single keyword, where the target single keyword is the single keyword whose characteristic value is equal to that of the feature word.
Further, after receiving the notification information and before determining that the notification information includes the target single keyword in the keyword set, the method further includes: reading the keyword set corresponding to the service to which the notification information belongs.
In a second aspect, an embodiment of the present invention provides an information identification apparatus, where the apparatus includes: a receiving unit configured to receive notification information; a matching unit configured to, after it is determined that the notification information includes a target single keyword in a keyword set, acquire from the keyword set a target combined keyword that includes the target single keyword, and match the notification information with the target combined keyword to obtain a matching result; and a determining unit configured to determine the notification information as preset type information when the matching result shows that the notification information includes the target combined keyword.
Further, the matching unit is specifically configured to acquire the target combined keyword from the keyword set according to the position information corresponding to the target single keyword.
Further, the apparatus further includes: an acquisition unit configured to acquire single keywords that have the same first character; and a storage unit configured to store these single keywords into the keyword set in order of the magnitude of their characteristic values.
Further, the matching unit is specifically configured to: obtain feature words of the notification information by segmenting the notification information; find, in the keyword set, the single keywords whose first character is the same as that of a feature word; sequentially compare the characteristic value of the feature word with the characteristic values of those single keywords in their storage order; and when the target single keyword is determined among those single keywords, determine that the notification information includes the target single keyword, where the target single keyword is the single keyword whose characteristic value is equal to that of the feature word.
Further, the matching unit is further configured to read the keyword set corresponding to the service to which the notification information belongs before determining that the notification information includes the target single keyword.
Embodiments of the present invention provide an information identification method and apparatus. After receiving notification information, the apparatus first determines a target single keyword included in the notification information, then acquires from the keyword set a target combined keyword that includes the target single keyword, then matches the notification information only with the target combined keyword to obtain a matching result, and finally, when the matching result shows that the notification information includes the target combined keyword, determines the notification information to be preset type information. Therefore, unlike the prior art, the apparatus does not need to match the notification information with all combined keywords in the keyword set, but only with a small number of target combined keywords. This greatly reduces the number of times the notification information is matched against keywords, which in turn reduces the time complexity of information identification and improves its efficiency.
Drawings
Fig. 1 is a schematic flowchart of an information identification method according to a first embodiment of the present invention;
Fig. 2 is a schematic flowchart of an information identification method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an information identification apparatus according to a third embodiment of the present invention;
Fig. 4 is another schematic structural diagram of an information identification apparatus according to a third embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Example one
This embodiment provides an information identification method applied to an information identification apparatus. The information identification apparatus may be a terminal such as a smartphone, a tablet computer, or a smart watch, or a network device such as a server or a network monitor; this is not particularly limited in the embodiments of the present invention.
Referring to fig. 1, the information identification method may include:
S101: receiving notification information;
In practical applications, the notification information may be a short message, an instant message, or the like received by a terminal, or may be service information received by a network device and used for communication between network devices.
S102: after determining that the notification information comprises a target single keyword in the keyword set, acquiring a target combined keyword comprising the target single keyword in the keyword set;
Here, in order to identify the notification information, the information identification apparatus needs to store a keyword set corresponding to the preset type information. The keyword set includes single keywords and combined keywords, each combined keyword being composed of single keywords according to a preset rule. The single keywords and combined keywords can be extracted from a large amount of preset type information by machine learning. It should be noted that, in practical applications, the preset rule may be set according to specific situations, which is not limited in the embodiments of the present invention. Preferably, single keywords may be combined into a combined keyword according to AND, OR, and/or NOT logical relationships.
Specifically, when the information identification apparatus receives the notification information, it first needs to determine whether the notification information includes a single keyword in the keyword set. For example, whether the notification information includes one or more target single keywords may be determined by searching the notification information for each single keyword in the keyword set in turn; alternatively, the notification information may be segmented to obtain its feature words, which are then matched against all the single keywords in the keyword set to judge whether the notification information includes a single keyword in the keyword set. When the notification information includes a single keyword in the keyword set, that single keyword is determined to be a target single keyword, and the notification information is determined to include the target single keyword; when the notification information includes none of the single keywords in the keyword set, it may be determined that the notification information is not preset type information.
To improve the accuracy of the identification result and prevent misjudgment, after determining that the notification information includes the target single keyword, the information identification apparatus needs to determine whether the notification information also includes a target combined keyword. The information identification apparatus may acquire the target combined keyword based on the target single keyword. For example, by searching for the target single keyword among all the combined keywords in the keyword set, the information identification apparatus may find all the combined keywords that include the target single keyword, that is, the target combined keywords.
S103: matching the notification information with the target combined keyword to obtain a matching result;
Specifically, after S102, the information identification apparatus needs to match the notification information with each target combined keyword determined in S102 to obtain a matching result, where the matching result may indicate either that the notification information includes the target combined keyword or that it does not.
The following description takes as an example a target combined keyword composed of target single keywords according to AND and OR logical relationships.
The information identification apparatus may divide the target single keywords in a target combined keyword into at least one group, each group including at least one target single keyword. The target single keywords within a group are related by logical OR; that is, as long as the notification information includes any one target single keyword in the group, it can be determined that the notification information includes the group. The groups of a target combined keyword are related by logical AND; that is, only when the notification information includes every group of a target combined keyword can the matching result indicate that the notification information includes that target combined keyword.
For example, in the keyword set, the combined keyword "life/financing/health/vehicle insurance" includes the single keywords "life", "financing", "health", "vehicle", and "insurance", where "life", "financing", "health", and "vehicle" belong to a first group and "insurance" belongs to a second group. First, because the single keywords "life", "financing", "health", and "vehicle" in the first group are related by logical OR, the notification information can be considered to include the first group as long as it includes any one of them; for example, the notification information includes "life". Meanwhile, the first group and the second group are related by logical AND, so the notification information must also include the second group. Since the second group has only the single keyword "insurance", when the notification information also includes "insurance", it can be determined that the notification information includes the combined keyword "life/financing/health/vehicle insurance", and this combined keyword is the target combined keyword.
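The AND-of-OR group matching described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes a combined keyword is represented as a list of groups (each group a list of single keywords) and that "includes" means plain substring containment; the names `contains_group` and `contains_combined` are hypothetical.

```python
from typing import List

# Hypothetical representation: groups joined by AND, keywords inside a group joined by OR.
CombinedKeyword = List[List[str]]


def contains_group(text: str, group: List[str]) -> bool:
    """OR within a group: any single keyword of the group appears in the text."""
    return any(keyword in text for keyword in group)


def contains_combined(text: str, combined: CombinedKeyword) -> bool:
    """AND across groups: every group of the combined keyword must be satisfied."""
    return all(contains_group(text, group) for group in combined)


# The example from the text: (life | financing | health | vehicle) AND insurance.
example = [["life", "financing", "health", "vehicle"], ["insurance"]]

print(contains_combined("buy life insurance today", example))  # True
print(contains_combined("life is good", example))              # False
```

Real notification text would usually be Chinese, where substring search is a reasonable stand-in; for space-delimited text, a word-level check may be preferable to avoid partial-word hits.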
In this way, the notification information is only matched with the target combined keyword to obtain the matching result, and the notification information does not need to be matched with all combined keywords in the keyword set to obtain the matching result as in the prior art, so that the matching times are greatly reduced, the time complexity of the information identification method is reduced, and the efficiency of the information identification method is improved.
S104: when the matching result shows that the notification information includes the target combined keyword, determining the notification information as preset type information.
Specifically, since the single keywords and combined keywords in the keyword set are extracted from a large amount of preset type information, when the notification information includes both the target single keyword and the target combined keyword in the keyword set, the notification information can be determined more accurately to be preset type information.
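Steps S101 to S104 can be put together in a short Python sketch. It is illustrative only and rests on several assumptions not fixed by the text: notification information is plain text, containment is substring search, combined keywords use a hypothetical list-of-OR-groups representation, and target combined keywords are found by a linear membership scan (the position information described below would replace that scan). All function names are hypothetical.

```python
from typing import List, Set

# Hypothetical representation: a combined keyword is a list of OR-groups joined by AND.
CombinedKeyword = List[List[str]]


def find_target_singles(text: str, singles: Set[str]) -> Set[str]:
    """S102 (first half): search the text for every single keyword in the set."""
    return {kw for kw in singles if kw in text}


def find_target_combined(targets: Set[str], combined: List[CombinedKeyword]) -> List[int]:
    """S102 (second half): indices of combined keywords containing any target single keyword."""
    return [
        i
        for i, ck in enumerate(combined)
        if any(kw in targets for group in ck for kw in group)
    ]


def matches(text: str, ck: CombinedKeyword) -> bool:
    """S103: AND across groups, OR within each group."""
    return all(any(kw in text for kw in group) for group in ck)


def is_preset_type(text: str, singles: Set[str], combined: List[CombinedKeyword]) -> bool:
    """S101 to S104 for one received notification (the text argument)."""
    targets = find_target_singles(text, singles)
    if not targets:
        return False  # no single keyword matched: not preset type information
    # Only the target combined keywords are matched, not the whole combined list.
    candidates = find_target_combined(targets, combined)
    return any(matches(text, combined[i]) for i in candidates)


singles = {"life", "health", "insurance", "broker"}
combined = [
    [["life", "health"], ["insurance"]],  # (life | health) AND insurance
    [["insurance"], ["broker"]],          # insurance AND broker
]
print(is_preset_type("cheap life insurance", singles, combined))  # True
print(is_preset_type("insurance claim", singles, combined))       # False
```

The saving claimed in the text comes from S103: `matches` runs only on the candidates returned for the target single keywords rather than on every combined keyword.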
Optionally, when the target combined keyword including the target single keyword is obtained in the keyword set, the target combined keyword may be obtained in the keyword set according to the position information corresponding to the target single keyword.
For example, all the combined keywords in the keyword set are stored in an array whose length equals the number of combined keywords in the keyword set, with each element of the array corresponding to one combined keyword; for example, if 10000 combined keywords are stored in the keyword set, the length of the array is 10000. Each element of the array is divided into a plurality of units, each unit corresponding to one single keyword included in the combined keyword. For example, the combined keyword "life insurance" is composed of the single keywords "life" and "insurance"; the 1st unit stores the single keyword "life", and the 2nd unit stores the single keyword "insurance".
Accordingly, when a single keyword is stored, the position information of all combined keywords that include the single keyword is stored at the same time. For example, the combined keywords "life insurance" and "insurance broker", which both include the single keyword "insurance", are stored as the 1st and 10th elements of the combined keyword array, respectively. In the array element corresponding to "life insurance", the single keyword "insurance" is stored in the 2nd unit, and in the array element corresponding to "insurance broker", it is stored in the 1st unit. The position information of the combined keywords that include the single keyword "insurance" can therefore be written as the two-dimensional coordinates (1, 2) and (10, 1). Thus, after the single keyword "insurance" is determined to be the target single keyword, the combined keywords corresponding to the 1st and 10th elements of the combined keyword array can be obtained as the target combined keywords according to the position information (1, 2) and (10, 1) of "insurance".
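The array-plus-position-information layout above can be sketched in Python as follows. This is an assumption-laden illustration: the combined-keyword array is a Python list of unit lists, positions use the same 1-based (element, unit) coordinates as the text, and the names `build_position_index` and `target_combined` are hypothetical.

```python
from typing import Dict, List, Tuple

# 1-based (element number, unit number) coordinates, as in the text.
Position = Tuple[int, int]


def build_position_index(combined_array: List[List[str]]) -> Dict[str, List[Position]]:
    """For every single keyword, record where it sits in the combined-keyword array."""
    index: Dict[str, List[Position]] = {}
    for element_no, units in enumerate(combined_array, start=1):
        for unit_no, single in enumerate(units, start=1):
            index.setdefault(single, []).append((element_no, unit_no))
    return index


def target_combined(
    index: Dict[str, List[Position]], combined_array: List[List[str]], single: str
) -> List[List[str]]:
    """Fetch target combined keywords directly via stored positions, without a full scan."""
    element_nos = dict.fromkeys(e for e, _ in index.get(single, []))  # dedupe, keep order
    return [combined_array[e - 1] for e in element_nos]


# Ten combined keywords: "life insurance" is element 1, "insurance broker" is element 10.
combined_array = (
    [["life", "insurance"]]
    + [[f"kw{i}"] for i in range(2, 10)]  # placeholder elements 2 to 9
    + [["insurance", "broker"]]
)
index = build_position_index(combined_array)
print(index["insurance"])  # [(1, 2), (10, 1)]
print(target_combined(index, combined_array, "insurance"))  # [['life', 'insurance'], ['insurance', 'broker']]
```

Lookup cost now depends on how many combined keywords contain the target single keyword, not on the total size of the array.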
Optionally, before determining that the notification information includes the target single keyword in the keyword set, single keywords that have the same first character may be obtained first, and these single keywords are then stored into the keyword set in order of the magnitude of their characteristic values.
For example, the hash value of the character code of the first character of each single keyword may be calculated first, and single keywords with the same hash value are then obtained as the single keywords with the same first character. Next, among the single keywords with the same first character, the hash value of the character codes of all characters of each single keyword is calculated as its characteristic value, and the single keywords are stored into the keyword set in lexicographic order of their characteristic values. For example, if the characteristic values of single keywords with the same first character are "abf", "abc", "add2", "ada", and "add1", their order is "abc", "abf", "ada", "add1", "add2".
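A minimal Python sketch of this storage step, under stated assumptions: the text does not name a hash function, so CRC-32 (`zlib.crc32`) over UTF-8 bytes stands in for "the hash value of the character code", and the keyword set is modeled as a dict from first-character hash to a sorted list. The names `char_code_hash` and `build_buckets` are hypothetical. The usage line passes the identity function as the feature so that the strings from the text can serve as stand-in characteristic values.

```python
import zlib
from typing import Any, Callable, Dict, List


def char_code_hash(text: str) -> int:
    """Assumed hash of character codes: CRC-32 over the UTF-8 bytes (the text names no hash)."""
    return zlib.crc32(text.encode("utf-8"))


def build_buckets(
    singles: List[str], feature: Callable[[str], Any] = char_code_hash
) -> Dict[int, List[str]]:
    """Bucket single keywords by the hash of their first character, then sort each bucket by feature value."""
    buckets: Dict[int, List[str]] = {}
    for word in singles:
        buckets.setdefault(char_code_hash(word[0]), []).append(word)
    for words in buckets.values():
        words.sort(key=feature)  # ascending feature values give the stored order
    return buckets


# Using the strings from the text as stand-in feature values (identity feature):
buckets = build_buckets(["abf", "abc", "add2", "ada", "add1"], feature=lambda w: w)
print(list(buckets.values()))  # [['abc', 'abf', 'ada', 'add1', 'add2']]
```

Because buckets are keyed by a hash, a collision could put two different first characters in one bucket; the later comparison of full characteristic values still decides the match.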
Optionally, when determining that the notification information includes the target single keyword in the keyword set, the method may first perform word segmentation on the notification information to obtain a feature word of the notification information; then, searching a single keyword which is the same as the first character of the characteristic word in the keyword set; then, sequentially comparing the characteristic values of the characteristic words with the characteristic values of the single keywords according to the storage sequence of the single keywords; next, when the target single keyword is determined from the single keywords, it is determined that the target single keyword is included in the notification information, wherein the target single keyword is a single keyword having a characteristic value equal to that of the feature word.
For example, the notification information may be segmented by a word segmentation technique to obtain its feature words, where the word segmentation technique includes string-matching segmentation, word-sense segmentation, statistical segmentation, and the like. Then, the hash value of the character code of the first character of the feature word is calculated, and the single keywords whose first-character hash value is the same as that of the feature word are obtained from the keyword set as the single keywords having the same first character as the feature word. For example, suppose those single keywords are stored in the order "abc", "abf", "ada", "add1", and "add2", where each single keyword is represented by its feature value; these feature values are in lexicographic order. Then, the hash value of the character codes of all characters of the feature word is calculated as the feature value of the feature word; suppose it is "abe". Next, "abe" is compared in turn with "abc", "abf", "ada", "add1", and "add2" according to the rules of lexicographic ordering to find a single keyword whose feature value equals "abe". First, "abe" is compared with "abc"; because "abe" is greater than "abc", the comparison continues with "abf". Because "abe" is less than "abf", the single keywords "ada", "add1", and "add2" stored after "abf" are all greater than "abe". It can therefore be determined that no single keyword has a feature value equal to "abe", and then that the notification information does not include the target single keyword, without comparing "abe" with "ada", "add1", and "add2".
Therefore, the feature word is matched against the single keywords with the same first character in the storage order of those single keywords, and once it is clear that no single keyword matches the feature word, the remaining single keywords need not be traversed. This reduces the number of matches, lowers the time complexity of the information identification method, and improves its efficiency.
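The early-terminating comparison described above can be sketched in Python. This is a minimal illustration assuming a bucket is a list of feature values already sorted in ascending order; the name `ordered_lookup` is hypothetical, and the comparison counter is included only to show how many comparisons the walk-through in the text performs.

```python
from typing import List, Optional, Tuple


def ordered_lookup(sorted_values: List[str], target: str) -> Tuple[Optional[int], int]:
    """Scan ascending feature values and stop at the first value greater than the target.

    Returns (index of the equal value or None, number of values compared).
    """
    comparisons = 0
    for i, value in enumerate(sorted_values):
        comparisons += 1
        if target == value:
            return i, comparisons  # target single keyword found
        if target < value:
            return None, comparisons  # all later values are larger too: stop early
    return None, comparisons


bucket = ["abc", "abf", "ada", "add1", "add2"]
print(ordered_lookup(bucket, "abe"))  # (None, 2)
print(ordered_lookup(bucket, "ada"))  # (2, 3)
```

The sequential scan mirrors the text. Because the bucket is sorted, Python's `bisect.bisect_left` could find the same position in logarithmic time; that is a swapped-in alternative, not the method described here.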
Optionally, after receiving the notification information and before determining that the notification information includes the target single keyword in the keyword set, the keyword set corresponding to the service to which the notification information belongs may be read first.
For example, in order to improve the efficiency of the information identification method, notification information may be identified by distributed identification modules according to the service to which it belongs, and the keyword sets corresponding to different services may likewise be stored in a distributed manner. Taking short messages as an example, short messages from the same mobile phone number may correspond to one service, or short messages from mobile phone numbers with the same home location may correspond to one service.
Preferably, the notification information may include an identification module identifier and a keyword set identifier, and notification information belonging to the same service has the same identification module identifier and the same keyword set identifier, where the identification module identifier indicates the label of the distributed identification module that identifies the notification information, and the keyword set identifier indicates the label of the keyword set that the distributed identification module needs to read.
For example, if the identification module identifier in a short message from a mobile phone number whose home location is Beijing is 1 or 2, the short message needs to be identified by the distributed identification module labeled 1 or 2. The distributed identification module labeled 1 therefore receives and identifies the short message; if the distributed identification module labeled 1 is in an abnormal state, for example powered off or crashed, the distributed identification module labeled 2 receives and identifies the short message instead, which helps ensure that the short message is identified promptly. This embodiment is described using the case in which the distributed identification module labeled 1 receives and identifies the short message. Assuming that the keyword set identifiers in the short message are 1 and 3, after receiving the short message, the distributed identification module labeled 1 first reads the keyword sets labeled 1 and 3, and then determines whether the short message includes a target single keyword and a target combined keyword in those keyword sets. When the short message includes the target single keyword and the target combined keyword, the short message is determined to be preset type information.
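The routing and keyword-set selection in this example can be sketched in Python. Everything here is a hypothetical model: a message carries lists of module identifiers and keyword set identifiers, module health is a set of module labels that are currently available, and each keyword set is simplified to a set of single keywords. The names `pick_module` and `load_keyword_sets` are illustrative only.

```python
from typing import Dict, List, Optional, Set


def pick_module(module_ids: List[int], alive: Set[int]) -> Optional[int]:
    """Return the first listed identification module that is not in an abnormal state."""
    for module_id in module_ids:
        if module_id in alive:
            return module_id
    return None  # no listed module is available


def load_keyword_sets(set_ids: List[int], store: Dict[int, Set[str]]) -> Set[str]:
    """Read the keyword sets named by the message and merge them for matching."""
    merged: Set[str] = set()
    for set_id in set_ids:
        merged |= store[set_id]
    return merged


# Hypothetical short message from a Beijing number: modules 1 or 2, keyword sets 1 and 3.
message = {"modules": [1, 2], "keyword_sets": [1, 3]}
store = {1: {"loan"}, 2: {"lottery"}, 3: {"insurance"}}

print(pick_module(message["modules"], alive={1, 2}))  # 1
print(pick_module(message["modules"], alive={2}))     # 2 (module 1 is down)
print(sorted(load_keyword_sets(message["keyword_sets"], store)))  # ['insurance', 'loan']
```

Keyword set 2 is never read for this message, which reflects the point in the text that keywords belonging to other services are kept out of the match.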
Therefore, the notification information is identified by adopting the distributed identification modules according to different services to which the notification information belongs, so that different notification information can be identified by different distributed identification modules, hardware resources are effectively utilized, and the efficiency of the information identification method is improved. Meanwhile, the keyword sets corresponding to different services are stored in a distributed manner, so that the notification information of the same service can be matched in the keyword set corresponding to the service, the notification information is prevented from being matched with the keywords corresponding to other services, the matching efficiency is improved, the interference of the keywords corresponding to other services on the identification of the notification information of the service can be prevented, and the identification accuracy is improved.
The embodiments of the present invention provide an information identification method and apparatus. Notification information is first received; after it is determined that the notification information includes a target single keyword in the keyword set, a target combined keyword that includes the target single keyword is acquired from the keyword set; the notification information is then matched only with the target combined keyword to obtain a matching result; and when the matching result shows that the notification information includes the target combined keyword, the notification information is determined to be preset type information. Unlike the prior art, the notification information does not need to be matched with all combined keywords in the keyword set, so the number of matches is greatly reduced, the time complexity of information identification is reduced, and the efficiency of information identification is improved.
Example two
An embodiment of the present invention provides an information identification method applied to a device with information processing capability, such as a terminal, a server, or a network monitor. As shown in Fig. 2, the information identification method includes:
S201: acquiring single keywords that have the same first character;
For example, the hash value of the GB2312 encoding (the Code of Chinese Graphic Character Set for Information Interchange) of the first character of each single keyword may be calculated first, and single keywords with the same hash value are then obtained; these are the single keywords with the same first character.
S202: obtaining the characteristic value of each single keyword;
For example, the hash value of the GB2312 encoding of all characters of a single keyword may be calculated as its characteristic value; alternatively, the hash value of the GB2312 encoding of preset characters of the single keyword may be calculated as its characteristic value. In practical applications, the preset characters may be set according to specific situations, which is not limited in the embodiments of the present invention.
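S201 and S202 can be sketched in Python with the standard `gb2312` codec. The hash function is an assumption: the text only says "hash value", so CRC-32 (`zlib.crc32`) is used as a stand-in, and "preset characters" is modeled as a hypothetical list of character positions. The names `gb2312_hash`, `first_char_key`, and `feature_value` are illustrative.

```python
import zlib
from typing import Optional, Sequence


def gb2312_hash(text: str) -> int:
    """Hash of the GB2312 bytes of text; CRC-32 is an assumed stand-in for the unspecified hash."""
    return zlib.crc32(text.encode("gb2312"))


def first_char_key(keyword: str) -> int:
    """S201: keywords with equal keys are treated as sharing a first character."""
    return gb2312_hash(keyword[0])


def feature_value(keyword: str, preset: Optional[Sequence[int]] = None) -> int:
    """S202: hash over all characters, or only over the preset character positions."""
    chars = keyword if preset is None else "".join(keyword[i] for i in preset)
    return gb2312_hash(chars)


print(first_char_key("寿险") == first_char_key("寿命"))  # True: same first character
print(feature_value("寿险") == feature_value("寿命"))    # False: different keywords
```

Characters outside the GB2312 repertoire (for example, emoji) make `str.encode("gb2312")` raise `UnicodeEncodeError`, so a deployment would need an explicit policy for them; this sketch does not handle that case.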
S203, storing the single keywords and the position information corresponding to the single keywords into a keyword set according to the magnitude sequence of the characteristic values of the single keywords;
here, the position information corresponding to the single keyword indicates a position of a combined keyword including the single keyword in the keyword set.
For example, the single keywords and their corresponding position information may be stored in the keyword set in dictionary order of the feature values of the single keywords.
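Steps S201 to S203 can be combined into one keyword-set builder, sketched below under assumptions that the patent leaves open: a combined keyword is modelled as a tuple of single keywords, "includes" means "is one of its components", position information is the list of indices into the combined-keyword array, and CRC-32 stands in for the hash. All class and function names are hypothetical.

```python
from __future__ import annotations

import zlib
from collections import defaultdict
from dataclasses import dataclass, field


def gb2312_hash(text: str) -> int:
    # CRC-32 of the GB2312 bytes; an example hash, not specified by the patent.
    return zlib.crc32(text.encode("gb2312"))


@dataclass
class SingleEntry:
    keyword: str
    feature: int                                        # S202 feature value
    positions: list[int] = field(default_factory=list)  # indices in the combined array


@dataclass
class KeywordSet:
    buckets: dict[int, list[SingleEntry]]  # first-char hash -> entries, ascending
    combined: list[tuple[str, ...]]        # the combined-keyword array


def build_keyword_set(single_keywords, combined_keywords) -> KeywordSet:
    """S201-S203 sketch: bucket, compute feature values, sort, record positions.

    Assumption: a combined keyword is a tuple of single keywords and
    "includes" a single keyword when that keyword is one of its components.
    """
    combined = [tuple(c) for c in combined_keywords]
    buckets: dict[int, list[SingleEntry]] = defaultdict(list)
    for kw in single_keywords:
        positions = [idx for idx, combo in enumerate(combined) if kw in combo]
        buckets[gb2312_hash(kw[0])].append(SingleEntry(kw, gb2312_hash(kw), positions))
    for entries in buckets.values():
        entries.sort(key=lambda e: e.feature)  # ascending feature value (S203)
    return KeywordSet(dict(buckets), combined)
```

Storing the positions alongside each single keyword is what later lets S211 jump straight to the relevant combined keywords instead of scanning the whole array.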
S204, receiving notification information;
For example, the notification information may be a short message or WeChat message received by a terminal, or information exchanged between network devices during communication.
S205, reading the keyword set corresponding to the service to which the notification information belongs;
For example, the keyword set corresponding to that service may be read according to a keyword set identifier carried in the notification information.
S206, obtaining feature words of the notification information by performing word segmentation on it;
For example, the notification information may be segmented with a word segmentation technique to obtain its feature words; such techniques include string-matching segmentation, word-sense segmentation, and statistical segmentation.
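As one concrete instance of the string-matching family named above, forward maximum matching can be sketched as follows. The patent does not prescribe this particular algorithm; the dictionary and the `max_len` limit are assumptions of the sketch.

```python
from __future__ import annotations


def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching, one string-matching segmentation method.

    At each position, take the longest dictionary word starting there
    (up to max_len characters); otherwise emit a single character.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words
```

With the dictionary `{"恭喜", "您", "中奖", "领取"}`, the message "恭喜您中奖了请领取" is split into "恭喜", "您", "中奖", "了", "请", "领取"; these pieces are the feature words handed to S207.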
S207, finding, in the keyword set, the single keywords whose first character is the same as that of the feature word;
Specifically, the keyword set is searched for single keywords whose first character matches that of the feature word. If such single keywords are found, the notification information may include a single keyword from the keyword set, and the found single keywords must be examined further. If none is found, the notification information does not include any single keyword from the keyword set.
S208, comparing the feature value of the feature word with that of the i-th single keyword: when the feature value of the feature word is greater than that of the i-th single keyword, executing S209; when it is equal, executing S210; when it is smaller, executing S204;
Here, i is an integer greater than or equal to 1.
For example, the single keywords are stored in dictionary order, i.e., in ascending order of feature value. When the feature value of the feature word is greater than that of the i-th single keyword, any single keyword whose feature value could equal that of the feature word must lie after the i-th single keyword, so the comparison continues with the next single keyword. When the two feature values are equal, the feature word is the same as the i-th single keyword, which can therefore be determined to be the target single keyword, and the notification information is determined to include the target single keyword. When the feature value of the feature word is smaller than that of the i-th single keyword, every single keyword stored after the i-th one has a larger feature value than the i-th single keyword, and therefore also a larger feature value than the feature word. None of those single keywords can be the same as the feature word, so it can be determined that the notification information does not include the target single keyword and the comparison stops. Stopping early reduces the number of comparisons and improves the efficiency of the information identification method.
S209, assigning i + 1 to i and returning to S208;
Here, the (i+1)-th single keyword is the single keyword immediately following the i-th single keyword in storage order.
S210, determining that the i-th single keyword is the target single keyword and that the notification information includes the target single keyword;
Specifically, when the feature value of the feature word equals that of the i-th single keyword, the feature word is the same as the i-th single keyword, so the i-th single keyword is determined to be the target single keyword and the notification information is determined to include it.
S211, acquiring the target combined keywords from the keyword set according to the position information corresponding to the target single keyword;
For example, the combined keywords may be stored in the keyword set under numeric labels; correspondingly, the position information of a single keyword consists of the labels of all combined keywords that include it. If the position information corresponding to the target single keyword is 1 and 10, the combined keywords labelled 1 and 10 are acquired from the keyword set as the target combined keywords.
S212, matching the notification information against the target combined keywords to obtain a matching result;
Specifically, the matching result indicates either that the notification information includes a target combined keyword or that it does not.
S213, when the matching result shows that the notification information includes the target combined keyword, determining the notification information as preset type information.
Specifically, because the notification information must include both the target single keyword and the target combined keyword, it can be determined as preset type information more accurately.
It should be noted that the order of the steps of the information identification method provided in the embodiments of the present invention may be adjusted as appropriate, and steps may be added or removed as circumstances require. Any variation that a person skilled in the art could readily conceive within the technical scope disclosed in the present invention falls within the protection scope of the present invention and is therefore not described again.
The embodiment of the invention provides an information identification method. Notification information is first received. After it is determined that the notification information includes a target single keyword from the keyword set, the target combined keywords in the keyword set that include the target single keyword are acquired. The notification information is then matched only against these target combined keywords to obtain a matching result. When the matching result shows that the notification information contains a target combined keyword, the notification information is determined to be preset type information. Unlike the prior art, the notification information does not need to be matched against every combined keyword in the keyword set, so the number of matching operations is greatly reduced, the time complexity of information identification is lowered, and identification efficiency is improved.
Example three
An embodiment of the present invention provides an information identification apparatus. As shown in FIG. 3, the apparatus 30 includes: a receiving unit 301, configured to receive notification information; a matching unit 302, configured to acquire, after determining that the notification information includes a target single keyword from the keyword set, the target combined keywords in the keyword set that include the target single keyword, and to match the notification information against the target combined keywords to obtain a matching result; and a determining unit 303, configured to determine the notification information as preset type information when the matching result indicates that the notification information includes the target combined keyword.
Optionally, the matching unit 302 is specifically configured to obtain the target combined keyword in the keyword set according to the position information corresponding to the target single keyword.
Optionally, referring to FIG. 4, the apparatus 30 further includes: an obtaining unit 304, configured to obtain single keywords that share the same first character; and a storage unit 305, configured to store the single keywords into the keyword set in order of the magnitude of their feature values.
Optionally, the matching unit 302 is specifically configured to: obtain feature words of the notification information by performing word segmentation on it; find, in the keyword set, the single keywords whose first character is the same as that of the feature word; compare the feature value of the feature word with the feature values of those single keywords one by one, in their storage order; and, when the target single keyword is determined among the single keywords, determine that the notification information includes the target single keyword, the target single keyword being the single keyword whose feature value equals that of the feature word.
Optionally, the matching unit 302 is further configured to read the keyword set corresponding to the service to which the notification information belongs before determining that the notification information includes the target single keyword.
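One hypothetical way to map the units of apparatus 30 onto software is sketched below. The class and method names are illustrative only. The word segmenter is injected as a callable because the patent leaves the segmentation technique open. CRC-32 stands in for the hash. When a feature word does not match, the sketch moves on to the next feature word, which is an assumption rather than something the text specifies.

```python
import zlib


def gb2312_hash(text: str) -> int:
    # Example hash (CRC-32 of the GB2312 bytes); the patent does not fix one.
    return zlib.crc32(text.encode("gb2312"))


class InformationIdentificationApparatus:
    """Hypothetical sketch of apparatus 30; names are illustrative only."""

    def __init__(self, singles, combined, segment):
        # Obtaining unit 304 + storage unit 305: bucket by first character and
        # sort each bucket by ascending feature value.
        # singles: keyword -> positions in `combined`; combined: list of tuples.
        self.combined = combined
        self.segment = segment  # injected word segmenter (assumption)
        self.buckets = {}
        for kw, positions in singles.items():
            entry = (gb2312_hash(kw), positions)
            self.buckets.setdefault(gb2312_hash(kw[0]), []).append(entry)
        for entries in self.buckets.values():
            entries.sort(key=lambda e: e[0])

    def receive(self, notification):
        # Receiving unit 301.
        return notification

    def _find_single(self, word):
        # Sequential scan with early stop, as in S207-S210.
        target = gb2312_hash(word)
        for value, positions in self.buckets.get(gb2312_hash(word[0]), []):
            if target == value:
                return positions
            if target < value:
                break
        return None

    def match(self, notification):
        # Matching unit 302: only the target combined keywords are checked.
        for word in self.segment(notification):
            positions = self._find_single(word)
            if positions is None:
                continue  # assumption: try the next feature word
            for p in positions:
                if all(part in notification for part in self.combined[p]):
                    return True
        return False

    def determine(self, matched):
        # Determining unit 303.
        return "preset type" if matched else "other"

    def identify(self, notification):
        return self.determine(self.match(self.receive(notification)))
```

As a usage example, build the apparatus with `{"中奖": [0], "免费": [0]}` as single keywords, `[("免费", "中奖")]` as the combined array, and a simple character-bigram segmenter. "免费中奖" is then classified as "preset type", and "中奖" alone is classified as "other".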
It should be noted that, as those skilled in the art will clearly understand, for convenience and brevity of description, the specific working processes of the apparatus and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In addition, in practical applications, the matching unit 302, the determining unit 303, the obtaining unit 304, and the storage unit 305 may be implemented by a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like in the apparatus 30. The receiving unit 301 may be implemented by an antenna in the apparatus 30 together with the antenna's driving circuit, or by various optoelectronic receiving devices or ports.
An embodiment of the present invention provides an information identification apparatus, including: a receiving unit, configured to receive notification information; a matching unit, configured to acquire, after determining that the notification information includes a target single keyword from the keyword set, the target combined keywords in the keyword set that include the target single keyword, and to match the notification information against the target combined keywords to obtain a matching result; and a determining unit, configured to determine the notification information as preset type information when the matching result shows that the notification information includes the target combined keyword. Compared with the prior art, the notification information is matched only against the target combined keywords rather than against every combined keyword in the keyword set, so the number of matching operations is greatly reduced, the time complexity of information identification is lowered, and identification efficiency is improved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (8)

1. A method of information identification, the method comprising:
receiving notification information;
after determining that the notification information comprises a target single keyword in a keyword set, acquiring a target combined keyword comprising the target single keyword in the keyword set, wherein the target combined keyword is acquired from the keyword set according to position information corresponding to the target single keyword, and wherein, when each single keyword is stored, the positions, in the combined-keyword array, of all the combined keywords comprising that single keyword are stored with it;
matching the notification information with the target combined keyword to obtain a matching result;
and when the matching result shows that the notification information comprises the target combined keyword, determining the notification information as preset type information.
2. The method of claim 1, wherein prior to the determining that the notification information includes a target single keyword from a set of keywords, the method further comprises:
acquiring single keywords that share the same first character; and
storing the single keywords into the keyword set in order of the magnitude of their feature values.
3. The method of claim 2, wherein the determining that the notification information includes a target single keyword in a keyword set comprises:
obtaining feature words of the notification information by performing word segmentation on the notification information;
finding, in the keyword set, the single keywords whose first character is the same as that of the feature word;
comparing the feature value of the feature word with the feature values of the single keywords one by one, in the storage order of the single keywords; and
when the target single keyword is determined among the single keywords, determining that the notification information comprises the target single keyword, wherein the target single keyword is the single keyword whose feature value is equal to that of the feature word.
4. The method of claim 1, wherein after receiving the notification information and before determining that the notification information comprises a target single keyword in a keyword set, the method further comprises:
reading the keyword set corresponding to the service to which the notification information belongs.
5. An apparatus for information recognition, the apparatus comprising:
a receiving unit configured to receive notification information;
a matching unit, configured to acquire, after determining that the notification information comprises a target single keyword in a keyword set, a target combined keyword comprising the target single keyword in the keyword set, and to match the notification information with the target combined keyword to obtain a matching result; and
a determining unit, configured to determine the notification information as preset type information when the matching result shows that the notification information comprises the target combined keyword;
wherein the matching unit is specifically configured to acquire the target combined keyword in the keyword set according to the position information corresponding to the target single keyword, and, when each single keyword is stored, the positions, in the combined-keyword array, of all the combined keywords comprising that single keyword are stored at the same time.
6. The apparatus of claim 5, further comprising:
an acquisition unit, configured to acquire single keywords that share the same first character; and
a storage unit, configured to store the single keywords into the keyword set in order of the magnitude of their feature values.
7. The apparatus according to claim 6, wherein the matching unit is specifically configured to: obtain feature words of the notification information by performing word segmentation on the notification information; find, in the keyword set, the single keywords whose first character is the same as that of the feature word; compare the feature value of the feature word with the feature values of the single keywords one by one, in the storage order of the single keywords; and, when the target single keyword is determined among the single keywords, determine that the notification information comprises the target single keyword, wherein the target single keyword is the single keyword whose feature value is equal to that of the feature word.
8. The apparatus according to claim 5, wherein the matching unit is further configured to read the keyword set corresponding to the service to which the notification information belongs before determining that the target single keyword is included in the notification information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610643277.5A CN106649427B (en) 2016-08-08 2016-08-08 Information identification method and device

Publications (2)

Publication Number Publication Date
CN106649427A CN106649427A (en) 2017-05-10
CN106649427B true CN106649427B (en) 2020-07-03

Family

ID=58852583

Country Status (1)

Country Link
CN (1) CN106649427B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063446A (en) * 2009-11-13 2011-05-18 中国移动通信集团四川有限公司 Method for creating inverted index and inverted indexing device
CN102238097A (en) * 2010-05-07 2011-11-09 阿里巴巴集团控股有限公司 Instant messaging (IM)-based information reminding method and device
CN103514238A (en) * 2012-06-30 2014-01-15 重庆新媒农信科技有限公司 Sensitive word recognition processing method based on classification searching
CN105426357A (en) * 2015-11-06 2016-03-23 武汉卡比特信息有限公司 Fast voice selection method
CN105550298A (en) * 2015-12-11 2016-05-04 北京搜狗科技发展有限公司 Keyword fuzzy matching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant