CN111860549B - Information identification device, information identification method, computer device, and storage medium

Info

Publication number: CN111860549B
Application number: CN201910277264.4A
Authority: CN
Inventors: 兰红云
Original assignee: Beijing Didi Infinity Technology and Development Co Ltd
Current assignee: Beijing Didi Infinity Technology and Development Co Ltd
Priority date: 2019-04-08
Filing date: 2019-04-08
Publication date: 2024-02-20
Anticipated expiration: 2039-04-08
Also published as: CN111860549A

Abstract

The application provides an information identification device, an information identification method, a computer device, and a storage medium. The information identification device comprises: a receiving module, configured to determine, after receiving information to be identified, a first feature information set contained in the information to be identified; an extraction module, configured to extract from a pre-stored information base at least one piece of candidate information matching the first feature information, to form a candidate information set; a first determining module, configured to determine the similarity between the information to be identified and each piece of candidate information in the candidate information set according to the number of pieces of feature information in the first feature information set, the number of pieces of feature information in the second feature information set, and the positions, in the first and second feature information sets respectively, of each piece of feature information in whichever of the two sets contains fewer pieces; and a second determining module, configured to determine the attribute tag corresponding to the information to be identified according to the determined similarity and the attribute tags of the candidate information in the pre-stored information base. This improves the efficiency of identifying the attribute tag of a target object.

Description

Information identification device, information identification method, computer device, and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to an information identification apparatus, an information identification method, a computer device, and a storage medium.

Background

At present, attribute tags of target objects need to be identified in many scenarios. For example, an online shop can identify the attribute tag of a user from text feature information entered by the user, determine the type of the user, and thus serve the user better; in the field of security monitoring, face images of users can be recognized to determine user identity tags.

When a target object is identified, information related to the target object, such as text or an image, is compared with all the information in a pre-established information base to determine the similarity between that information and the information in the information base, and the attribute tag of the target object is then determined. When the pre-established information base contains a large amount of information, this comparison process is cumbersome and information identification is inefficient.

Disclosure of Invention

In view of the foregoing, an object of the present application is to provide an information identifying apparatus, method, computer device, and storage medium, so as to improve the identification efficiency of attribute tags of target objects.

In a first aspect, an embodiment of the present application provides an information identifying apparatus, including:

a receiving module, configured to determine, after receiving information to be identified, a first feature information set contained in the information to be identified, wherein the first feature information set contains at least one piece of first feature information, and to transmit the first feature information set to an extraction module and a first determining module;

the extraction module is used for extracting at least one piece of candidate information matched with the first characteristic information in a pre-stored information base to form a candidate information set, wherein each piece of candidate information comprises a second characteristic information set formed by at least one piece of second characteristic information, and the candidate information set is transmitted to the first determination module;

the first determining module is configured to determine the similarity between the information to be identified and each piece of candidate information in the candidate information set according to the first feature information number of the first feature information set, the second feature information number of the second feature information set, and the position information, in the first feature information set and in the second feature information set respectively, of each piece of feature information in whichever of the two sets contains fewer pieces of feature information, and to transmit the similarity to the second determining module;

and the second determining module is configured to determine the attribute tag corresponding to the information to be identified according to the determined similarity and the attribute tags of the candidate information in the pre-stored information base.

In some embodiments, the information to be recognized includes text to be recognized, the first feature information includes feature words, and the receiving module is specifically configured to:

after receiving a text to be recognized input by a target object, performing word segmentation on the text to be recognized to obtain a plurality of word units;

and filtering a plurality of word units based on preset common words to obtain the feature words, and arranging the feature words according to the position relation of each feature word in the text to be recognized to form a first feature information set of the text to be recognized.

In some embodiments, the information to be identified comprises an image to be identified, and the first characteristic information comprises a gray value; the receiving module is specifically configured to:

after receiving the image to be identified, if the image to be identified is a color image, converting the color image into a gray image;

dividing the gray level image according to set rows and columns to obtain a plurality of gray level sub-images, and determining the gray level value of each gray level sub-image;

and arranging the gray values of the gray sub-images according to the position information of each gray sub-image in the gray image, to form the first feature information set.

In some embodiments, the extraction module is specifically configured to:

traversing from any first characteristic information in the first characteristic information set, and searching whether the characteristic information matched with the first characteristic information currently traversed exists in the pre-stored information base;

if the feature information exists, extracting candidate information of the existing feature information, and forming a candidate information set according to each extracted candidate information;

and if, after the traversal is finished, no feature information matching the first feature information has been found in the pre-stored information base, outputting prompt information indicating that no candidate information was found.

In some embodiments, the first determining module is specifically configured to:

for each piece of candidate information, based on the first feature information number and the second feature information number of that candidate information, selecting whichever of the information to be identified and the candidate information has fewer pieces of feature information as first comparison information, and selecting the one with more pieces of feature information as second comparison information;

traversing sequentially from any piece of third feature information in the third feature information set corresponding to the first comparison information, and, if the fourth feature information set corresponding to the second comparison information contains feature information matching a feature information string composed of consecutively traversed third feature information, determining a feature information string set consisting of at least one such feature information string;

for each feature information string, determining the similarity between the feature information string and the fourth feature information set according to: the position, in the third feature information set, of the first piece of third feature information of the string; the position, in the fourth feature information set, of the feature information matching that first piece; the position, in the third feature information set, of the last piece of third feature information of the string; the number of pieces of feature information in the third feature information set; and the number of pieces of feature information in the fourth feature information set;

and selecting, from the similarities between each feature information string in the feature information string set and the fourth feature information set, the maximum similarity as the similarity between the information to be identified and the candidate information.

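In some embodiments, the first determining module is specifically configured to obtain the feature information string set by:

traversing sequentially from any piece of third feature information in the third feature information set corresponding to the first comparison information;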

judging whether feature information matched with the traversed third feature information continuously exists in the fourth feature information set;

if yes, extracting continuous third characteristic information from the third characteristic information set as the characteristic information string, wherein the characteristic information matched with the continuous third characteristic information is contained in the fourth characteristic information set.

In some embodiments, the second determining module is specifically configured to:

judging whether first candidate information with the similarity with the information to be identified being greater than a preset threshold exists in the candidate information set;

if the first candidate information exists, sorting the first candidate information based on the order of the similarity between the first candidate information and the information to be identified; determining an attribute tag corresponding to the information to be identified according to a preset attribute tag configuration strategy, the ordered first candidate information and attribute tags of the first candidate information;

and if the first candidate information does not exist, outputting prompt information for indicating that the first candidate information is not found.

In a second aspect, an embodiment of the present application provides an information identifying method, including:

After receiving information to be identified, determining a first characteristic information set contained in the information to be identified, wherein the first characteristic information set contains at least one first characteristic information;

extracting at least one piece of candidate information matched with the first characteristic information from a pre-stored information base to form a candidate information set; wherein each piece of candidate information comprises a second characteristic information set formed by at least one piece of second characteristic information;

determining the similarity between the information to be identified and each piece of candidate information in the candidate information set according to the first feature information number of the first feature information set, the second feature information number of the second feature information set, and the position information, in the first feature information set and in the second feature information set respectively, of each piece of feature information in whichever of the two sets contains fewer pieces of feature information;

and determining the attribute label corresponding to the information to be identified according to the determined similarity and the attribute label of the candidate information in the pre-stored information base.

In a third aspect, embodiments of the present application provide a computer device, including: a processor, a storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over a bus when the computer device is running, the processor executing the machine-readable instructions to perform the steps of the information identification method according to the second aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the information identification method according to the second aspect.

In the embodiments of the present application, a candidate information set is first selected from the pre-stored information base according to the first feature information, and the information to be identified is then compared only with the candidate information in that set, which greatly shortens the comparison time. Furthermore, the similarity between the information to be identified and a piece of candidate information is determined only from the first feature information number of the information to be identified, the second feature information number of the candidate information, and the positions, in the first and second feature information sets respectively, of each piece of feature information in whichever set contains fewer pieces. Because only the positions of the smaller set's feature information need to be located and then combined with the sizes of the two sets being compared, the similarity can be determined quickly, so the attribute tag corresponding to the information to be identified is determined quickly and information identification efficiency is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of an information identification method provided in an embodiment of the present application;

FIG. 2 is a flowchart of a method for determining a first feature information set and a first feature information number of the first feature information set according to an embodiment of the present application;

FIG. 3 is a flowchart of another method for determining a first feature information set and a first feature information number of the first feature information set according to an embodiment of the present application;

FIG. 4 is a flowchart of a method for obtaining a candidate information set according to an embodiment of the present application;

FIG. 5 is a flowchart of a method for determining similarity between information to be identified and each candidate information in a candidate information set according to an embodiment of the present application;

FIG. 6 is a flowchart of a method for obtaining a feature information string set according to an embodiment of the present application;

FIG. 7 is a flowchart of a method for determining an attribute tag corresponding to information to be identified according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an information identifying apparatus according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the accompanying drawings in the present application are only for the purpose of illustration and description, and are not intended to limit the protection scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this application, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to the flow diagrams and one or more operations may be removed from the flow diagrams as directed by those skilled in the art.

In addition, the described embodiments are only some, but not all, of the embodiments of the present application. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.

In order to enable one skilled in the art to use the present disclosure, the following embodiments are presented in connection with a specific application scenario "attribute tag identification of a target object". It will be apparent to those having ordinary skill in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present application. While the present application is described primarily in terms of attribute tag identification of a target object, it should be understood that this is but one exemplary embodiment.

It should be noted that the term "comprising" will be used in the embodiments of the present application to indicate the presence of the features stated hereinafter, but not to exclude the addition of other features.

Aiming at the problem of low recognition efficiency when the attribute tags of the target object are recognized based on the similarity in the prior art, the embodiment of the application provides an information recognition method so as to improve the recognition efficiency.

The embodiment of the application provides an information identification method, as shown in fig. 1, including the following processes S101 to S104:

S101, after receiving information to be identified, determining a first feature information set contained in the information to be identified, wherein the first feature information set contains at least one piece of first feature information.

The information to be identified can represent different content in different application scenarios. For example, it may be text to be identified: the text entered by a user is received and identified, and the attribute tag of the user associated with the text is thereby determined, in which case the target object is the user. Alternatively, the information to be identified may be an image, for example an image of the target object, and the attribute tag of the target object is determined by identifying the image, in which case the target object may be a user or an article.

The attribute tag may be a type of the target object. For example, if the target object is a user, the type of the user may be identified: the user may be a price-sensitive user, a user with high product demands, a shopping enthusiast, or a travel enthusiast, where "price-sensitive", "high product demands", "shopping enthusiast", and "travel enthusiast" are the attribute tags to be identified.

In one embodiment, the information to be identified is a text to be identified and the first feature information includes feature words. In this case, determining in step S101 the first feature information set contained in the information to be identified, after the information to be identified associated with the target object is received, includes the following specific processes S201 to S203, as shown in fig. 2:

S201, after receiving a text to be recognized input by a target object, performing word segmentation on the text to be recognized to obtain a plurality of word units;

S202, filtering the plurality of word units based on preset common words to obtain feature words, and arranging the feature words according to the position of each feature word in the text to be recognized to form a first feature information set of the text to be recognized.

The target object may be a user whose attribute tag is to be identified. For example, suppose the text to be recognized that the user enters on a website is "in Shanghai's A market or B market, purchase commodity C product". A word segmenter can segment this text into the word units "in", "Shanghai", "A market", "or", "B market", "purchase", "commodity C", and "product". Commodity C may be a commodity with a certain characteristic, such as a high price.

These word units are then filtered against a common-word dictionary. For example, after common words with no practical meaning, such as "in", "or", and "product", are removed, five feature words remain: "Shanghai", "A market", "B market", "purchase", and "commodity C". The five feature words are then arranged according to their positions in the text to be recognized, forming the first feature information set of the text: "Shanghai, A market, B market, purchase, commodity C".

In addition, after the first feature information set is obtained, the number of feature words in it can be determined. Here the set "Shanghai, A market, B market, purchase, commodity C" contains 5 feature words, so the first feature information number of the first feature information set is 5.
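The filtering and ordering steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described by the application: the word units are taken as already segmented (no particular word segmenter is assumed), and the names `COMMON_WORDS` and `build_first_feature_set` are hypothetical.

```python
# Hypothetical common-word list; the text only refers to "preset common words".
COMMON_WORDS = {"in", "or", "product"}


def build_first_feature_set(word_units, common_words=COMMON_WORDS):
    """Drop common words and keep the remaining feature words in text order."""
    return [word for word in word_units if word not in common_words]


# Word units in the order in which they occur in the text to be recognized
# (assumed to be the output of a word segmenter).
word_units = [
    "in", "Shanghai", "A market", "or", "B market",
    "purchase", "commodity C", "product",
]

first_feature_set = build_first_feature_set(word_units)
first_feature_count = len(first_feature_set)

print(first_feature_set)
print(first_feature_count)
```

Because a list comprehension preserves input order, the feature words stay arranged by their position in the text without a separate sorting step.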

In another embodiment, the information to be identified includes an image to be identified and the first feature information includes a gray value. In this case, determining in step S101 the first feature information set contained in the information to be identified, after the information to be identified is received, includes the following processes S301 to S303, as shown in fig. 3:

S301, after receiving an image to be identified, if the image to be identified is a color image, converting the color image into a gray image;

S302, dividing the gray image according to set rows and columns to obtain a plurality of gray sub-images, and determining the gray value of each gray sub-image;

S303, arranging the gray values of the gray sub-images according to the position information of each gray sub-image in the gray image, to form a first feature information set.

The image to be identified may be an image of a person or of an object. If the received image is a color image, it is first converted into a gray image; if the received image is already a gray image, the conversion in step S301 is not required.

Here, dividing the gray image according to the set rows and columns means dividing it into gray sub-images of equal size, where that size is the same as the size of the gray sub-image to which each gray value of the images in the pre-stored information base corresponds. For example, if each gray value of an image stored in the pre-stored information base is the gray value of a gray sub-image of size 16×16, then the set rows and columns divide the gray image into a plurality of gray sub-images of size 16×16.

For example, suppose the gray image is 256×256 and each gray value of each image stored in the pre-stored information base corresponds to a gray sub-image of size 16×16. The received gray image can then be divided into 16 rows and 16 columns, so that each gray sub-image is 16×16; this yields 256 gray sub-images, each containing 256 pixels. The gray value of each gray sub-image is determined by calculating the average gray value of the pixels in that sub-image; for example, if the average gray value of all the pixels in a gray sub-image is 225, the gray value of that gray sub-image is 225.

Then the gray values of the gray sub-images are arranged according to the position information of each gray sub-image in the gray image. For example, after the received gray image is divided into M rows and N columns, it becomes M rows and N columns of gray sub-images. Starting from the second row, the first gray sub-image of each row is connected after the last gray sub-image of the previous row, so that the gray sub-images are arranged into 1 row and M×N columns. The position information of each gray sub-image in the gray image may be its coordinates in the gray image; for example, if a gray sub-image is located in the second row and the second column, its position information can be expressed as (2, 2).

In the above manner, if the gray image has 16×16 gray values in total, these gray values are arranged as follows. After the gray image is divided into 16 rows and 16 columns, starting from the 2nd row, the first gray sub-image of each row is connected after the last gray sub-image of the previous row, giving a sequence of gray sub-images of 1 row and 256 columns. The 16×16 gray values are arranged in the same order as the gray sub-images, which yields the first feature information set.

In addition, after the first feature information set is obtained, the number of gray values in it can be determined; for example, if the first feature information set includes 256 gray values, the first feature information number is 256.
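As a minimal sketch of processes S302 and S303, the following Python function divides a gray image into equal blocks, takes the mean gray value of each block, and lists the values row by row. The function name `gray_feature_set` and the list-of-lists image representation are assumptions made for illustration; color-to-gray conversion (S301) is left out because the text does not specify a conversion formula.

```python
def gray_feature_set(gray_image, block_rows, block_cols):
    """Return the mean gray value of each block, listed row by row.

    gray_image is a list of equal-length rows of gray values; block_rows and
    block_cols give the block size in pixels (for example 16 and 16).
    """
    height = len(gray_image)
    width = len(gray_image[0])
    if height % block_rows or width % block_cols:
        raise ValueError("image size must be a multiple of the block size")

    features = []
    # Visiting blocks top to bottom and left to right appends each row of
    # blocks after the previous one, giving a single 1 x (M*N) sequence.
    for top in range(0, height, block_rows):
        for left in range(0, width, block_cols):
            pixels = [
                gray_image[r][c]
                for r in range(top, top + block_rows)
                for c in range(left, left + block_cols)
            ]
            features.append(sum(pixels) / len(pixels))
    return features


# A 256x256 gray image in which every pixel has gray value 225.
image = [[225] * 256 for _ in range(256)]
first_feature_set = gray_feature_set(image, 16, 16)
first_feature_count = len(first_feature_set)
print(first_feature_count)
```

With 16×16 blocks on a 256×256 image, the loop visits 16 rows of 16 blocks, so the resulting set holds 256 gray values, matching the count in the example above.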

S102, extracting at least one piece of candidate information matched with the first characteristic information from a pre-stored information base to form a candidate information set; wherein each piece of candidate information comprises a second characteristic information set formed by at least one piece of second characteristic information.

The pre-stored information base is an information base established in advance, in which a plurality of pieces of information are stored. Each piece of information includes at least one piece of feature information, which may be a feature word or a gray value. Each piece of information corresponds to an attribute tag, which is determined in advance from the feature information in that piece of information and stored.

Candidate information matching the first feature information may be candidate information that includes at least one piece of feature information matching the first feature information, where matching feature information may either be identical to the first feature information or be of the same type as it. For example, if one piece of first feature information in the first feature information set is "commodity C", candidate information matching it is candidate information whose second feature information set contains "commodity C" or contains a commodity of the same type as "commodity C". If commodity C is a high-priced face cream, a commodity of the same type may be another face cream in the same price range; it need not be commodity C itself.

Specifically, when the information to be identified is a text to be identified, the first feature information set may be vectorized after it is determined; in this case, the second feature information sets of the candidate information are also stored in the pre-stored information base in vector form. Because pieces of feature information belonging to the same class may then have the same feature value, more candidate information is extracted into the candidate information set.

Specifically, in step S102, at least one piece of candidate information matched with the first feature information is extracted from the pre-stored information base to form a candidate information set, as shown in fig. 4, including the following processes S401 to S403:

S401, traversing from any piece of first feature information in the first feature information set, and searching the pre-stored information base for feature information matching the first feature information currently traversed;

S402, if such feature information exists, extracting the candidate information containing that feature information;

S403, forming a candidate information set from the extracted candidate information.

Traversal may start from the first piece of first feature information in the first feature information set and proceed in order, or start from the last piece and proceed in order, or start from any piece in the middle of the set and proceed in a clockwise or counterclockwise direction. For example, for the first feature information set "Shanghai, A market, B market, purchase, commodity C", traversal may start from "Shanghai" or from "commodity C". Starting from "commodity C", for instance, the pre-stored information base is searched for feature information matching "commodity C"; if such feature information exists, the candidate information containing it is extracted. The search then continues in turn with the next feature word, for example "purchase", and the candidate information containing feature information matching "purchase" is extracted, and so on. The candidate information extracted for all the traversed feature words forms the candidate information set.

In addition, if the characteristic information matched with the first characteristic information is not found in the pre-stored information base after the traversing is finished, the prompting information for indicating that the candidate information is not found is output.

For example, if no feature information matching "Shanghai", "A market", "B market", "purchase" or "commodity C" is found in the pre-stored information base, the pre-stored information base does not store any information related to the information to be identified. In this case, prompt information indicating that no candidate information was found may be output, for example as a voice prompt or on a display screen, and subsequent processing is then performed by staff.
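Steps S401 to S403, together with the not-found prompt, can be sketched as follows. This is a minimal illustration only: the function name `build_candidate_set`, the dictionary layout of the pre-stored information base, and the use of exact string equality as the "matching" test are assumptions made for the example and are not specified by the method.

```python
from typing import Dict, List


def build_candidate_set(first_features: List[str],
                        info_base: Dict[str, List[str]]) -> List[str]:
    """Return the ids of stored entries that share a feature with the query.

    info_base is a hypothetical layout: candidate id -> its second
    feature information set. Matching is assumed to be exact equality.
    """
    candidates: List[str] = []
    for feature in first_features:                      # S401: traverse the first set
        for cand_id, cand_features in info_base.items():
            if feature in cand_features and cand_id not in candidates:
                candidates.append(cand_id)              # S402: extract the candidate
    if not candidates:
        # Prompt output when the traversal ends without any match.
        print("no candidate information found")
    return candidates                                   # S403: the candidate set


if __name__ == "__main__":
    base = {
        "c1": ["Shanghai", "A market"],
        "c2": ["Beijing"],
        "c3": ["purchase", "commodity C"],
    }
    print(build_candidate_set(["Shanghai", "purchase"], base))
```

The traversal order of `first_features` only affects the order in which candidates are collected, not which candidates end up in the set.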

S103, determining the similarity between the information to be identified and each candidate information in the candidate information set according to the first feature information number of the first feature information set, the second feature information number of the second feature information set, and the position information, in the first feature information set and in the second feature information set, of each piece of feature information of whichever of the two sets contains fewer pieces of feature information.

When the information to be identified is a text to be identified, the similarity between the information to be identified and any candidate information in the candidate information set is determined by first comparing the number of feature words in the first feature information set, which corresponds to the information to be identified, with the number of feature words in the second feature information set, which corresponds to that candidate information. If the first feature information set contains fewer feature words, the similarity is determined according to the positions, in the first feature information set and in the second feature information set, of each piece of feature information of the first feature information set, together with the number of feature words in each of the two sets.

If the second feature information set contains fewer feature words, the similarity is determined according to the positions, in the first feature information set and in the second feature information set, of each piece of feature information of the second feature information set, together with the number of feature words in each of the two sets.

Specifically, in step S103, determining the similarity between the information to be identified and each candidate information in the candidate information set according to the first feature information number, the second feature information number, and the position information of each piece of feature information of the set containing fewer pieces of feature information, as shown in fig. 5, includes the following processes S501 to S504:

S501, for each candidate information, based on the first feature information number and the second feature information number of the candidate information, selecting whichever of the information to be identified and the candidate information has fewer pieces of feature information as first comparison information, and selecting the one with more pieces of feature information as second comparison information.

For example, suppose the candidate information set includes 70 pieces of candidate information, and take the 1st candidate information as an example. If the information to be identified is a text to be identified, the second feature information number of the 1st candidate information is the number of feature words in its second feature information set; if the information to be identified is an image, it is the number of gray values in its second feature information set.

In the text case, the first feature information number is the number of feature words in the first feature information set. If the first feature information set contains fewer feature words than the second feature information set, the first comparison information is the information to be identified and the second comparison information is the 1st candidate information; if the second feature information set contains fewer feature words than the first feature information set, the first comparison information is the 1st candidate information and the second comparison information is the information to be identified.
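The selection in S501 amounts to ordering the two feature sets by length. A minimal sketch, in which the function name `pick_comparison_sets` is hypothetical and the handling of equal lengths (the information to be identified is taken as the first comparison information) is an assumption, since the text does not address ties:

```python
from typing import List, Tuple


def pick_comparison_sets(first_set: List[str],
                         second_set: List[str]) -> Tuple[List[str], List[str]]:
    """Return (third_set, fourth_set): the shorter feature set comes first.

    first_set belongs to the information to be identified, second_set to
    one candidate. On a tie the query side is used as the third set; this
    tie rule is an assumption made for the sketch.
    """
    if len(first_set) <= len(second_set):
        return first_set, second_set
    return second_set, first_set


if __name__ == "__main__":
    query = ["Shanghai", "A market", "B market", "purchase", "commodity C"]
    candidate = ["Shanghai", "purchase"]
    third, fourth = pick_comparison_sets(query, candidate)
    print(third)   # the candidate's set, because it has fewer features
```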

S502, traversing sequentially from any third feature information in the third feature information set corresponding to the first comparison information, and, if the fourth feature information set corresponding to the second comparison information contains feature information matching a feature information string composed of continuously traversed third feature information, determining a feature information string set consisting of at least one feature information string.

If the first comparison information is the information to be identified, the third feature information set is the first feature information set mentioned above, and the fourth feature information set corresponding to the second comparison information is the second feature information set mentioned above.

For example, when the first feature information set is "Shanghai, A market, B market, purchase, commodity C" as mentioned above, suppose the first feature information set is traversed in sequence starting from "Shanghai" and the second feature information set is searched for matches. If feature information matching "Shanghai" and "A market" is found consecutively, no feature information matching "B market" is found, and feature information matching "purchase" and "commodity C" is again found consecutively, then the first feature information set yields two feature information strings, "Shanghai, A market" and "purchase, commodity C", and the resulting feature information string set consists of these two strings.

Specifically, as shown in fig. 6, the feature information string may be acquired according to the following steps, specifically including S601 to S603:

S601, traversing sequentially from any third feature information in the third feature information set corresponding to the first comparison information;

S602, judging whether the fourth feature information set continuously contains feature information matching the traversed third feature information;

S603, if yes, extracting the continuous third feature information from the third feature information set as a feature information string, wherein the feature information matching the continuous third feature information is contained in the fourth feature information set.

Here, suppose the first comparison information is the information to be identified, take the third feature information set "Shanghai, A market, B market, purchase, commodity C" as an example, and let the traversal start from the third feature information "Shanghai".

It is first judged whether the fourth feature information set contains feature information matching "Shanghai". If so, it is then judged in turn whether the fourth feature information set contains feature information matching "A market", "B market", "purchase" and "commodity C". If all of these are found, feature information matching "Shanghai, A market, B market, purchase, commodity C" exists continuously in the fourth feature information set, and the continuous third feature information "Shanghai, A market, B market, purchase, commodity C" in the third feature information set is taken as one feature information string.

For another example, if, while sequentially traversing the third feature information of the third feature information set, it is determined that the fourth feature information set contains feature information matching "Shanghai" and "A market" but does not contain feature information matching "B market", the first feature information string obtained is "Shanghai, A market"; if the fourth feature information set also contains feature information matching "purchase" and "commodity C", a second feature information string "purchase, commodity C" is obtained.
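The extraction of feature information strings in S601 to S603 can be sketched as a single pass that groups consecutive matched features into runs. The sketch below is illustrative only: the name `feature_strings` is hypothetical, matching is assumed to be exact equality, and a feature is treated as matched whenever it occurs anywhere in the fourth feature information set (the text does not state whether the matches must also be adjacent there).

```python
from typing import List, Sequence, Tuple


def feature_strings(third_set: Sequence[str],
                    fourth_set: Sequence[str]) -> List[Tuple[int, int]]:
    """Return one (start, end) pair per feature information string.

    Positions are 1-based traversal positions in third_set. A string is a
    maximal run of consecutive third features that each have a match in
    fourth_set (membership test only, which is an assumption).
    """
    present = set(fourth_set)
    runs: List[Tuple[int, int]] = []
    start = None
    for idx, feature in enumerate(third_set, start=1):   # S601: traverse in order
        if feature in present:                           # S602: is there a match?
            if start is None:
                start = idx
        elif start is not None:
            runs.append((start, idx - 1))                # S603: close the current run
            start = None
    if start is not None:
        runs.append((start, len(third_set)))
    return runs


if __name__ == "__main__":
    third = ["Shanghai", "A market", "B market", "purchase", "commodity C"]
    fourth = ["today", "Shanghai", "A market", "purchase", "commodity C"]
    for begin, end in feature_strings(third, fourth):
        print(third[begin - 1:end])
```

With the sets used in the example above, the two runs correspond to "Shanghai, A market" and "purchase, commodity C".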

S503, for each feature information string, determining the similarity between the feature information string and the fourth feature information set according to the position, in the third feature information set, of the first third feature information of the feature information string, the position, in the fourth feature information set, of the feature information matching that first third feature information, the position, in the third feature information set, of the last third feature information of the feature information string, the third feature information number of the third feature information set, and the fourth feature information number of the fourth feature information set.

Specifically, the position of the first third feature information of a feature information string in the third feature information set may be its traversal position in the third feature information set. For example, when the third feature information set is traversed sequentially from its 1st third feature information, if the first third feature information of a certain feature information string is the 3rd third feature information traversed, its position is 3. The position of the last third feature information of the feature information string in the third feature information set is likewise its traversal position in the third feature information set, and the position, in the fourth feature information set, of the feature information matching the first third feature information of the string is the traversal position of that matching feature information in the fourth feature information set.

In particular, the third feature information set and the fourth feature information set should be traversed in the same direction, that is, both in forward order or both in reverse order.

Specifically, the similarity of each of the characteristic information strings to the fourth characteristic information set may be determined according to the following formula (1):

wherein s_i represents the similarity between the i-th feature information string in the feature information string set and the fourth feature information set; pos_B(i) represents the traversal position, in the third feature information set, of the first third feature information in the i-th feature information string; pos_A(i) represents the position, in the fourth feature information set, of the feature information matching the first third feature information in the i-th feature information string; end_point(i) represents the position, in the third feature information set, of the last third feature information in the i-th feature information string; L_B represents the third feature information number of the third feature information set; and L_A represents the fourth feature information number of the fourth feature information set.

For example, when determining the similarity between the information to be identified and one piece of candidate information, suppose the first feature information set corresponding to the information to be identified contains fewer pieces of feature information than the second feature information set corresponding to the candidate information, and it is determined according to steps S601 to S603 that the feature information string set of the first feature information set includes 5 feature information strings. Then s_1 to s_5 can be calculated according to formula (1). The calculation is described below taking s_1, the similarity between the 1st feature information string and the fourth feature information set (here the second feature information set), as an example:

If the 1st feature information string includes 6 pieces of third feature information (here first feature information), end_point(1) represents the position of the 6th third feature information of the 1st feature information string in the third feature information set, pos_B(1) represents the position of the 1st third feature information of the 1st feature information string in the third feature information set, and pos_A(1) represents the position, in the fourth feature information set, of the feature information matching that 1st third feature information. pos_A(1) may take more than one value: for example, when the 1st third feature information of the 1st feature information string is "Shanghai" and "Shanghai" appears twice in the fourth feature information set, pos_A(1) has two values and s_1 correspondingly has two values, so the number of similarities between the feature information string set and the fourth feature information set is at least 6. If the 1st third feature information of other feature information strings also appears in the fourth feature information set several times, the number of similarities increases accordingly.
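The way repeated occurrences multiply the number of similarities can be illustrated by enumerating the inputs that formula (1) takes for each feature information string. Formula (1) itself is not reproduced in this text, so the sketch below only gathers its arguments and does not compute s_i; the names `FormulaInputs` and `formula_inputs`, and the representation of each string as a 1-based (start, end) pair, are assumptions made for the example.

```python
from typing import List, NamedTuple, Sequence, Tuple


class FormulaInputs(NamedTuple):
    pos_b: int      # position of the string's first feature in the third set
    pos_a: int      # position of one matching feature in the fourth set
    end_point: int  # position of the string's last feature in the third set
    len_b: int      # number of features in the third set (L_B)
    len_a: int      # number of features in the fourth set (L_A)


def formula_inputs(third_set: Sequence[str],
                   fourth_set: Sequence[str],
                   runs: Sequence[Tuple[int, int]]) -> List[FormulaInputs]:
    """List one argument tuple per (string, occurrence of its first feature).

    runs holds 1-based (start, end) positions of each feature information
    string in third_set. A first feature that occurs k times in fourth_set
    yields k tuples, hence k similarity values for that string.
    """
    result: List[FormulaInputs] = []
    for pos_b, end_point in runs:
        first_feature = third_set[pos_b - 1]
        for pos_a, feature in enumerate(fourth_set, start=1):
            if feature == first_feature:
                result.append(FormulaInputs(pos_b, pos_a, end_point,
                                            len(third_set), len(fourth_set)))
    return result


if __name__ == "__main__":
    third = ["Shanghai", "A market", "B market", "purchase"]
    fourth = ["Shanghai", "x", "Shanghai", "A market", "purchase"]
    for item in formula_inputs(third, fourth, [(1, 2), (4, 4)]):
        print(item)
```

In this example "Shanghai" occurs twice in the fourth set, so two strings produce three argument tuples in total.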

S504, selecting, from the similarities between each feature information string in the feature information string set and the fourth feature information set, the maximum similarity as the similarity between the information to be identified and the candidate information.

If, for example, the feature information string set includes 6 feature information strings, and the 1st third feature information of each feature information string appears only once in the fourth feature information set, the similarity set includes 6 similarities.

For example, take the candidate information to be the 1st candidate information in the candidate information set. If, when calculating the similarity between the information to be identified and the 1st candidate information, the first feature information number of the information to be identified is smaller than the second feature information number of the 1st candidate information, and the 3rd similarity s_3 is the largest of the 6 similarities in the obtained similarity set {s_1, s_2, s_3, s_4, s_5, s_6}, then the similarity between the information to be identified and the 1st candidate information is s_3. The similarity between the information to be identified and each of the other candidate information can be obtained in the same way; if the candidate information set includes 10 pieces of candidate information, 10 similarities are obtained.
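The selection in S504 is a plain maximum over the per-string similarities. A minimal sketch, assuming the similarities have already been computed by formula (1); the function name `select_similarity`, the numeric values, and the error raised for an empty list are all illustrative assumptions:

```python
from typing import Sequence, Tuple


def select_similarity(similarities: Sequence[float]) -> Tuple[int, float]:
    """Return the 1-based index and value of the largest similarity (S504)."""
    if not similarities:
        # Not covered by the text; raising here is a choice made for the sketch.
        raise ValueError("no feature information string was matched")
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return best + 1, similarities[best]


if __name__ == "__main__":
    # Six illustrative values standing in for s_1 .. s_6.
    index, value = select_similarity([0.2, 0.5, 0.9, 0.1, 0.3, 0.4])
    print(index, value)
```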

Therefore, when determining the similarity between the information to be identified and the candidate information, the embodiment of the application only needs to determine whether the fourth feature information set contains the third feature information of the third feature information set, which is the set with fewer pieces of feature information, in order to obtain the feature information strings. The similarity between the third feature information set and the fourth feature information set can then be determined by formula (1) from the positions, in the third feature information set, of the first and last third feature information of each feature information string, the position, in the fourth feature information set, of the feature information matching the first third feature information, the third feature information number of the third feature information set and the fourth feature information number of the fourth feature information set, which improves the efficiency of determining the similarity.

S104, determining the attribute label corresponding to the information to be identified according to the determined similarity and the attribute label of the candidate information in the pre-stored information base.

In the pre-stored information base, each piece of candidate information has a corresponding attribute tag, for example price sensitive, high product demand, shopping lover or travel lover. The attribute tag corresponding to the information to be identified can therefore be determined according to the determined similarity between the information to be identified and the candidate information and the attribute tags of the candidate information in the pre-stored information base.

Specifically, in step S104, according to the determined similarity and the attribute tag of the candidate information in the pre-stored information base, the attribute tag corresponding to the information to be identified is determined, as shown in fig. 7, and specifically includes the following processes S701 to S704:

S701, judging whether the candidate information set contains first candidate information whose similarity to the information to be identified is greater than a preset threshold;

S702, if so, sorting the first candidate information according to the similarity between the first candidate information and the information to be identified, and performing step S704.

S703, if not, outputting a prompt message indicating that the first candidate information is not found.

S704, determining the attribute label corresponding to the information to be identified according to a preset attribute label configuration strategy, the ordered first candidate information and the attribute labels of the first candidate information.

The preset threshold may be set according to the accuracy required in advance, for example, may be 0.8, and in the above example, if the candidate information set includes 10 pieces of candidate information, it may be determined whether there is a similarity greater than 0.8 among the similarities between the information to be identified and the 10 pieces of candidate information.

If 8 of the similarities are greater than 0.8, these 8 similarities can be sorted by magnitude, which gives 8 pieces of first candidate information whose similarity to the information to be identified is greater than 0.8. The attribute tags of these 8 pieces of first candidate information may be of several types.

If the preset attribute tag configuration policy is to determine the single attribute tag that best matches the information to be identified, the attribute tag of the first candidate information with the highest similarity to the information to be identified is selected as the attribute tag of the information to be identified. For example, if the information to be identified is a text input by a user and the attribute tag of that first candidate information is "high product demand", the attribute tag corresponding to the information to be identified is determined to be "high product demand"; that is, the user who input the information to be identified is a user with high product demand.

If the information to be identified is an image input by the user, the attribute tag may be a number. For example, when the method is applied to the security management of a residential community and the attribute tag of the candidate information is "1001", the attribute tag corresponding to the image to be identified can be determined to be "1001".

If the preset attribute tag configuration policy is to determine the three types of attribute tags that best match the information to be identified, then, for each type of attribute tag, the first candidate information with the highest similarity to the information to be identified is selected from the first candidate information carrying that type of tag; from these selected pieces, the attribute tags of the three with the highest similarity are then taken as the attribute tags of the information to be identified. For example, if the three types are price sensitive, high quality requirement and shopping lover, the attribute tags of the user corresponding to the information to be identified are determined to be price sensitive, high quality requirement and shopping lover.
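Steps S701 to S704 and the two example policies (one tag, or the three best tag types) can be sketched as follows. The function name `assign_labels`, the dictionary inputs and the `max_labels` parameter are hypothetical; the default threshold of 0.8 follows the example value given above.

```python
from typing import Dict, List


def assign_labels(similarities: Dict[str, float],
                  labels: Dict[str, str],
                  threshold: float = 0.8,
                  max_labels: int = 1) -> List[str]:
    """Pick attribute tags from candidates whose similarity exceeds threshold.

    similarities maps candidate id -> similarity to the query, labels maps
    candidate id -> attribute tag (both layouts are assumptions).
    """
    # S701: keep only the "first candidate information".
    first = [(cid, s) for cid, s in similarities.items() if s > threshold]
    if not first:
        print("no first candidate information found")       # S703
        return []
    first.sort(key=lambda pair: pair[1], reverse=True)       # S702: sort by similarity
    chosen: List[str] = []
    for cid, _ in first:                                     # S704: apply the policy
        label = labels[cid]
        # The first time a tag type appears it belongs to that type's
        # highest-similarity candidate, because the list is sorted.
        if label not in chosen:
            chosen.append(label)
        if len(chosen) == max_labels:
            break
    return chosen


if __name__ == "__main__":
    sims = {"c1": 0.95, "c2": 0.90, "c3": 0.85, "c4": 0.82, "c5": 0.50}
    tags = {"c1": "price sensitive", "c2": "price sensitive",
            "c3": "shopping lover", "c4": "travel lover",
            "c5": "high quality requirement"}
    print(assign_labels(sims, tags))                  # single best tag
    print(assign_labels(sims, tags, max_labels=3))    # three best tag types
```

Walking the sorted list and keeping the first occurrence of each tag type gives the same result as first picking the best candidate per tag type and then ranking those picks.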

Because the efficiency of determining the similarity is improved, the efficiency of determining the attribute tag corresponding to the information to be identified according to the determined similarity and the attribute tags of the candidate information in the pre-stored information base is improved correspondingly.

Based on the same inventive concept, the embodiment of the application also provides an information identification device corresponding to the information identification method. Because the principle by which the device solves the problem is similar to that of the information identification method in the embodiment of the application, the implementation of the device can refer to the implementation of the method, and repeated description is omitted.

The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.

An embodiment of the present application provides an information identifying apparatus 800, as shown in fig. 8, including:

a receiving module 801, configured to determine a first feature information set included in the information to be identified after receiving the information to be identified, where the first feature information set includes at least one first feature information, and transmit the first feature information set to the extracting module and the first determining module;

the extracting module 802 is configured to extract at least one piece of candidate information matched with the first feature information in a pre-stored information base to form a candidate information set, where each piece of candidate information includes a second feature information set formed by at least one piece of second feature information, and transmit the candidate information set to the first determining module;

a first determining module 803, configured to determine the similarity between the information to be identified and each candidate information in the candidate information set according to the first feature information number of the first feature information set, the second feature information number of the second feature information set, and the position information, in the first feature information set and in the second feature information set, of each piece of feature information of whichever of the two sets contains fewer pieces of feature information, and to transmit the similarity to the second determining module;

the second determining module 804 is configured to determine, according to the determined similarity and the attribute tag of the candidate information in the pre-stored information base, an attribute tag corresponding to the information to be identified.

In some embodiments, the information to be recognized includes text to be recognized, the first feature information includes feature words, and the receiving module 801 is specifically configured to:

filtering the word units based on preset common words to obtain feature words, and arranging the feature words according to the position relation of each feature word in the text to be recognized to form a first feature information set of the text to be recognized.
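The filtering and ordering of feature words can be sketched as below. This is only an illustration under stated assumptions: the function name `text_feature_set` is hypothetical, word units are obtained by splitting on whitespace (the segmentation step is not described in this passage), and common words are compared case-insensitively.

```python
from typing import Iterable, List


def text_feature_set(text: str, common_words: Iterable[str]) -> List[str]:
    """Return feature words in their original order of appearance.

    Whitespace splitting stands in for word segmentation, which is an
    assumption; any segmenter producing ordered word units would do.
    """
    stop = {word.lower() for word in common_words}
    return [unit for unit in text.split() if unit.lower() not in stop]


if __name__ == "__main__":
    print(text_feature_set("I purchase commodity C in Shanghai", ["I", "in"]))
```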

In some embodiments, the information to be identified includes an image to be identified, the first feature information includes a gray value, and the receiving module 801 is specifically configured to:

dividing the gray level image according to the set rows and columns to obtain a plurality of gray level sub-images, and determining the gray level value of each gray level sub-image;

and according to the position information of each gray sub-image in the gray image, arranging the gray values of each gray sub-image to form a first characteristic information set.
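The construction of a gray-value feature set can be sketched as follows. The name `gray_feature_set` is hypothetical, the gray image is assumed to be a nested list of integer pixel values, and the gray value of each sub-image is assumed to be the integer mean of its pixels, since the passage does not say how that value is determined.

```python
from typing import List, Sequence


def gray_feature_set(gray: Sequence[Sequence[int]],
                     rows: int, cols: int) -> List[int]:
    """Split a gray image into rows x cols sub-images and list their values.

    Values are ordered row by row, left to right, so the position of each
    sub-image in the image fixes its position in the feature set.
    """
    height, width = len(gray), len(gray[0])
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the grid size")
    block_h, block_w = height // rows, width // cols
    features: List[int] = []
    for r in range(rows):
        for c in range(cols):
            block = [gray[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            features.append(sum(block) // len(block))   # assumed: mean gray value
    return features


if __name__ == "__main__":
    image = [[0, 0, 100, 100],
             [0, 0, 100, 100],
             [50, 50, 200, 200],
             [50, 50, 200, 200]]
    print(gray_feature_set(image, 2, 2))
```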

In some embodiments, the extraction module 802 is specifically configured to:

traversing from any first characteristic information in the first characteristic information set, and searching whether the characteristic information matched with the first characteristic information currently traversed exists in a pre-stored information base;

and if, after the traversal is finished, no feature information matching the first feature information has been found in the pre-stored information base, outputting prompt information indicating that no candidate information was found.

In some embodiments, the first determining module 803 is specifically configured to:

selecting less characteristic information corresponding to the information to be identified and the candidate information as first comparison information based on the first characteristic information number and the second characteristic information number of the candidate information, and selecting more characteristic information corresponding to the information to be identified and the candidate information as second comparison information;

sequentially traversing any one of the third characteristic information sets corresponding to the first comparison information, and determining a characteristic information string set consisting of at least one characteristic information string if characteristic information matched with the characteristic information strings consisting of the continuously traversed third characteristic information exists in a fourth characteristic information set corresponding to the second comparison information;

for each characteristic information string, determining the similarity of the characteristic information string and the fourth characteristic information set according to the position of the first third characteristic information of the characteristic information string in the third characteristic information set, the position of the characteristic information matched with the first third characteristic information in the fourth characteristic information set, the position of the last third characteristic information in the third characteristic information set, the number of the characteristic information of the third characteristic information set and the number of the characteristic information of the fourth characteristic information set;

And selecting the maximum similarity from the similarity between each characteristic information string in the characteristic information string sets and the fourth characteristic information set as the similarity between the information to be identified and the candidate information.

if yes, extracting continuous third characteristic information from the third characteristic information set as a characteristic information string, wherein the characteristic information matched with the continuous third characteristic information is contained in the fourth characteristic information set.

In some embodiments, the first determining module 803 determines the similarity between each of the feature information strings and the fourth feature information set according to the following formula:

wherein s_i represents the similarity between the i-th feature information string in the feature information string set and the fourth feature information set; pos_B(i) represents the position, in the third feature information set, of the first third feature information in the i-th feature information string; pos_A(i) represents the position, in the fourth feature information set, of the feature information matching the first third feature information in the i-th feature information string; end_point(i) represents the position, in the third feature information set, of the last third feature information in the i-th feature information string; L_B represents the third feature information number of the third feature information set; and L_A represents the fourth feature information number of the fourth feature information set.

In some embodiments, the second determining module 804 is specifically configured to:

judging whether first candidate information with similarity to the information to be identified larger than a preset threshold exists in the candidate information set;

if the first candidate information exists, sorting the first candidate information based on the order of the similarity between the first candidate information and the information to be identified; determining an attribute tag corresponding to the information to be identified according to a preset attribute tag configuration strategy, the ordered first candidate information and the attribute tags of each first candidate information;

The embodiment of the application further provides a computer device 900. Fig. 9 is a schematic structural diagram of the computer device 900 provided in the embodiment of the application, which includes a processor 901, a memory 902, and a bus 903. The memory 902 stores machine-readable instructions executable by the processor 901 (e.g., execution instructions corresponding to the receiving module 801, the extracting module 802, the first determining module 803, and the second determining module 804 in the information identifying apparatus in fig. 8). When the computer device 900 is running, the processor 901 communicates with the memory 902 through the bus 903, and when the machine-readable instructions are executed by the processor 901, the following process is performed:

after receiving the information to be identified, determining a first characteristic information set contained in the information to be identified, wherein the first characteristic information set contains at least one first characteristic information;

determining the similarity between the information to be identified and each candidate information in the candidate information set according to the first characteristic information number of the first characteristic information set, the second characteristic information number of the second characteristic information set, and the position, in each of the first and second characteristic information sets, of each characteristic information of whichever set contains fewer characteristic information;

In a possible implementation manner, the information to be identified includes text to be identified, the first feature information includes feature words, and the instructions executed by the processor 901 specifically include:

In a possible implementation manner, the information to be identified includes an image to be identified, the first feature information includes a gray value, and the instructions executed by the processor 901 specifically include:

In a possible implementation manner, the instructions executed by the processor 901 specifically include:

if such characteristic information exists, extracting the candidate information to which the existing characteristic information belongs;

and forming a candidate information set according to each extracted candidate information.

In a possible implementation manner, the instructions executed by the processor 901 further include:

if, when the traversal is finished, no feature information matching the first feature information has been found in the pre-stored information base, outputting prompt information indicating that no candidate information was found.
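The traversal and extraction steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `extract_candidates`, the dictionary layout of the pre-stored information base, and the use of exact equality as the matching rule are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def extract_candidates(first_features: Sequence[str],
                       info_base: Dict[str, Sequence[str]]) -> List[str]:
    """Traverse a pre-stored base and collect entries sharing a feature with the query.

    `info_base` maps a hypothetical candidate id to that candidate's feature list.
    Matching is assumed to be exact equality of feature values.
    """
    wanted = set(first_features)
    candidates = []
    for candidate_id, features in info_base.items():
        # Keep the candidate if any of its features matches a first feature.
        if wanted.intersection(features):
            candidates.append(candidate_id)
    if not candidates:
        # Traversal finished without a match: emit the prompt described above.
        print("No candidate information found.")
    return candidates
```

A real information base would more likely be indexed by feature so that the lookup avoids a full scan; the linear traversal is kept here because it mirrors the wording of the text.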

for each candidate information, selecting, based on the first feature information number and the second feature information number of the candidate information, whichever of the information to be identified and the candidate information has fewer corresponding feature information as first comparison information, and whichever has more corresponding feature information as second comparison information;

for each characteristic information string, determining the similarity between the characteristic information string and the fourth characteristic information set according to: the position, in the third characteristic information set, of the first third characteristic information of the characteristic information string; the position, in the fourth characteristic information set, of the characteristic information matching that first third characteristic information; the position, in the third characteristic information set, of the last third characteristic information of the characteristic information string; the number of third characteristic information in the third characteristic information set; and the number of fourth characteristic information in the fourth characteristic information set;

if so, extracting consecutive third characteristic information from the third characteristic information set as a characteristic information string, wherein the fourth characteristic information set contains characteristic information matching each of the consecutive third characteristic information.
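The selection of comparison information and the extraction of characteristic information strings can be sketched as below. The sketch assumes that the third characteristic information set belongs to the first comparison information (the one with fewer features) and the fourth set to the second, that matching means exact equality, that positions are 0-based, and that ties in length keep the information to be identified as the first comparison information. The function names are hypothetical. The similarity formula itself is not reproduced in this text, so the sketch stops at the positional quantities that feed into it.

```python
from typing import List, Sequence, Tuple


def choose_comparison(query: Sequence[str],
                      candidate: Sequence[str]) -> Tuple[Sequence[str], Sequence[str]]:
    """Return (third_set, fourth_set): the shorter feature sequence first."""
    if len(query) <= len(candidate):
        return query, candidate
    return candidate, query


def feature_strings(third: Sequence[str],
                    fourth: Sequence[str]) -> List[Tuple[int, int, int]]:
    """Find maximal runs of consecutive items of `third` that also occur in `fourth`.

    Each run is reported as (pos_B, pos_A, end_point):
      pos_B      index in `third` of the run's first item
      pos_A      index in `fourth` of the first occurrence of that item
      end_point  index in `third` of the run's last item
    """
    runs = []
    start = None
    for i, item in enumerate(third):
        if item in fourth:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            runs.append((start, fourth.index(third[start]), i - 1))
            start = None
    if start is not None:  # close a run that reaches the end of `third`
        runs.append((start, fourth.index(third[start]), len(third) - 1))
    return runs
```

For instance, with a query `["a", "b", "x", "c"]` and a candidate `["c", "a", "b", "d", "e"]`, the query is the shorter sequence, and two strings are found: `a b` and `c`.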

In a possible implementation manner, in the instructions executed by the processor 901, the similarity between each feature information string and the fourth feature information set is determined according to the following formula:

if the first candidate information exists, sorting the first candidate information in order of its similarity to the information to be identified;

and determining the attribute label corresponding to the information to be identified according to a preset attribute label configuration strategy, the sorted first candidate information and the attribute label of each first candidate information.

If the candidate information set contains no first candidate information whose similarity to the information to be identified is greater than the preset threshold, prompt information indicating that no first candidate information was found is output.
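The threshold, sorting and labelling steps can be sketched as follows. The text does not specify the attribute label configuration strategy, so the example assumes a simple one: take the label of the most similar candidate. The function name `assign_label`, the descending sort order, and the dictionary inputs are likewise assumptions made for illustration.

```python
from typing import Dict, Optional


def assign_label(similarities: Dict[str, float],
                 labels: Dict[str, str],
                 threshold: float) -> Optional[str]:
    """Pick an attribute label from the candidates whose similarity exceeds the threshold.

    `similarities` maps a hypothetical candidate id to its similarity with the
    information to be identified; `labels` maps the same ids to attribute labels.
    """
    first_candidates = [c for c, s in similarities.items() if s > threshold]
    if not first_candidates:
        print("No first candidate information found.")
        return None
    # Sort by similarity, most similar first (assumed order).
    ranked = sorted(first_candidates, key=lambda c: similarities[c], reverse=True)
    # Assumed configuration strategy: reuse the label of the top-ranked candidate.
    return labels[ranked[0]]
```

Other strategies, such as a vote among the top few candidates, would only change the last line.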

The embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described information identification method.

Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the information identification method can be executed, which solves the prior-art problem of low identification efficiency when the target user is identified based on similarity and thereby improves the efficiency of identifying the attribute label of the target object.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working procedures of the above-described system and apparatus may be found in the corresponding procedures of the method embodiments and are not described again in this application. In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection of devices or modules through some communication interface, and may be electrical, mechanical, or of another form.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; they may be located in one place or distributed over multiple networks. Some or all of the modules can be selected according to actual needs to achieve the purpose of the embodiment scheme.

In addition, the functions in the embodiments of the present application may be integrated in one processing unit, may each exist separately, or two or more of them may be integrated into one.

The functions, if implemented in the form of software functions and sold or used as a stand-alone product, may be stored on a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or the like.

The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application is covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. An information identifying apparatus, comprising:

2. The information identifying apparatus according to claim 1, wherein the information to be identified includes text to be identified, the first feature information includes feature words, and the receiving module is specifically configured to:

3. The information identifying apparatus according to claim 1, wherein the information to be identified includes an image to be identified, and the first characteristic information includes a gray value; the receiving module is specifically configured to:

4. The information identifying apparatus according to claim 1, wherein the extraction module is specifically configured to:

5. The information identifying apparatus of claim 1, wherein the first determining module is specifically configured to:

6. The information identifying apparatus of claim 5, wherein the first determining module is specifically configured to:

7. The information identifying apparatus of claim 1, wherein the second determining module is specifically configured to:

8. An information identification method, comprising:

9. A computer device, comprising: a processor, a storage medium, and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the computer device is running, the processor executing the machine-readable instructions to perform the steps of the information identification method of claim 8.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the information identification method according to claim 8.