CN111860549A

CN111860549A - Information recognition device, method, computer device, and storage medium

Info

Publication number: CN111860549A
Application number: CN201910277264.4A
Authority: CN
Inventors: 兰红云
Original assignee: Beijing Didi Infinity Technology and Development Co Ltd
Current assignee: Beijing Didi Infinity Technology and Development Co Ltd
Priority date: 2019-04-08
Filing date: 2019-04-08
Publication date: 2020-10-30
Anticipated expiration: 2039-04-08
Also published as: CN111860549B

Abstract

The application provides an information identification device, an information identification method, a computer device and a storage medium. The information identification device comprises a receiving module, an extraction module, a first determining module and a second determining module. The receiving module is configured to determine, after information to be identified is received, a first feature information set contained in that information. The extraction module is configured to extract, from a pre-stored information base, at least one piece of candidate information matching the first feature information to form a candidate information set. The first determining module is configured to determine the similarity between the information to be identified and each piece of candidate information in the candidate information set according to the number of pieces of feature information in the first feature information set, the number of pieces of feature information in the second feature information set, and the position of each piece of feature information in whichever of the two sets contains fewer pieces of feature information. The second determining module is configured to determine the attribute label corresponding to the information to be identified according to the determined similarity and the attribute labels of the candidate information in the pre-stored information base.

Description

Information recognition device, method, computer device, and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to an information recognition apparatus, an information recognition method, a computer device, and a storage medium.

Background

At present, many scenarios require identifying the attribute tag of a target object. For example, an online store can identify a user's attribute tag from the text feature information the user inputs, so that it can determine the user's type and serve the user better; in the field of security monitoring, a user's face image can be recognized to determine the user's identity tag.

When a target object is identified, information associated with it, such as text or an image, is generally compared with all the information in a pre-established information base to determine the similarity between them, and the attribute tag of the target object is then determined from that similarity. Comparing against every entry in the information base makes identification inefficient.

Disclosure of Invention

In view of the above, an object of the present application is to provide an information identification apparatus, an information identification method, a computer device and a storage medium, so as to improve the efficiency of identifying the attribute tag of a target object.

In a first aspect, an embodiment of the present application provides an information identification apparatus, including:

a receiving module, an extraction module, a first determining module and a second determining module, wherein the receiving module is configured to determine, after information to be identified is received, a first feature information set contained in the information to be identified, the first feature information set containing at least one piece of first feature information, and to transmit the first feature information set to the extraction module and the first determining module;

the extraction module is configured to extract at least one piece of candidate information matching the first feature information from a pre-stored information base to form a candidate information set, where each piece of candidate information includes a second feature information set formed by at least one piece of second feature information, and to transmit the candidate information set to the first determining module;

the first determining module is configured to determine the similarity between the information to be identified and each piece of candidate information in the candidate information set according to the number of pieces of first feature information in the first feature information set, the number of pieces of second feature information in the second feature information set, and the position information of each piece of feature information in whichever of the first and second feature information sets contains fewer pieces of feature information, and to transmit the similarity to the second determining module; and

the second determining module is configured to determine the attribute label corresponding to the information to be identified according to the determined similarity and the attribute labels of the candidate information in the pre-stored information base.

In some embodiments, the information to be recognized includes a text to be recognized, the first feature information includes feature words, and the receiving module is specifically configured to:

after receiving a text to be recognized input by a target object, performing word segmentation processing on the text to be recognized to obtain a plurality of word units;

and filtering the word units based on preset common words to obtain the feature words, and arranging the feature words according to their positions in the text to be recognized to form a first feature information set of the text to be recognized.

In some embodiments, the information to be identified includes an image to be identified, and the first feature information includes a grayscale value; the receiving module is specifically configured to:

after receiving the image to be identified, if the image to be identified is a color image, converting the color image into a grayscale image;

dividing the grayscale image into set rows and columns to obtain a plurality of grayscale sub-images, and determining the grayscale value of each grayscale sub-image; and

arranging the grayscale values of the grayscale sub-images according to the position information of each grayscale sub-image in the grayscale image to form the first feature information set.

In some embodiments, the extraction module is specifically configured to:

traversing the first feature information set starting from any piece of first feature information, and searching the pre-stored information base for feature information matching the currently traversed first feature information;

if such feature information exists, extracting the candidate information containing it, and forming the candidate information set from the extracted candidate information; and

if the traversal finishes without any feature information matching the first feature information being found in the pre-stored information base, outputting prompt information indicating that no candidate information is found.

In some embodiments, the first determining module is specifically configured to:

for each piece of candidate information, based on the number of pieces of first feature information and the number of pieces of second feature information of the candidate information, selecting whichever of the information to be identified and the candidate information has fewer pieces of feature information as first comparison information, and the one with more pieces of feature information as second comparison information;

sequentially traversing the third feature information set corresponding to the first comparison information, starting from any piece of third feature information, and, if the fourth feature information set corresponding to the second comparison information contains feature information matching a feature information string formed by consecutively traversed third feature information, determining a feature information string set consisting of at least one feature information string;

for each feature information string, determining the similarity between the feature information string and the fourth feature information set according to: the position, in the third feature information set, of the first piece of third feature information in the string; the position, in the fourth feature information set, of the feature information matching that first piece; the position, in the third feature information set, of the last piece of third feature information in the string; the number of pieces of third feature information in the third feature information set; and the number of pieces of fourth feature information in the fourth feature information set; and

selecting the maximum of the similarities between each feature information string in the feature information string set and the fourth feature information set as the similarity between the information to be identified and the candidate information.

In some embodiments, the first determining module is specifically configured to obtain the feature information string set by:

sequentially traversing the third feature information set corresponding to the first comparison information, starting from any piece of third feature information;

judging whether the fourth feature information set contains feature information that consecutively matches the traversed third feature information; and

if so, extracting, from the third feature information set, the consecutive third feature information whose matching feature information is contained in the fourth feature information set as a feature information string.
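The comparison-information selection, feature-information-string search and maximum-similarity step described above can be sketched as follows. The section lists the inputs to the per-string similarity (start position i in the third set, matched position j in the fourth set, end position k, and the two set sizes) but not the formula, so `placeholder_score` is a hypothetical stand-in that should be replaced by the actual formula. The sketch also reads "feature information string" as a maximal run of consecutive third features matching consecutive fourth features; that is an interpretation of the text, not a confirmed detail.

```python
def split_by_size(first, second):
    """Fewer features -> first comparison (third set); more -> second (fourth set)."""
    return (first, second) if len(first) <= len(second) else (second, first)


def feature_strings(third, fourth):
    """Maximal runs of consecutive third features that match consecutive
    fourth features. Returns (i, j, k): start in third, matched start in
    fourth, end (inclusive) in third."""
    strings = []
    for i, t in enumerate(third):
        for j, u in enumerate(fourth):
            if t != u:
                continue
            if i > 0 and j > 0 and third[i - 1] == fourth[j - 1]:
                continue  # continuation of a run already recorded from its start
            k = i
            while (k + 1 < len(third) and j + (k + 1 - i) < len(fourth)
                   and third[k + 1] == fourth[j + (k + 1 - i)]):
                k += 1
            strings.append((i, j, k))
    return strings


def placeholder_score(i, j, k, n3, n4):
    # HYPOTHETICAL stand-in: the text names the inputs (i, j, k, n3, n4)
    # but not the formula. Here: string length over the larger set size.
    return (k - i + 1) / max(n3, n4)


def similarity(first, second, score=placeholder_score):
    third, fourth = split_by_size(first, second)
    strings = feature_strings(third, fourth)
    if not strings:
        return 0.0
    # Keep the maximum per-string similarity as the overall similarity.
    return max(score(i, j, k, len(third), len(fourth)) for i, j, k in strings)


a = ["Shanghai", "mall A", "mall B", "purchase", "commodity C"]
b = ["Beijing", "mall A", "mall B", "purchase", "price", "hotel", "flight"]
print(feature_strings(*split_by_size(a, b)))  # [(1, 1, 3)]
print(similarity(a, b))  # 3/7 under the placeholder score
```

Because `split_by_size` always treats the shorter set as the third set, `similarity(a, b)` and `similarity(b, a)` return the same value, and only the shorter set's positions drive the search.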

In some embodiments, the second determining module is specifically configured to:

judging whether the candidate information set contains first candidate information whose similarity is greater than a preset threshold;

if so, sorting the first candidate information in order of its similarity to the information to be identified, and determining the attribute label corresponding to the information to be identified according to a preset attribute label configuration strategy, the sorted first candidate information and the attribute label of each piece of first candidate information; and

if not, outputting prompt information indicating that no first candidate information is found.
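A sketch of the second determining module follows. The threshold value, `top_n` and the majority-vote rule are hypothetical choices standing in for the unspecified "preset attribute label configuration strategy"; only the threshold filter, the sorting by similarity and the "not found" prompt come from the text.

```python
from collections import Counter


def assign_label(similarities, labels, threshold=0.5, top_n=3):
    """similarities: {candidate_id: similarity}; labels: {candidate_id: label}.
    threshold, top_n and the voting rule are hypothetical choices."""
    # Keep only "first candidate information" above the preset threshold.
    first = [(cid, s) for cid, s in similarities.items() if s > threshold]
    if not first:
        print("No first candidate information found")  # prompt information
        return None
    # Sort by similarity to the information to be identified, highest first.
    first.sort(key=lambda item: item[1], reverse=True)
    # Hypothetical configuration strategy: majority vote over the top_n
    # candidates. Counter.most_common lists equal counts in first-seen
    # order, so a tie goes to the label of the higher-ranked candidate.
    top_labels = [labels[cid] for cid, _ in first[:top_n]]
    return Counter(top_labels).most_common(1)[0][0]


sims = {"i1": 0.9, "i2": 0.7, "i3": 0.6, "i4": 0.2}
tags = {"i1": "shopping fan", "i2": "price sensitive",
        "i3": "price sensitive", "i4": "travel fan"}
print(assign_label(sims, tags))  # price sensitive
```

A simpler strategy, such as returning the label of the single most similar candidate, corresponds to `top_n=1`.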

In a second aspect, an embodiment of the present application provides an information identification method, including:

after receiving information to be identified, determining a first feature information set contained in the information to be identified, wherein the first feature information set contains at least one piece of first feature information;

extracting at least one piece of candidate information matching the first feature information from a pre-stored information base to form a candidate information set, each piece of candidate information comprising a second feature information set consisting of at least one piece of second feature information;

determining the similarity between the information to be identified and each piece of candidate information in the candidate information set according to the number of pieces of first feature information in the first feature information set, the number of pieces of second feature information in the second feature information set, and the position information of each piece of feature information in whichever of the first and second feature information sets contains fewer pieces of feature information; and

determining an attribute label corresponding to the information to be identified according to the determined similarity and the attribute labels of the candidate information in the pre-stored information base.

In a third aspect, an embodiment of the present application provides a computer device, including a processor, a storage medium and a bus. The storage medium stores machine-readable instructions executable by the processor. When the computer device runs, the processor communicates with the storage medium through the bus and executes the machine-readable instructions to perform the steps of the information identification method according to the second aspect.

In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the information identification method according to the second aspect.

In the embodiments of the present application, a candidate information set is first selected from the pre-stored information base according to the first feature information, and the information to be identified is then compared for similarity only with the candidate information in that set, which greatly shortens the comparison time. Second, the similarity between the information to be identified and a piece of candidate information is determined only from the number of pieces of first feature information of the information to be identified, the number of pieces of second feature information of the candidate information, and the positions of the feature information in whichever of the two feature information sets contains fewer pieces of feature information. The similarity can therefore be determined quickly by locating only the positions of the smaller set's feature information and combining them with the feature information counts of both sets. As a result, the attribute label corresponding to the information to be identified is determined more quickly, and information identification efficiency is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

Fig. 1 is a flowchart of an information identification method provided by an embodiment of the present application;

Fig. 2 is a flowchart of a method for determining a first feature information set and the number of pieces of first feature information in it according to an embodiment of the present application;

Fig. 3 is a flowchart of another method for determining a first feature information set and the number of pieces of first feature information in it according to an embodiment of the present application;

Fig. 4 is a flowchart of a method for obtaining a candidate information set according to an embodiment of the present application;

Fig. 5 is a flowchart of a method for determining the similarity between information to be identified and each piece of candidate information in a candidate information set according to an embodiment of the present application;

Fig. 6 is a flowchart of a method for obtaining a feature information string set according to an embodiment of the present application;

Fig. 7 is a flowchart of a method for determining an attribute tag corresponding to information to be identified according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an information recognition apparatus according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of a computer device provided in an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. It should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flowcharts may be performed out of order, and steps with no logical dependency on one another may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowcharts.

In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

In order to enable a person skilled in the art to use the present disclosure, the following embodiments are given in connection with the specific application scenario "attribute tag identification of a target object". It will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the application. Although the present application is primarily described in the context of attribute tag identification of a target object, it should be understood that this is merely one exemplary embodiment.

It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.

To address the low identification efficiency of prior-art approaches that identify the attribute tag of a target object based on similarity, the embodiments of the present application provide an information identification method that improves identification efficiency.

As shown in Fig. 1, the information identification method provided by an embodiment of the present application includes the following steps S101 to S104:

S101, after receiving the information to be identified, determining a first feature information set contained in the information to be identified, wherein the first feature information set contains at least one piece of first feature information.

The information to be recognized can represent different content in different application scenarios. For example, it can be a text to be recognized: a text input by a user is received and recognized to determine the attribute tag of the user associated with that text, in which case the target object is the user. Alternatively, it can be an image of a target object, whose attribute tag is determined by recognizing the image; here the target object may be a user or an article.

The attribute tag may be the type of the target object. For example, if the target object is a user, the type of the user can be identified, such as a price-sensitive user, a user with high product demands, a shopping fan or a travel fan, where "price-sensitive", "high product demand", "shopping fan" and "travel fan" are the attribute tags to be identified.

In one embodiment, when the information to be recognized is a text to be recognized, the first feature information includes feature words. In step S101, determining the first feature information set contained in the information to be recognized, after the information to be recognized associated with the target object is received, includes the following steps S201 to S203, as shown in Fig. 2:

S201, after receiving a text to be recognized input by the target object, performing word segmentation on the text to be recognized to obtain a plurality of word units;

S202, filtering the word units based on preset common words to obtain feature words, and arranging the feature words according to their positions in the text to be recognized to form the first feature information set of the text to be recognized.

The target object here may be a user whose attribute tag is to be identified. For example, suppose the text to be recognized input by the user on a website is "buy commodity C at mall A or mall B in Shanghai". A word segmenter first splits the text into the word units "in", "Shanghai", "in", "mall A", "or", "mall B", "purchase", "commodity C" and "product", where commodity C may be a product with a particular characteristic, such as an expensive product.

The word units are then filtered against a common-word dictionary, removing words without practical meaning such as "in", "or" and "product". This leaves five feature words, "Shanghai", "mall A", "mall B", "purchase" and "commodity C", which are arranged by their positions in the text to be recognized to form the first feature information set "Shanghai, mall A, mall B, purchase, commodity C".

In addition, after the first feature information set is obtained, the number of feature words it contains can also be determined. The set "Shanghai, mall A, mall B, purchase, commodity C" contains 5 feature words, so the number of pieces of first feature information in the first feature information set is 5.
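A minimal Python sketch of steps S201 and S202 on the example above follows. It assumes the text has already been segmented into the listed word units (a real system would call a word segmenter, which is not shown), and `COMMON_WORDS` is a hypothetical stand-in for the preset common-word dictionary.

```python
# Hypothetical common-word (stop-word) set; the text only says
# "preset common words" and gives these three as examples.
COMMON_WORDS = {"in", "or", "product"}


def first_feature_set(word_units, common_words=COMMON_WORDS):
    """S202: drop common words and keep the remaining feature words in
    their original order of appearance in the text."""
    return [w for w in word_units if w not in common_words]


# S201 output for "buy commodity C at mall A or mall B in Shanghai",
# taken from the example segmentation.
word_units = ["in", "Shanghai", "in", "mall A", "or", "mall B",
              "purchase", "commodity C", "product"]
features = first_feature_set(word_units)
print(features)       # ['Shanghai', 'mall A', 'mall B', 'purchase', 'commodity C']
print(len(features))  # 5
```

Using a list comprehension rather than a set preserves the order of the feature words, which the later position-based similarity step depends on.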

In another embodiment, when the information to be recognized includes an image to be recognized, the first feature information includes grayscale values. In step S101, determining the first feature information set contained in the information to be recognized after it is received includes the following steps S301 to S303, as shown in Fig. 3:

S301, after receiving the image to be recognized, if the image is a color image, converting the color image into a grayscale image;

S302, dividing the grayscale image into set rows and columns to obtain a plurality of grayscale sub-images, and determining the grayscale value of each grayscale sub-image;

S303, arranging the grayscale values of the grayscale sub-images according to the position information of each grayscale sub-image in the grayscale image to form the first feature information set.

The image to be recognized may be an image of a person or of an article. If the received image is a color image, it is first converted into a grayscale image; if it is already a grayscale image, step S301 need not be executed.

Dividing the grayscale image into the set rows and columns means dividing it into grayscale sub-images of equal size, where each sub-image has the same size as the sub-image corresponding to each grayscale value of the images stored in the pre-stored information base. For example, if each grayscale value of an image stored in the pre-stored information base is the grayscale value of a 16 × 16 sub-image, the set rows and columns here divide the grayscale image into a plurality of 16 × 16 grayscale sub-images.

For example, suppose a grayscale image is 256 × 256 and the sub-image corresponding to each grayscale value of the stored images is 16 × 16. The received grayscale image can then be divided into 16 rows and 16 columns, giving 256 grayscale sub-images of 16 × 16, each containing 256 pixels. The grayscale value of each sub-image is determined from the average grayscale value of its pixels; for example, if the average grayscale value of all the pixels in a sub-image is 225, the grayscale value of that sub-image is 225.

The grayscale values are then arranged according to the position of each sub-image in the grayscale image. For example, after the received grayscale image is divided into M rows and N columns of grayscale sub-images, starting from the second row, the first sub-image of each row is connected to the last sub-image of the previous row, so that the sub-images are arranged into a single row of M × N columns. The position information of a sub-image can be its coordinates in the grayscale image; for example, a sub-image in the second row and second column is located by that row and column index.

In this manner, if the grayscale image yields 16 × 16 grayscale values, for example after it is divided into 16 rows and 16 columns, then starting from row 2 the first sub-image of each row is connected in turn to the last sub-image of the previous row, producing a single row of 256 sub-images. The 16 × 16 grayscale values are arranged in the same order to obtain the first feature information set.

In addition, after the first feature information set is obtained, the number of grayscale values in it can also be determined; for example, if the first feature information set contains 256 grayscale values, the number of pieces of first feature information is 256.
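Steps S301 to S303 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the application's implementation: `to_gray` uses the ITU-R BT.601 luma weights (the text does not specify the color-to-gray conversion), images are lists of pixel rows, and `block` is a hypothetical name for the sub-image side length (16 in the example).

```python
def to_gray(rgb_image):
    """S301 (assumed method): convert rows of (R, G, B) tuples to gray
    levels with ITU-R BT.601 luma weights. The text does not say which
    conversion is used."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def gray_feature_set(gray, block=16):
    """S302-S303: split a gray image (list of pixel rows) into
    block x block sub-images, take each sub-image's mean gray level,
    and append the rows of sub-images one after another, so the first
    sub-image of row k+1 follows the last sub-image of row k."""
    height, width = len(gray), len(gray[0])
    if height % block or width % block:
        raise ValueError("image size must be a multiple of the block size")
    features = []
    for top in range(0, height, block):        # sub-image rows, top to bottom
        for left in range(0, width, block):    # sub-image columns, left to right
            pixels = [gray[y][x]
                      for y in range(top, top + block)
                      for x in range(left, left + block)]
            features.append(sum(pixels) / len(pixels))
    return features


# The 256 x 256 example: 16 x 16 sub-images -> 256 gray values.
image = [[225] * 256 for _ in range(256)]
features = gray_feature_set(image)
print(len(features), features[0])  # 256 225.0
```

The nested loop visits sub-images row by row, which is exactly the "connect each row to the end of the previous row" ordering described above, so the feature list can be built in one pass without a separate flattening step.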

S102, extracting at least one piece of candidate information matching the first feature information from a pre-stored information base to form a candidate information set, each piece of candidate information comprising a second feature information set formed by at least one piece of second feature information.

The pre-stored information base is a pre-established information base storing a plurality of pieces of information. Each piece contains at least one piece of feature information, which can be a feature word or a grayscale value, and corresponds to an attribute label that is determined in advance from its feature information and stored with it.

Candidate information matching the first feature information is information containing feature information that matches at least one piece of first feature information. Feature information matching a piece of first feature information may be identical to it or of the same type. For example, if one piece of first feature information in the first feature information set is "commodity C", candidate information matching it has a second feature information set containing "commodity C" or a commodity of the same type as commodity C. If commodity C is a high-priced facial skin cream, a commodity of the same type may be another facial skin cream in the same price range, not necessarily commodity C itself.

Specifically, when the information to be recognized is a text to be recognized, the first feature information set can be vectorized after it is determined; in this case, the second feature information sets of the candidate information are also stored in the pre-stored information base in vector form. Because feature information belonging to the same class may have the same feature value, more candidate information can be extracted when the candidate information set is formed.

Specifically, in step S102, extracting at least one piece of candidate information matching the first feature information from the pre-stored information base to form a candidate information set includes the following steps S401 to S403, as shown in Fig. 4:

S401, traversing the first feature information set starting from any piece of first feature information, and searching the pre-stored information base for feature information matching the currently traversed first feature information;

S402, if such feature information exists, extracting the candidate information containing it;

S403, forming a candidate information set from the extracted candidate information.

The candidate information where the feature information is located may be extracted, for example, if the candidate information where the feature information is located is also "mall C", the candidate information where the feature information is located is extracted, where the feature information matched with "mall C" is also "mall C", the candidate information where the "mall C" is located is extracted, then, whether feature information matched with ' purchase ' exists is searched in a pre-stored information base, and the candidate information matched with each piece of first feature information is finally extracted in sequence, for example, according to the method, 10 pieces of candidate information matched with ' commodity C ', 15 pieces of candidate information matched with ' purchase ', 20 pieces of candidate information matched with ' mall A ', 15 pieces of candidate information matched with ' mall B ', and 10 pieces of candidate information matched with Shanghai ', the obtained candidate information in the candidate information set comprises 70 pieces of candidate information in total.

In addition, if the traversal finishes without any feature information matching the first feature information being found in the pre-stored information base, prompt information indicating that no candidate information was found is output.

For example, if no feature information matching "Shanghai", "mall A", "mall B", "purchase", or "commodity C" is found in the pre-stored information base, the base holds no information related to the information to be identified. In that case, prompt information indicating that no candidate information was found may be output, for example as a voice prompt or on a display screen, and a staff member then handles the case.
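Steps S401 to S403, together with the "not found" prompt, can be sketched as follows. This is a minimal illustration, not the method as claimed: it assumes exact string matching of feature information and a hypothetical inverted index `feature_index` mapping each piece of feature information to the IDs of the candidate information containing it. Like the 70-piece example, it simply accumulates the matches for every first feature; a real system might de-duplicate candidates.

```python
from typing import Dict, List, Optional


def extract_candidates(
    first_features: List[str],
    feature_index: Dict[str, List[str]],
) -> Optional[List[str]]:
    """Sketch of S401-S403: collect candidate information matching each first feature.

    `feature_index` is a hypothetical inverted index:
    feature information -> IDs of candidate information containing it.
    """
    candidates: List[str] = []
    for feature in first_features:  # S401: traverse the first feature information set
        matches = feature_index.get(feature, [])  # search the pre-stored information base
        candidates.extend(matches)  # S402: extract the candidate information containing the match
    if not candidates:
        # Traversal finished with no match: output the "not found" prompt.
        print("No candidate information found")
        return None
    return candidates  # S403: the candidate information set
```

For instance, with an index in which "commodity C" occurs in candidates `c1` and `c2` and "Shanghai" in `c3`, traversing `["Shanghai", "mall A", "commodity C"]` yields `["c3", "c1", "c2"]`.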

S103, determining the similarity between the information to be identified and each piece of candidate information in the candidate information set. The inputs are the number of first feature information in the first feature information set, the number of second feature information in the second feature information set, and the position information of each piece of feature information in whichever of the first and second feature information sets contains fewer feature information.

When the information to be identified is a text to be identified, its similarity to any given piece of candidate information in the candidate information set is determined as follows. First, the number of feature words in the first feature information set (of the information to be identified) is compared with the number of feature words in the second feature information set (of that candidate information). If the first feature information set has fewer feature words, the similarity between the information to be identified and this candidate information is determined from three inputs: the position information of each piece of feature information of the first feature information set within the first and second feature information sets, the number of feature words in the first feature information set, and the number of feature words in the second feature information set.

If the second feature information set has fewer feature words than the first, the similarity is determined in the same way. In this case the position information comes from each piece of feature information of the second feature information set within the first and second feature information sets, again together with the numbers of feature words in the two sets.

Specifically, step S103 determines the similarity from the number of first feature information in the first feature information set, the number of second feature information in the second feature information set, and the position information of each piece of feature information in whichever set contains fewer feature information. As shown in fig. 5, it includes the following steps S501 to S504:

S501, for each piece of candidate information, comparing the number of first feature information with the number of second feature information of that candidate information. Whichever of the information to be identified and the candidate information has fewer feature information is selected as the first comparison information, and the one with more feature information as the second comparison information.

For example, suppose the candidate information set includes 70 pieces of candidate information, and take the 1st piece. If the information to be recognized is a text to be recognized, the number of second feature information of the 1st candidate information is the number of feature words in its second feature information set. If the information to be recognized is an image, it is the number of gray values in its second feature information set.

Correspondingly, the number of first feature information is the number of feature words in the first feature information set. If the first feature information set contains fewer feature words than the second, the first comparison information is the information to be identified and the second comparison information is the 1st candidate information. If the second feature information set contains fewer feature words than the first, the first comparison information is the 1st candidate information and the second comparison information is the information to be identified.
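The selection in step S501 reduces to a length comparison. A minimal sketch follows, assuming each piece of information is represented simply as its ordered list of feature information. The text does not say which side becomes the first comparison information when the counts are equal. This sketch assumes the information to be identified is chosen in that case.

```python
from typing import List, Tuple


def select_comparison(
    to_identify: List[str], candidate: List[str]
) -> Tuple[List[str], List[str]]:
    """Sketch of S501: return (first comparison info, second comparison info).

    The side with fewer feature information becomes the first comparison information.
    Assumption: on a tie, the information to be identified is used as the first one.
    """
    if len(to_identify) <= len(candidate):
        return to_identify, candidate
    return candidate, to_identify
```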

S502, traversing the third feature information set corresponding to the first comparison information in sequence, starting from any piece of third feature information. Whenever the fourth feature information set corresponding to the second comparison information contains feature information matching a feature information string formed by consecutively traversed third feature information, that string is recorded. The recorded strings, each consisting of at least one piece of feature information, form a feature information string set.

If the first comparison information is the information to be identified, the third feature information set is the first feature information set described above. The fourth feature information set corresponding to the second comparison information is then the second feature information set.

For example, let the first feature information set be "Shanghai, mall A, mall B, purchase, commodity C", and let traversal start from "Shanghai" in the forward direction. Suppose the second feature information set contains feature information matching "Shanghai" and "mall A" in succession, contains none matching "mall B", and then contains feature information matching "purchase" and "commodity C" in succession. The first feature information set then yields two feature information strings, "Shanghai, mall A" and "purchase, commodity C"; that is, the resulting feature information string set includes "Shanghai, mall A" and "purchase, commodity C".

Specifically, as shown in fig. 6, the feature information strings may be obtained through the following steps S601 to S603:

S601, traversing in sequence from any piece of third feature information in the third feature information set corresponding to the first comparison information;

S602, judging whether the fourth feature information set contains, in succession, feature information matching each traversed piece of third feature information;

S603, if so, extracting the consecutive third feature information in the third feature information set as a feature information string, where feature information matching each of these consecutive pieces is contained in the fourth feature information set.

Take the case where the first comparison information is the information to be identified and the third feature information set is "Shanghai, mall A, mall B, purchase, commodity C". Traversal starts from the third feature information "Shanghai".

It is first judged whether the fourth feature information set contains feature information matching "Shanghai". If so, it is then judged whether it contains feature information matching "mall A", and then, in turn, "mall B", "purchase", and "commodity C". If the fourth feature information set contains matches for all of them in succession, the consecutive third feature information "Shanghai, mall A, mall B, purchase, commodity C" in the third feature information set is taken as one feature information string.

As another example, suppose that during sequential traversal of the third feature information set, the fourth feature information set is found to contain feature information matching "Shanghai" and "mall A" but none matching "mall B". This gives the first feature information string, "Shanghai, mall A". Similarly, if the fourth feature information set contains feature information matching "purchase" and "commodity C", the second feature information string "purchase, commodity C" is obtained.
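Steps S601 to S603 amount to splitting the third feature information set into maximal runs of consecutive pieces that each have a match in the fourth feature information set. The sketch below makes two assumptions. First, "match" means exact equality. Second, a piece counts as matched if it occurs anywhere in the fourth set, with no adjacency required there, which is how the "Shanghai, mall A" / "purchase, commodity C" example reads. It also records the 1-based traversal positions of the first and last piece of each string, which step S503 uses as pos_B(i) and end_point(i).

```python
from typing import List, Tuple


def extract_feature_strings(
    third: List[str], fourth: List[str]
) -> List[Tuple[int, int, List[str]]]:
    """Sketch of S601-S603.

    Returns (start_pos, end_pos, string) for each maximal run of consecutive
    third feature information that all have a match in `fourth`.
    Positions are 1-based traversal positions in `third`.
    """
    fourth_set = set(fourth)  # assumption: membership test, order in `fourth` ignored
    strings: List[Tuple[int, int, List[str]]] = []
    run: List[str] = []
    start = 0
    for pos, feature in enumerate(third, start=1):  # S601: forward traversal
        if feature in fourth_set:  # S602: is there a matching piece?
            if not run:
                start = pos
            run.append(feature)
        elif run:  # run broken: S603, emit the string collected so far
            strings.append((start, pos - 1, run))
            run = []
    if run:  # close a run that reaches the end of the set
        strings.append((start, len(third), run))
    return strings
```

Applied to the example above, with a fourth set that lacks "mall B", it returns `[(1, 2, ["Shanghai", "mall A"]), (4, 5, ["purchase", "commodity C"])]`.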

S503, for each feature information string, determining the similarity between the feature information string and the fourth feature information set. This uses five quantities:

- the position of the string's first piece of third feature information in the third feature information set;
- the position in the fourth feature information set of the feature information matching that first piece;
- the position of the string's last piece of third feature information in the third feature information set;
- the number of third feature information in the third feature information set;
- the number of fourth feature information in the fourth feature information set.

Specifically, the position of the first piece of third feature information of a feature information string in the third feature information set may be its traversal position in that set. For example, if traversal starts from the 1st piece of third feature information and the first piece of a certain feature information string is the 3rd piece traversed, this position is 3. Likewise, the position of the last piece of third feature information of the string may be its traversal position in the third feature information set. The position of the feature information in the fourth feature information set that matches the first piece of the string may be the traversal position of that matching feature information in the fourth feature information set.

In particular, the traversal order of the third feature information set should match that of the fourth feature information set: both forward or both in reverse.

Specifically, the similarity of each feature information string in the feature information string set to the fourth feature information set may be determined according to the following formula (1):

where s_i denotes the similarity between the i-th feature information string in the feature information string set and the fourth feature information set; pos_B(i) denotes the traversal position, in the third feature information set, of the first piece of third feature information in the i-th feature information string; pos_A(i) denotes the position, in the fourth feature information set, of the feature information matching that first piece of third feature information; end_point(i) denotes the position, in the third feature information set, of the last piece of third feature information in the i-th feature information string; L_B denotes the number of third feature information in the third feature information set; and L_A denotes the number of fourth feature information in the fourth feature information set.

For example, consider the similarity between the information to be recognized and one piece of candidate information. Suppose the information to be recognized has fewer first feature information than the candidate information has second feature information, so the first feature information set serves as the third feature information set. If steps S601 to S603 yield a feature information string set containing 5 feature information strings, similarities s_1 to s_5 can be obtained from formula (1). The computation of s_1 is described below; it is the similarity between the 1st feature information string and the fourth feature information set (here the second feature information set):

Suppose the 1st feature information string includes 6 pieces of third feature information (first feature information). Then end_point(1) denotes the position of the 6th piece of third feature information of the 1st string in the third feature information set, and pos_B(1) denotes the position of the 1st piece in the third feature information set. pos_A(1) denotes the position of that 1st piece in the fourth feature information set and may take more than one value. For example, if the 1st piece of third feature information of the 1st string is "Shanghai" and "Shanghai" appears twice in the fourth feature information set, pos_A(1) takes two values, and s_1 correspondingly has two values. There are then at least 6 similarities between the feature information string set and the fourth feature information set. If the first piece of third feature information of other feature information strings also appears multiple times in the fourth feature information set, the number of similarities increases accordingly.

S504, selecting the maximum similarity as the similarity between the information to be identified and the candidate information. The maximum is taken over the similarities between each feature information string in the feature information string set and the fourth feature information set.

For instance, suppose the feature information string set includes 6 feature information strings and the first piece of third feature information of each string appears only once in the fourth feature information set. The similarity set then includes 6 similarities.

For example, take the 1st candidate information in the candidate information set. Suppose the information to be identified has fewer first feature information than the 1st candidate information has second feature information, and the resulting similarity set is {s_1, s_2, s_3, s_4, s_5, s_6}. If s_3 is the largest of these 6 similarities, the similarity between the information to be identified and the 1st candidate information is s_3. The similarity to each other piece of candidate information is obtained in the same way; if the candidate information set includes 10 pieces of candidate information, 10 similarities are obtained.
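The bookkeeping in steps S503 and S504 can be sketched without knowing the exact form of formula (1). In particular, the sketch handles the case where pos_A(i) takes several values because the first piece of a string repeats in the fourth set. The body of formula (1) is not reproduced in this text, so the sketch takes it as a hypothetical caller-supplied function `similarity_fn(pos_B, pos_A, end_point, L_B, L_A)`; it should not be read as the patent's formula. The `strings` argument is assumed to hold `(pos_B, end_point, features)` tuples with 1-based positions. Returning 0.0 when there are no strings is also an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for formula (1): (pos_B, pos_A, end_point, L_B, L_A) -> similarity.
SimilarityFn = Callable[[int, int, int, int, int], float]


def best_similarity(
    strings: Sequence[Tuple[int, int, List[str]]],
    fourth: List[str],
    l_b: int,
    similarity_fn: SimilarityFn,
) -> float:
    """Sketch of S503-S504: score every (string, matching position) pair, keep the max.

    `strings` holds (pos_B, end_point, features) for each feature information string;
    `l_b` is L_B, the number of third feature information.
    """
    l_a = len(fourth)  # L_A: number of fourth feature information
    scores: List[float] = []
    for pos_b, end_point, features in strings:
        head = features[0]  # first third feature information of the string
        # pos_A takes one value per occurrence of `head` in the fourth set.
        for pos_a, feature in enumerate(fourth, start=1):
            if feature == head:
                scores.append(similarity_fn(pos_b, pos_a, end_point, l_b, l_a))
    # S504: the maximum similarity; assumption: 0.0 when no string was found.
    return max(scores) if scores else 0.0
```

Because every occurrence of a string's first piece contributes its own score, a repeated "Shanghai" in the fourth set produces two candidate values of s_1. S504 then keeps only the largest one.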

Therefore, when determining the similarity between the information to be identified and a piece of candidate information, the embodiment of the present application only needs to check whether the fourth feature information set contains the third feature information. Here the third feature information set is the set with fewer feature information, and this check yields the feature information strings. The similarity between each feature information string and the fourth feature information set then follows from formula (1), using five quantities: the positions of the string's first and last third feature information in the third feature information set, the position of its first third feature information in the fourth feature information set, the number of third feature information, and the number of fourth feature information. This improves the efficiency of similarity determination.

And S104, determining the attribute label corresponding to the information to be identified according to the determined similarity and the attribute label of the candidate information in the pre-stored information base.

In the pre-stored information base, each piece of candidate information has a corresponding attribute label. For example, the attribute labels of some candidate information are "price-sensitive", some are "high product requirements", some are "shopping enthusiast", and some are "tourist". The attribute label corresponding to the information to be identified can be determined from two inputs: the determined similarities between the information to be identified and the candidate information, and the attribute labels of the candidate information in the pre-stored information base.

Specifically, step S104 determines the attribute label corresponding to the information to be identified from the determined similarity and the attribute labels of the candidate information in the pre-stored information base. As shown in fig. 7, it includes the following steps S701 to S704:

S701, judging whether the candidate information set contains first candidate information whose similarity is greater than a preset threshold;

S702, if so, sorting the first candidate information by the similarity between each piece of first candidate information and the information to be identified, and executing step S704;

S703, if not, outputting prompt information indicating that no first candidate information was found;

S704, determining the attribute label corresponding to the information to be identified according to a preset attribute label configuration strategy, the sorted first candidate information, and the attribute label of each piece of first candidate information.

The preset threshold may be set in advance according to the required accuracy, for example 0.8. In the example above, the candidate information set includes 10 pieces of candidate information. It is judged whether any of the similarities between the information to be identified and these 10 pieces is greater than 0.8.

If 8 of the similarities are greater than 0.8, they may be sorted in descending order. This gives 8 pieces of first candidate information whose similarity to the information to be identified exceeds 0.8, and the attribute labels of these 8 pieces may be of several types.

If the preset attribute label configuration strategy is to determine the single attribute label that best fits the information to be identified, the attribute label of the first candidate information with the highest similarity to the information to be identified is selected as its attribute label. For example, suppose the information to be identified is a text input by a user and the attribute label of that first candidate information is "high product requirements". The attribute label corresponding to the information to be identified is then "high product requirements"; that is, the user who input the information is a user with high product requirements.

If the information to be recognized is an image input by a user, for example in the security management of a residential community, the same rule applies. If the attribute label of the most similar first candidate information is the number "1001", the attribute label corresponding to the image to be recognized can be determined to be "1001".

If the preset attribute label configuration strategy is to determine the three attribute labels that best fit the information to be identified, the first candidate information with the highest similarity may first be selected for each attribute label. The attribute labels of the three of these with the highest similarities are then taken as the attribute labels of the information to be identified. For example, if these three attribute labels are "price-sensitive", "high product requirements", and "shopping enthusiast", the user corresponding to the information to be identified is labeled price-sensitive, high product requirements, and shopping enthusiast.
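Steps S701 to S704, with the two configuration strategies described above, can be sketched as follows. The sketch assumes each piece of candidate information is given as a hypothetical `(attribute_label, similarity)` pair. `top_k=1` corresponds to the "single best label" strategy and `top_k=3` to the "three labels" strategy. Printing the prompt and returning `None` stand in for the prompt output of S703.

```python
from typing import Dict, List, Optional, Tuple


def assign_labels(
    scored: List[Tuple[str, float]],
    threshold: float = 0.8,
    top_k: int = 1,
) -> Optional[List[str]]:
    """Sketch of S701-S704. `scored` holds (attribute label, similarity) per candidate."""
    # S701: keep first candidate information whose similarity exceeds the threshold.
    first = [(label, sim) for label, sim in scored if sim > threshold]
    if not first:
        print("No first candidate information found")  # S703
        return None
    # S702: sort by similarity, highest first.
    first.sort(key=lambda item: item[1], reverse=True)
    # S704: keep the most similar candidate for each label. Dicts preserve insertion
    # order, so labels stay ordered by their best similarity.
    best_per_label: Dict[str, float] = {}
    for label, sim in first:
        best_per_label.setdefault(label, sim)
    return list(best_per_label)[:top_k]
```

Keeping only the best candidate per label before truncating means that several near-duplicate candidates sharing one label cannot crowd the other labels out of the top three.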

Similarity determination is more efficient, so determining the attribute label corresponding to the information to be identified from the determined similarity and the attribute labels of the candidate information in the pre-stored information base is correspondingly more efficient as well.

Based on the same inventive concept, an embodiment of the present application further provides an information identification device corresponding to the information identification method. Since the device solves the problem on a principle similar to that of the method, its implementation may refer to the implementation of the method, and repeated details are omitted.

The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.

An embodiment of the present application provides an information identification apparatus 800, as shown in fig. 8, including:

the receiving module 801 is configured to, after receiving the information to be identified, determine a first feature information set included in the information to be identified, where the first feature information set includes at least one piece of first feature information, and transmit the first feature information set to the extracting module and the first determining module;

an extracting module 802, configured to extract at least one candidate information matched with the first feature information from a pre-stored information base to form a candidate information set, where each candidate information includes a second feature information set formed by at least one second feature information, and transmit the candidate information set to the first determining module;

a first determining module 803, configured to determine the similarity between the information to be identified and each piece of candidate information in the candidate information set, and to transmit the similarity to the second determining module. The similarity is determined from the number of first feature information in the first feature information set, the number of second feature information in the second feature information set, and the position information of each piece of feature information in whichever of the first and second feature information sets contains fewer feature information;

a second determining module 804, configured to determine, according to the determined similarity and the attribute tag of the candidate information in the pre-stored information base, an attribute tag corresponding to the information to be identified.

In some embodiments, the information to be recognized includes a text to be recognized, the first feature information includes feature words, and the receiving module 801 is specifically configured to:

filter the word units based on preset common words to obtain feature words, and arrange the feature words according to their positions in the text to be recognized to form the first feature information set of the text to be recognized.
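The filtering and ordering step for text can be sketched as below. The sketch assumes the text has already been split into word units by some word-segmentation step that is not shown here, and that the preset common words are given as a set. The names `build_text_features` and `common_words` are illustrative.

```python
from typing import Iterable, List, Set


def build_text_features(word_units: Iterable[str], common_words: Set[str]) -> List[str]:
    """Drop preset common words; keep feature words in their original text order."""
    return [word for word in word_units if word not in common_words]
```

For example, filtering `["in", "Shanghai", "mall A", "and", "mall B", "purchase", "commodity C"]` with common words `{"in", "and"}` gives the first feature information set `["Shanghai", "mall A", "mall B", "purchase", "commodity C"]` used in the examples above.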

In some embodiments, the information to be identified includes an image to be identified, the first feature information includes a gray scale value, and the receiving module 801 is specifically configured to:

divide the grayscale image into sub-images according to the set rows and columns, and determine the gray value of each grayscale sub-image;

arrange the gray values of the grayscale sub-images according to the position of each sub-image in the grayscale image to form the first feature information set.
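The grid division for images can be sketched as follows. The text does not specify how the single gray value of a sub-image is computed; this sketch assumes the mean gray value. It also assumes the image is a list of pixel rows and that the gray values are arranged in row-major order of the sub-images. `rows` and `cols` must not exceed the image height and width, or some sub-images would be empty.

```python
from typing import List


def grid_gray_features(gray: List[List[int]], rows: int, cols: int) -> List[float]:
    """Split a grayscale image into rows x cols sub-images and return one gray
    value per sub-image (assumed: the mean), ordered row by row."""
    height, width = len(gray), len(gray[0])
    features: List[float] = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(block) / len(block))  # mean gray value of the sub-image
    return features
```

Integer division of the boundaries lets the sketch handle image sizes that are not exact multiples of the grid, at the cost of slightly unequal sub-image sizes.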

In some embodiments, the extraction module 802 is specifically configured to:

traverse from any piece of first feature information in the first feature information set, and search the pre-stored information base for feature information matching the currently traversed first feature information;

if such feature information exists, extract the candidate information in which it is located, and form a candidate information set from the extracted candidate information;

if the traversal finishes without any feature information matching the first feature information being found in the pre-stored information base, output prompt information indicating that no candidate information was found.

In some embodiments, the first determining module 803 is specifically configured to:

for each piece of candidate information, based on the number of first feature information and the number of second feature information of the candidate information, select whichever of the information to be identified and the candidate information has fewer feature information as first comparison information, and the one with more feature information as second comparison information;

traverse in sequence from any piece of third feature information in the third feature information set corresponding to the first comparison information;

judge whether the fourth feature information set corresponding to the second comparison information contains, in succession, feature information matching each traversed piece of third feature information;

if so, extract the consecutive third feature information in the third feature information set as a feature information string, where feature information matching each of these consecutive pieces is contained in the fourth feature information set;

for each feature information string, determine the similarity between the feature information string and the fourth feature information set from five quantities: the position of the string's first third feature information in the third feature information set, the position in the fourth feature information set of the feature information matching that first third feature information, the position of the string's last third feature information in the third feature information set, the number of third feature information, and the number of fourth feature information;

and select, from the similarities between each feature information string in the feature information string set and the fourth feature information set, the maximum similarity as the similarity between the information to be identified and the candidate information.

In some embodiments, the first determining module 803 determines the similarity between each feature information string in the feature information string set and the fourth feature information set according to the following formula:

where s_i denotes the similarity between the i-th feature information string in the feature information string set and the fourth feature information set; pos_B(i) denotes the position, in the third feature information set, of the first piece of third feature information in the i-th feature information string; pos_A(i) denotes the position, in the fourth feature information set, of the feature information matching that first piece of third feature information; end_point(i) denotes the position, in the third feature information set, of the last piece of third feature information in the i-th feature information string; L_B denotes the number of third feature information in the third feature information set; and L_A denotes the number of fourth feature information in the fourth feature information set.

In some embodiments, the second determining module 804 is specifically configured to:

judge whether the candidate information set contains first candidate information whose similarity is greater than a preset threshold. If so, sort the first candidate information by its similarity to the information to be identified. Then determine the attribute label corresponding to the information to be identified according to a preset attribute label configuration strategy, the sorted first candidate information, and the attribute labels of the first candidate information. If not, output prompt information indicating that no first candidate information was found.

An embodiment of the present application further provides a computer device 900. As shown in fig. 9, which is a schematic structural diagram of the computer device 900 provided in an embodiment of the present application, it includes a processor 901, a memory 902, and a bus 903. The memory 902 stores machine-readable instructions executable by the processor 901, for example the instructions corresponding to the receiving module 801, the extracting module 802, the first determining module 803, and the second determining module 804 of the information identification apparatus in fig. 8. When the computer device 900 runs, the processor 901 and the memory 902 communicate through the bus 903, and when executed by the processor 901, the machine-readable instructions perform the following processes:

determining the similarity between the information to be identified and each candidate information in the candidate information set according to the number of first feature information in the first feature information set, the number of second feature information in the second feature information set, and the position information of each feature information in the feature information set with less feature information in the first feature information set and the second feature information set;

and determining an attribute label corresponding to the information to be identified according to the determined similarity and the attribute label of the candidate information in a pre-stored information base.

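In a possible implementation manner, the information to be recognized includes a text to be recognized, the first feature information includes feature words, and the instructions executed by the processor 901 specifically include:

filtering the word units based on preset common words to obtain feature words, and arranging the feature words according to their positions in the text to be recognized to form the first feature information set of the text to be recognized.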

In a possible implementation manner, the information to be identified includes an image to be identified, the first feature information includes a gray scale value, and the instructions executed by the processor 901 specifically include:

In a possible implementation manner, the instructions executed by the processor 901 specifically include:

if so, extracting the candidate information that contains the existing feature information;

and forming a candidate information set according to the extracted candidate information.

In a possible implementation manner, the instructions executed by the processor 901 further include:

and if, after the traversal is finished, no feature information matching the first feature information has been found in the pre-stored information base, outputting prompt information indicating that no candidate information was found.
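The traversal described above can be sketched as follows, assuming the pre-stored information base is an in-memory mapping from a candidate identifier to its list of feature information (feature words for text, or grayscale values for images). The names `build_candidate_set` and `info_base` and the wording of the prompt are hypothetical; a production system would more likely use an index than a full scan.

```python
from typing import Dict, List, Hashable

Feature = Hashable


def build_candidate_set(
    first_features: List[Feature],
    info_base: Dict[str, List[Feature]],
) -> Dict[str, List[Feature]]:
    """Traverse the information base and keep every entry sharing a feature
    with the first feature information set (sketch, full linear scan)."""
    query = set(first_features)
    candidates: Dict[str, List[Feature]] = {}
    for cand_id, features in info_base.items():
        # Keep the entry if any of its features matches a first feature.
        if not query.isdisjoint(features):
            candidates[cand_id] = features
    if not candidates:
        # Traversal finished with no match: emit the prompt information.
        print("No candidate information found")
    return candidates
```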

for each candidate information, based on the first feature information number and the second feature information number of that candidate information, selecting whichever of the information to be identified and the candidate information has fewer feature information as the first comparison information, and the one with more feature information as the second comparison information;
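A minimal sketch of the selection step, assuming both pieces of information are already represented as lists of feature information. The source does not say what happens when both lists have the same length; this sketch assumes the information to be identified is then taken as the first comparison information. The function name `order_for_comparison` is hypothetical.

```python
from typing import List, Sequence, Tuple


def order_for_comparison(
    query: Sequence[str], candidate: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Return (first comparison information, second comparison information):
    the side with fewer features first. Ties keep the query first (assumption)."""
    if len(query) <= len(candidate):
        return list(query), list(candidate)
    return list(candidate), list(query)
```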

if so, extracting, as a feature information string, each run of consecutive third feature information in the third feature information set whose matching feature information is contained in the fourth feature information set.
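One reading of this step is sketched below: scan the third feature information set and collect every maximal run of consecutive items that each have a match in the fourth feature information set. For each run the sketch reports the start position, the end position, and the position of the match of the run's first item in the fourth set, which correspond to the quantities $pos_B(i)$, $end\_point(i)$ and $pos_A(i)$ used by the similarity formula. The 0-based indexing, the use of the first occurrence for $pos_A(i)$, and the name `extract_feature_strings` are assumptions, not details given in the application.

```python
from typing import Dict, List, Tuple


def extract_feature_strings(
    third: List[str], fourth: List[str]
) -> List[Tuple[int, int, int]]:
    """Return (pos_B, end_point, pos_A) for each maximal run of consecutive
    items in `third` whose matches all occur in `fourth` (0-based positions)."""
    first_index: Dict[str, int] = {}
    for idx, feat in enumerate(fourth):
        first_index.setdefault(feat, idx)  # assume first occurrence is the match

    strings: List[Tuple[int, int, int]] = []
    start = None
    for idx, feat in enumerate(third):
        if feat in first_index:
            if start is None:
                start = idx  # a new run begins
        elif start is not None:
            strings.append((start, idx - 1, first_index[third[start]]))
            start = None
    if start is not None:  # close a run that reaches the end of `third`
        strings.append((start, len(third) - 1, first_index[third[start]]))
    return strings
```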

In one possible implementation, the processor 901 executes instructions to determine the similarity between each feature information string in the feature information string set and the fourth feature information set according to the following formula:

where $s_i$ represents the similarity between the $i$-th feature information string in the feature information string set and the fourth feature information set; $pos_B(i)$ represents the position, in the third feature information set, of the first third feature information in the $i$-th feature information string; $pos_A(i)$ represents the position, in the fourth feature information set, of the feature information that matches the first third feature information in the $i$-th feature information string; $end\_point(i)$ represents the position, in the third feature information set, of the last third feature information in the $i$-th feature information string; $L_B$ represents the number of third feature information in the third feature information set; and $L_A$ represents the number of fourth feature information in the fourth feature information set.

if first candidate information exists, extracting the similarity between each first candidate information and the information to be identified, and sorting the first candidate information by that similarity;

and determining the attribute label corresponding to the information to be identified according to a preset attribute label configuration strategy, the sorted first candidate information, and the attribute label of each first candidate information.

and if the candidate information set contains no first candidate information whose similarity to the information to be identified is greater than a preset threshold, outputting prompt information indicating that no first candidate information was found.
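The threshold check and sorting described above can be sketched as below, assuming the similarities have already been computed into a mapping from candidate identifier to score. The comparison is strictly "greater than", as stated in the text; the threshold value, the function name `select_first_candidates`, and the prompt wording are hypothetical.

```python
from typing import Dict, List, Tuple


def select_first_candidates(
    similarities: Dict[str, float], threshold: float
) -> List[Tuple[str, float]]:
    """Keep candidates whose similarity is strictly greater than the threshold
    (the first candidate information), sorted from most to least similar."""
    first = [(cid, s) for cid, s in similarities.items() if s > threshold]
    if not first:
        # No first candidate information: output the prompt information.
        print("No first candidate information found")
        return []
    first.sort(key=lambda item: item[1], reverse=True)
    return first
```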

The embodiment of the application also provides a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the information identification method are executed.

Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the information identification method is performed, which addresses the low identification efficiency of prior-art approaches that identify a target user based on similarity and improves the efficiency of identifying the attribute label of a target object.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical, may be located in one place, or may be distributed on a plurality of networks. Some or all of them can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each module may exist physically on its own, or two or more modules may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The foregoing describes only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto; any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims

1. An information identifying apparatus, comprising:

2. The information identification device according to claim 1, wherein the information to be identified includes a text to be identified, the first feature information includes a feature word, and the receiving module is specifically configured to:

3. The information identification device according to claim 1, wherein the information to be identified includes an image to be identified, and the first feature information includes a grayscale value; the receiving module is specifically configured to:

4. The information identification device of claim 1, wherein the extraction module is specifically configured to:

5. The information identification device of claim 1, wherein the first determining module is specifically configured to:

6. The information identification device of claim 5, wherein the first determining module is specifically configured to:

7. The information identification device of claim 1, wherein the second determination module is specifically configured to:

8. An information identification method, comprising:

9. A computer device, comprising: a processor, a storage medium, and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the computer device is running, and the processor executing the machine-readable instructions to perform the steps of the information identification method according to claim 8.

10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the information identification method according to claim 8.