CN115757764A - Information identification method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN115757764A
CN115757764A (application CN202110996527.4A)
Authority
CN
China
Prior art keywords
link, information, data, link data, text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110996527.4A
Other languages
Chinese (zh)
Inventor
孙祥训
程宝平
谢小燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Hangzhou Information Technology Co Ltd
Priority to CN202110996527.4A
Publication of CN115757764A
Legal status: Pending

Abstract

The application discloses an information identification method, apparatus, device and computer-readable storage medium. The method includes: acquiring information to be processed and separating it to obtain text data and link data; extracting features from the text data and the link data respectively to obtain corresponding text features and link features; determining similarity attribute information of the link data based on the link data and a pre-constructed link knowledge base; and determining the identification result of the information to be processed based on the text features, the link features and the similarity attribute information. Because the identification result is obtained by extracting features from, and then fusing, the text data and the link data of the information to be processed in an end-to-end manner, the identification process is simplified and the universality and accuracy of the identification method are improved.

Description

Information identification method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of information processing, and in particular, but not exclusively, to an information identification method, apparatus, device, and computer-readable storage medium.
Background
With the continuing popularization of mobile terminals, they have become an essential part of daily life. Alongside this efficiency and convenience, however, come safety hazards. For example, short messages have become an important channel for receiving information from banks, payment software, cloud disks and other platforms, and fraudulent short messages that imitate official organizations cheat users out of money, payment passwords and other important private information by including fraudulent network links.
In the related art, one approach trains a multi-class text classification model on known short-message text feature vectors and their corresponding fraud categories, and uses the model to identify fraudulent short messages. This approach uses only the text features of the message, so fraudulent messages whose text closely resembles official messages are easily missed, which reduces identification accuracy. Another approach converts segmented words into word vectors with Word2Vec and extracts features from the short-message text. This approach likewise uses only text features and ignores the network-link features of fraudulent messages, again hurting accuracy; moreover, it processes text with a Long Short-Term Memory (LSTM) neural network, which is slow and cannot be applied to fraud identification at the massive data scale of fifth-generation (5G) mobile communication. Still other related technologies require, besides the message text, features of multiple dimensions such as the sender address, sending base station, sending frequency and receiver address; these features are difficult and costly to acquire, and the network-link features in the fraudulent message remain unused. The result is an identification process that is complex, slow and inaccurate.
Disclosure of Invention
In view of this, embodiments of the present application provide an information identification method, apparatus, device, and computer-readable storage medium.
The technical scheme of the embodiment of the application is realized as follows:
an embodiment of the present application provides an information identification method, including:
acquiring information to be processed, and separating the information to be processed to obtain text data and link data;
respectively extracting the characteristics of the text data and the link data to obtain corresponding text characteristics and link characteristics;
determining similarity attribute information of the link data based on the link data and a pre-constructed link knowledge base;
and determining the recognition result of the information to be processed based on the text feature, the link feature and the similarity attribute information.
An embodiment of the present application provides an information identification apparatus, which includes:
the acquisition module is used for acquiring information to be processed and separating the information to be processed to obtain text data and link data;
the feature extraction module is used for respectively extracting features of the text data and the link data to obtain corresponding text features and link features;
the first determining module is used for determining similarity attribute information of the link data based on the link data and a pre-constructed link knowledge base;
and the second determination module is used for determining the identification result of the information to be processed based on the text feature, the link feature and the similarity attribute information.
An embodiment of the present application provides an electronic device, which includes:
a processor; and
a memory for storing a computer program operable on the processor;
wherein the computer program realizes the above information recognition method when executed by a processor.
An embodiment of the present application provides a computer-readable storage medium, in which computer-executable instructions are stored, and the computer-executable instructions are configured to execute the information identification method.
The embodiment of the application provides an information identification method, an information identification device, information identification equipment and a computer readable storage medium, wherein the acquired information to be processed is separated to obtain text data and link data of the information to be processed; then, carrying out feature extraction on the text data to obtain text features, and carrying out feature extraction on the link data to obtain link features; then, determining similarity attribute information between the link data and a pre-constructed link knowledge base; and finally, determining the identification result of the information to be processed based on the text characteristic, the link characteristic and the similarity attribute information. In the identification process, not only the text data is subjected to feature extraction, but also the link data is subjected to feature extraction, and the similarity attribute information is determined, so that more comprehensive features aiming at the information to be processed are obtained; and finally, determining the recognition result of the information to be processed based on the text characteristics, the link characteristics and the similarity attribute information, thereby realizing rapid and efficient end-to-end recognition and improving the recognition accuracy based on more characteristics.
Drawings
In the drawings, which are not necessarily drawn to scale, like reference numerals may describe similar components in different views. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed herein.
Fig. 1 is a schematic flow chart of an implementation of an information identification method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of an implementation of a method for training a model according to an embodiment of the present application;
fig. 3 is a schematic flow chart of an implementation of the feature extraction method according to the embodiment of the present application;
fig. 4 is a schematic flow chart of an implementation of a text feature extraction method according to an embodiment of the present application;
fig. 5 is a schematic flow chart of an implementation of the method for determining similarity attribute information according to the embodiment of the present application;
fig. 6 is a schematic flow chart of an implementation of each similarity value determination method provided in the embodiment of the present application;
fig. 7 is a schematic flowchart of another implementation of the information identification method according to the embodiment of the present application;
FIG. 8 is a flowchart illustrating an implementation of a method for determining a linked knowledge base according to an embodiment of the present application;
fig. 9 is a schematic flowchart of still another implementation of the information identification method according to the embodiment of the present application;
FIG. 10 is a schematic diagram of a recognition model architecture provided in an embodiment of the present application;
fig. 11 is a schematic structural diagram of an information identification apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a component of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second" and "third" are used only to distinguish similar objects and do not denote a particular order; where permitted, the specific order or sequence may be interchanged so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Based on the problems in the related art, the embodiments of the present application provide an information identification method, which can be implemented by a computer program; when the computer program is executed, each step of the information identification method provided by the embodiments of the present application is completed. In some embodiments, the computer program may be executed by a processor in the electronic device. Fig. 1 shows an implementation flow of the information identification method provided in an embodiment of the present application. The method is applicable to an electronic device, which may be a smart phone, a computer, an intelligent wearable device, or the like. As shown in fig. 1, the information identification method includes:
step S101, obtaining information to be processed, and separating the information to be processed to obtain text data and link data.
Here, taking an electronic device as an example of a smart phone, the information to be processed may be information received by the smart phone based on a carried communication card, or information received by an instant messaging application of the smart phone based on a network communication link, and a source of the information to be processed is not limited in the embodiment of the present application. The information to be processed includes text data and link data, that is, the information to be processed includes both text content and link content.
In the embodiment of the application, the information to be processed can be separated through the keyword extraction algorithm, so that the text data and the link data in the information to be processed are obtained, and the separation of the information to be processed is realized.
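The separation step can be sketched as follows. The patent does not specify the keyword-extraction algorithm, so matching URLs with a regular expression (an assumption of this sketch) stands in for it:

```python
import re

# Illustrative URL pattern; the patent's actual keyword-extraction
# algorithm is unspecified, so this is an assumed stand-in.
URL_PATTERN = re.compile(r'(?:https?://|www\.)\S+')

def separate_message(message):
    """Return (text data, link data) for a message containing both."""
    links = URL_PATTERN.findall(message)
    # Remove the links, then collapse the leftover whitespace.
    text = re.sub(r'\s+', ' ', URL_PATTERN.sub('', message)).strip()
    return text, links
```

For example, `separate_message("Pay the fee at http://example-fraud.test/pay now")` yields the text `"Pay the fee at now"` and the single link `"http://example-fraud.test/pay"`.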
And S102, respectively extracting characteristics of the text data and the link data to obtain corresponding text characteristics and link characteristics.
The text features can be obtained by extracting the features of the text data through a trained text feature extraction submodel, and the trained text feature extraction submodel can be an artificial intelligent model such as a neural network, a support vector machine, a genetic algorithm and the like. In implementation, the text features may be obtained by performing normalization, vectorization, convolution and pooling on the text data, where the convolution and pooling may be performed once or multiple times, and the number of times of performing the convolution and pooling may be determined according to the actual text data condition, which is not limited in the embodiment of the present application. For example, when feature extraction is performed on text data, feature extraction may be performed through a word-level convolutional neural network, the sizes of convolution kernels are 3, 5 and 7, respectively, and the number of convolution kernels may be 128.
The link features can be obtained by extracting features from the link data with a trained link feature extraction submodel; as before, this submodel can be an artificial intelligence model such as a neural network, a support vector machine or a genetic algorithm. In implementation, feature extraction on the link data can be realized by performing convolution and pooling on it, where the convolution and pooling may be performed once or multiple times, the number of times being determined by the actual link data, which is not limited in the embodiments of the present application. For example, link feature extraction may be performed by a character-level convolutional neural network whose convolution kernels have three sizes, 3, 5 and 7, and the number of kernels may be 64.
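The convolution-and-pooling pipeline with kernel widths 3, 5 and 7 can be illustrated with a deliberately tiny, pure-Python sketch. The single-channel "embedding" and the all-ones kernels below are invented for illustration only and are not the model's trained weights:

```python
def conv1d_valid(seq, kernel):
    """Valid 1-D convolution of a scalar sequence with a scalar kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def max_pool(values):
    """Max-over-time pooling: keep the single strongest response."""
    return max(values)

# One embedding channel of a short message; values are illustrative.
embedded = [0.1, 0.4, -0.2, 0.9, 0.3, -0.5, 0.7]
# One pooled feature per kernel width, as in the 3/5/7 scheme above.
features = [max_pool(conv1d_valid(embedded, [1.0] * w)) for w in (3, 5, 7)]
```

In the real submodels each width has many kernels (128 for text, 64 for links) and the input is a matrix of embedding channels, but the sliding-window-then-max structure is the same.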
And step S103, determining similarity attribute information of the link data based on the link data and a link knowledge base constructed in advance.
Here, the pre-constructed link knowledge base may include a plurality of normal link data and a plurality of abnormal link data, where the normal link data refers to compliant link data, and the abnormal link data refers to non-compliant link data or fraudulent link data, and if a link corresponding to the abnormal link data is clicked, security or property loss may be brought to a user.
In the embodiment of the application, similarity processing can be performed on the link data and the link knowledge base by the trained similarity processing submodel to obtain the similarity attribute information. In actual implementation, it is first judged whether the link data and the link knowledge base meet the matching condition. When they do, the target reference link data satisfying the matching condition in the link knowledge base is determined and its first label information is obtained; the similarity value between the link data and the link knowledge base is then set to a preset value; finally, the first label information and the preset value are determined as the similarity attribute information.
When the link data and the link knowledge base do not meet the matching conditions, that is, the target reference link data cannot be determined through the matching conditions, determining similarity values between the link data and each link in the link knowledge base, and determining the link data corresponding to the maximum similarity value as the target reference link data; then, second tag information of the target reference link data is acquired, and the maximum similarity value and the second tag information are determined as similarity attribute information.
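The two-branch logic above, a preset score on an exact match and otherwise the closest reference entry, can be sketched as follows. The character-overlap `similarity` function is a stand-in assumption, not the patent's actual metric, and the preset value of 1.0 is illustrative:

```python
PRESET_VALUE = 1.0  # the "preset value" returned on an exact match

def similarity(a, b):
    """Illustrative character-overlap ratio (not the patent's metric)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def similarity_attributes(link, knowledge_base):
    """knowledge_base maps reference links to 'normal'/'abnormal' labels."""
    if link in knowledge_base:                      # matching condition met
        return PRESET_VALUE, knowledge_base[link]   # preset value + 1st label
    # No exact match: take the maximum similarity and the 2nd label.
    best = max(knowledge_base, key=lambda ref: similarity(link, ref))
    return similarity(link, best), knowledge_base[best]
```

A near-miss lookalike link (e.g. one character swapped) thus inherits the label of its closest knowledge-base entry together with the similarity value, which is exactly the signal the fusion step consumes.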
And step S104, determining the identification result of the information to be processed based on the text feature, the link feature and the similarity attribute information.
Here, the text feature, the link feature and the similarity attribute information may be input to a trained fusion feature submodel and fused to obtain the recognition result of the information to be processed, where the trained fusion feature submodel may be an artificial intelligence model such as a neural network, a support vector machine or a genetic algorithm. In actual implementation, full connection processing can be performed on the text features, link features and similarity attribute information, mapping the distributed feature representation to the sample label space to achieve classification and obtain the full connection result; the full connection result is then normalized to obtain the result of whether the information to be processed is abnormal information, where the normalization can be realized by a softmax layer.
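The fusion step, concatenation followed by a fully connected layer and softmax normalization, might look like the following sketch; the weight matrix and bias are illustrative placeholders, not trained parameters:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    exps = [math.exp(v - max(logits)) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(text_feat, link_feat, sim_attr, weights, bias):
    """Concatenate the three feature groups, apply one FC layer, softmax."""
    fused = text_feat + link_feat + sim_attr          # concatenation
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]       # full connection
    return softmax(logits)                            # [P(normal), P(abnormal)]
```

The returned probabilities sum to one, and the larger entry gives the recognition result (normal or abnormal information).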
The embodiment of the application provides an information identification method, which comprises the steps of firstly, carrying out separation processing on acquired information to be processed to obtain text data and link data of the information to be processed; then, carrying out feature extraction on the text data to obtain text features, and carrying out feature extraction on the link data to obtain link features; then, determining similarity attribute information between the link data and a pre-constructed link knowledge base; and finally, determining the identification result of the information to be processed based on the text characteristic, the link characteristic and the similarity attribute information. In the identification process, not only the text data is subjected to feature extraction, but also the link data is subjected to feature extraction, and the similarity attribute information is determined, so that more comprehensive features aiming at the information to be processed are obtained; and finally, determining the recognition result of the information to be processed based on the text characteristics, the link characteristics and the similarity attribute information, thereby realizing rapid and efficient end-to-end recognition and improving the recognition accuracy based on more characteristics.
In some embodiments, before "obtaining the information to be processed, and performing separation processing on the information to be processed to obtain the text data and the link data" in step S101, the following steps may be further performed:
and S001, acquiring the trained recognition model.
The trained recognition model is used for recognizing text data and link data of information to be processed, and recognizing whether the information to be processed is normal information or abnormal information. The trained recognition model comprises a trained text feature extraction submodel, a trained link feature extraction submodel, a trained similarity processing submodel and a trained fusion feature submodel, wherein:
the trained text feature extraction submodel is used for extracting features of text data to obtain corresponding text features; the trained link characteristic extraction submodel is used for extracting the characteristics of link data to obtain corresponding link characteristics; the trained similarity processing submodel is used for carrying out similarity processing on the link data and the link knowledge base to obtain similarity attribute information of the link data; and the trained fusion characteristic submodel is used for carrying out fusion processing on the text characteristic, the link characteristic and the similarity attribute information so as to obtain the recognition result of the information to be processed.
In the embodiment of the present application, the trained recognition model may be an artificial intelligence model such as a neural network model, a support vector machine model, a bayesian network model, or the like. Taking the trained recognition model as a neural network model as an example, the trained text feature extraction submodel may be a word-level convolutional neural network, and the trained link feature extraction submodel may be a character-level convolutional neural network.
In other embodiments, the trained recognition model is obtained by training a preset recognition model, and as shown in fig. 2, the trained recognition model can be obtained through the following steps S0011 to S0014:
and S0011, acquiring a preset identification model, sample information and sample label information corresponding to the sample information.
Here, the preset recognition model may be an artificial intelligence model such as a neural network model, a support vector machine model, a bayesian network model, etc., the sample information is also the training information, and the label of the sample information is known and recorded as the sample label information.
And S0012, identifying the sample information by using a preset identification model to obtain a prediction identification result corresponding to the sample information.
Here, the sample information is input into a preset identification model, the sample information is identified by using the preset identification model, and the output of the preset identification model is used as a prediction identification result corresponding to the sample information.
And step S0013, obtaining error information between the sample label information and the prediction identification result.
For example, the sample label information may be that the probability that the sample information is abnormal information is 100%, and the probability that the predicted recognition result is that the sample information is abnormal information is 80%, then the error information is 20%.
And S0014, performing back propagation training on the preset recognition model based on the error information and the error threshold value to obtain the trained recognition model.
Here, the error threshold may be 10%, 5%, 3%, or the like, and the error information is compared with the error threshold, and when the error information is greater than the error threshold, the parameter weight in the preset recognition model is adjusted based on the error information until the error information is less than the error threshold, so as to obtain a trained recognition model for recognizing information.
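Steps S0011 to S0014 can be condensed into a toy sketch: predict, measure the error against the sample label, and update until the error falls below the error threshold. A single-weight logistic model stands in for the full recognition model; the learning rate and threshold are illustrative values:

```python
import math

def train(sample, label, weight=0.0, learning_rate=0.5, error_threshold=0.05):
    """Back-propagation-style loop mirroring steps S0012-S0014."""
    for _ in range(1000):
        # S0012: identify the sample with the current model (sigmoid).
        prediction = 1.0 / (1.0 + math.exp(-weight * sample))
        # S0013: error information between label and prediction.
        error = abs(label - prediction)
        # S0014: stop once the error drops below the error threshold ...
        if error < error_threshold:
            break
        # ... otherwise adjust the parameter weight based on the error.
        weight += learning_rate * (label - prediction) * sample
    return weight, error
```

With `sample=1.0` and `label=1.0` the loop converges well inside the iteration cap, ending with an error under the 5% threshold.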
Based on the composition structure of the trained recognition model, as shown in fig. 3, the step S102 "performs feature extraction on the text data and the link data respectively to obtain corresponding text features and link features" can be implemented by the following steps S1021 and S1022:
and S1021, extracting the features of the text data by using the trained text feature extraction submodel to obtain the text features.
In practical implementation, as shown in fig. 4, step S1021 can be implemented by steps S211 to S213 as follows:
step S211, performing text normalization processing on the text data to obtain processed text data.
Here, to achieve a relatively accurate recognition result, the wording of the information to be processed must be normalized. The text data may therefore be normalized using methods such as dictionary mapping, text error correction and statistical machine translation, to obtain text data with normalized wording, that is, the processed text data.
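Dictionary mapping, the first of the normalization methods listed, might be sketched as follows; the mapping entries are invented examples, not a real normalization dictionary:

```python
# Hypothetical abbreviation-to-word mapping for SMS text.
NORMALIZATION_MAP = {"u": "you", "pls": "please", "acct": "account"}

def normalize(text):
    """Replace each token found in the mapping; leave others unchanged."""
    return " ".join(NORMALIZATION_MAP.get(tok, tok) for tok in text.split())
```

Tokens absent from the dictionary pass through untouched, so the method degrades gracefully on already-normalized text.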
Step S212, vectorization processing is carried out on the processed text data to obtain a text vector.
Here, the processed text data may be subjected to word segmentation processing based on word segmentation methods such as a maximum matching method, a minimum word segmentation method, a maximum probability method, and the like to obtain word segmented text data; then, vectorization processing can be realized through matrix-based distributed representation, cluster-based distributed representation, neural network-based distributed representation and other methods, and the text data after word segmentation is converted into vectors, that is, text vectors are obtained.
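As a minimal illustration of vectorization, a bag-of-words count vector over a fixed vocabulary can stand in for the distributed representations listed above (the vocabulary here is an invented example):

```python
def vectorize(tokens, vocabulary):
    """Count each vocabulary word's occurrences in the segmented text."""
    return [tokens.count(word) for word in vocabulary]

vocab = ["account", "verify", "link", "prize"]
vector = vectorize(["verify", "account", "verify"], vocab)
```

Real systems would use dense embeddings (e.g. the neural-network-based distributed representations mentioned), but the token-sequence-to-fixed-vector contract is the same.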
And step S213, performing convolution and pooling on the text vector to obtain text characteristics.
Here, the convolution processing of the text vector can be realized by a convolution layer composed of a plurality of convolution units. A single convolution can extract low-level features such as edges, lines and angles, and multiple convolution iterations can extract more complex features from these low-level features. The more complex features obtained by convolution are then input to a pooling layer, which downsamples them to obtain the text features. For example, max pooling divides the input into a plurality of rectangular areas and outputs the maximum value of each area to form the text features; this reduces the spatial size of the data, the amount of data and the amount of computation, and helps control overfitting.
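The region-wise max pooling described here takes only a few lines; the window size of 2 is an illustrative choice:

```python
def max_pool_regions(values, window=2):
    """Split the input into fixed-size regions and keep each region's max."""
    return [max(values[i:i + window]) for i in range(0, len(values), window)]
```

For instance, pooling `[0.1, 0.7, 0.3, 0.2, 0.9, 0.4]` with a window of 2 halves the length while preserving the strongest response in each region.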
And step S1022, extracting the characteristics of the link data by using the trained link characteristic extraction submodel to obtain the link characteristics.
In actual implementation, step S1022 (not shown in the figure) can be implemented by the following steps S221 and S222:
step S221, performing convolution processing on the link data to obtain a convolution result.
Here, the convolution processing in step S221 is similar to the convolution processing in step S213, and then the implementation process of step S221 may refer to the convolution processing process in step S213. In implementation, the difference is that step S213 performs convolution processing on the text vector, step S221 performs convolution processing on the link data, and the number of convolution kernels and convolution units in step S213 and step S221 may be the same or different.
Step S222, performing pooling processing on the convolution result to obtain the link characteristics.
Here, the implementation process of step S222 is similar to the pooling process in step S213, and then the implementation process of step S222 may refer to the pooling process in step S213.
In the embodiment of the present application, through the above steps S1021 and S1022, text normalization, vectorization, convolution, and pooling are performed on the text data by using the trained text feature extraction submodel, so as to obtain text features; and performing convolution and pooling on the link data by using the trained link feature extraction submodel to obtain link features, thereby realizing feature extraction on the text data and the link data, realizing the feature acquisition of the information to be processed from multiple dimensions, and better preparing for subsequent identification.
Based on the above embodiments, the link knowledge base includes a plurality of reference link data. When step S103, "determining the similarity attribute information of the link data based on the link data and the link knowledge base constructed in advance", is implemented, the trained similarity processing submodel may be used to perform similarity processing on the link data and the link knowledge base to obtain the similarity attribute information. As shown in fig. 5, this similarity processing can be implemented by steps S31 to S38:
step S31, determining whether the link knowledge base includes target reference link data satisfying the matching condition with the link data.
Here, the matching condition is that the link knowledge base includes reference link data whose characters are identical to those of the link data, in both number and position; such data is denoted as the target reference link data. When the link knowledge base is judged to include target reference link data satisfying the matching condition with the link data, the process proceeds to step S32, that is, the first tag information of the target reference link data is obtained; if it is determined that the link knowledge base does not include such target reference link data, the process proceeds to step S35, that is, each similarity value between the link data and each reference link data in the link knowledge base is obtained.
In step S32, first tag information of the target reference link data is acquired.
Here, the first tag information of the target reference link data characterizes whether the target reference link data is normal link data or abnormal link data. Since it is known whether each reference link data in the link knowledge base is normal, the tag information of any reference link data is determined: it is either normal link data or abnormal link data. In the embodiment of the present application, the first tag information of the target reference link data may therefore be obtained by a read instruction.
In step S33, the similarity value between the link data and the link knowledge base is set to a preset value.
Here, the preset value may be equal to 0.5, 1, or 2, and the preset value may be a default value or a custom setting value, and at this time, the similarity value between the link data and the link knowledge base is the preset value.
Step S34, determining the preset value and the first label information as the similarity attribute information.
Here, the preset value and the first tag information may be concatenated, and the concatenated information is used as the similarity attribute information of the link data.
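As a minimal sketch of steps S31 to S34, assuming the link knowledge base is stored as a dictionary mapping each reference link to its tag information, and assuming a preset similarity value of 1.0 (both the storage format and the preset value are illustrative choices, not fixed by this embodiment):

```python
# Sketch of steps S31-S34. The dict-based knowledge base and the preset
# value 1.0 are illustrative assumptions.
PRESET_SIMILARITY = 1.0

def exact_match_lookup(link, knowledge_base):
    """Return (similarity value, first tag information) when the knowledge
    base contains an entry whose characters match the link exactly
    (the matching condition of step S31); otherwise return None."""
    tag = knowledge_base.get(link)  # tag is "normal" or "abnormal"
    if tag is None:
        return None
    # Concatenate the preset similarity value with the first tag
    # information to form the similarity attribute information (step S34).
    return (PRESET_SIMILARITY, tag)

kb = {"http://example-bank.com": "normal",
      "http://examp1e-bank.com": "abnormal"}
print(exact_match_lookup("http://examp1e-bank.com", kb))  # (1.0, 'abnormal')
```

When the lookup returns None, the flow would continue to step S35 and compute per-link similarity values instead.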
In step S35, respective similarity values between the link data and respective reference link data of the link knowledge base are determined.
At this time, if the link knowledge base does not include target reference link data satisfying the matching condition with the link data, each similarity value between the link data and each reference link data in the link knowledge base is determined. In the process of determining each similarity value, as shown in fig. 6, step S35 may be implemented by the following steps S351 to S353:
in step S351, a first data length of the link data and each second data length of each reference link data are acquired.
Here, the data length refers to the number of characters in the link data, and a first data length of the link data may be obtained by the character length reading instruction, and a second data length of each reference link data in the link knowledge base may also be obtained by the character length reading instruction.
In step S352, respective edit distances between the link data and the respective reference link data are determined.
Here, the edit distance refers to the minimum number of editing operations required to transform one character string into another. The allowed editing operations include replacing one character with another, inserting a character, and deleting a character.
In the embodiment of the application, the edit distances between the link data and the respective reference link data are all determined by this minimum-operation method.
Step S353 determines each similarity value based on the first data length, each second data length, and each edit distance.
Here, each edit distance is negatively correlated with each similarity value; that is, the smaller the edit distance, the more similar the link data is to the reference link data, and the larger the similarity value.
Here, taking the determination of the similarity value between the link data and one reference link data as an example: first, the larger and smaller of the first data length and the second data length corresponding to the reference link data are determined by comparison; then, the difference between the larger value and the edit distance corresponding to the reference link data is determined; finally, the ratio of this difference to the smaller value is determined as the similarity value of the link data and the reference link data.
By analogy, each similarity value between the link data and each reference link data in the link knowledge base can be determined according to the method.
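Steps S351 to S353 can be sketched as follows, with the edit distance computed by the standard dynamic-programming method; the function names are illustrative:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and
    substitutions needed to turn string a into string b (step S352)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a character
                         cur[j - 1] + 1,      # insert a character
                         prev[j - 1] + cost)  # replace a character
        prev = cur
    return prev[n]

def similarity(link, ref):
    """Step S353: ratio of (larger length - edit distance) to the
    smaller length, so identical or contained links score 1."""
    longer = max(len(link), len(ref))
    shorter = min(len(link), len(ref))
    return (longer - edit_distance(link, ref)) / shorter
```

For identical strings the edit distance is 0 and the ratio is 1; for one string contained in the other, the edit distance equals the length difference and the ratio is again 1, matching the behavior described for formula (1) below.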
And S36, determining the maximum similarity value in all the similarity values, and determining the reference link data corresponding to the maximum similarity value as the target reference link data.
Here, the maximum similarity value with the largest similarity value among the similarity values may be determined by pairwise comparison, and the reference link data corresponding to the maximum similarity value may be determined as the target reference link data.
In step S37, second tag information of the target reference link data is acquired.
Here, the implementation process of step S37 is similar to the implementation process of step S32, and therefore, the implementation process of step S37 may refer to the implementation process of step S32.
And step S38, determining the maximum similarity value and the second label information as similarity attribute information.
Here, the implementation process of step S38 is similar to that of step S34, and therefore, the implementation process of step S38 may refer to that of step S34.
In the embodiment of the present application, through the steps S31 to S38, when it is determined that the link knowledge base includes target reference link data satisfying the matching condition with the link data, the first tag information of the target reference link data is obtained, the similarity value between the link data and the link knowledge base is set to a preset value, and the preset value and the first tag information are determined as the similarity attribute information. When the link knowledge base does not include target reference link data satisfying the matching condition with the link data, the first data length of the link data and each second data length of each reference link data are obtained, and each edit distance between the link data and each reference link data is determined. Each similarity value between the link data and each reference link data is then determined based on the first data length, the second data lengths, and the edit distances; the maximum similarity value is determined from all the similarity values, and the reference link data corresponding to the maximum similarity value is determined as the target reference link data. Finally, the second tag information of the target reference link data is obtained, and the maximum similarity value and the second tag information are determined as the similarity attribute information. In this way, the similarity attribute information of the link data is determined.
Based on the above embodiment, in the step S104, "determining the recognition result of the information to be processed based on the text feature, the link feature and the similarity attribute information" may be implemented by performing fusion processing on the text feature, the link feature and the similarity attribute information by using a trained fusion feature sub-model, so as to obtain the recognition result of the information to be processed. Accordingly, the "fusion processing is performed on the text feature, the link feature and the similarity attribute information by using the trained fusion feature sub-model to obtain the recognition result of the information to be processed" can be realized by the following steps S41 and S42 (not shown in the figure):
and S41, performing full connection processing on the text characteristics, the link characteristics and the similarity attribute information to obtain a full connection result.
In the embodiment of the application, taking the fully-connected layer of a convolutional neural network as an example, the text features, the link features and the similarity attribute information can be input into the fully-connected layer to realize fully-connected processing: the three are integrated, and the distributed feature representation is mapped to the sample label space through the activation function of each neuron, so as to achieve the purpose of classification. In practical implementations, there may be a plurality of fully-connected layers.
And S42, carrying out normalization processing on the full connection result to obtain an identification result of the information to be processed.
Here, the output value of the last fully-connected layer is transmitted to the normalization output layer, and normalization processing is performed on the output value of the fully-connected layer. The normalization processing is equivalent to classification processing, so as to obtain the identification result of the information to be processed, and the result can be represented in the form of a probability.
In the embodiment of the present application, the text feature, the link feature, and the similarity attribute information are subjected to full connection and normalization processing in the above steps S41 and S42, so that the recognition result of the information to be processed is obtained.
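Steps S41 and S42 can be sketched in pure Python as a single fully-connected layer followed by softmax normalization; the weights and input below are arbitrary illustrative values, not trained parameters:

```python
import math

def fully_connected(x, weights, bias):
    """One fully-connected layer (step S41): y = W.x + b."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Normalization layer (step S42): map the fully-connected output to
    a probability distribution over the classes."""
    mx = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - mx) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Concatenated input: text features + link features + similarity attribute
# information (dimensions reduced to 3 for illustration).
fused = [0.2, 0.8, 0.5]
logits = fully_connected(fused,
                         weights=[[1.0, -0.5, 0.3], [-0.2, 0.9, 0.1]],
                         bias=[0.0, 0.1])
probs = softmax(logits)  # e.g. [P(normal), P(abnormal)]
```

In practice several fully-connected layers would be stacked, as noted above, and the softmax output gives the probability form of the identification result.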
In some embodiments, the recognition result includes an abnormal probability that the information to be processed is abnormal information, and as shown in fig. 7, after step S104, the following steps S105 to S109 may also be performed, which are described below with reference to fig. 7.
Step S105, judging whether the abnormal probability is larger than the probability threshold value.
Here, the probability threshold may be 0.8, 0.85, 0.9, etc., and the probability threshold may be a default value or a custom setting value. If the abnormal probability is judged to be greater than the probability threshold value, representing that the information to be processed is abnormal information, and entering step S106; if the abnormal probability is judged to be less than or equal to the probability threshold, the information to be processed is represented to be not abnormal information, no alarm information needs to be output, the step S109 is entered, and the information to be processed is normally displayed.
And step S106, determining the information to be processed as abnormal information.
At this time, the anomaly probability is greater than the probability threshold, and the information to be processed is determined as the anomaly information.
And step S107, responding to the information to be processed as abnormal information, and determining an alarm message.
Here, the presentation form of the warning message may be a vibration, a voice, a prompt box, or other presentation forms, which is not limited in the embodiment of the present application. Exemplarily, controlling the electronic device to vibrate for 5 seconds may be determined as the warning message.
And step S108, outputting an abnormal alarm message.
Here, following the above example, the electronic device may be controlled to vibrate for 5 seconds, thereby outputting the warning message in a vibrating manner.
Step S109, displays the information to be processed.
At this time, the abnormal probability is less than or equal to the probability threshold value, the information to be processed is represented as normal information, and the information to be processed is displayed without determining an alarm message.
Through the steps S105 to S109, when the anomaly probability is greater than the probability threshold, determining that the information to be processed is anomalous information; then, determining abnormal alarm messages such as vibration, voice, prompt box and the like; and finally, outputting the abnormal alarm message, thereby achieving the purpose of prompting and enriching the functions of the electronic equipment. And under the condition that the abnormal probability is smaller than or equal to the probability threshold, determining the information to be processed as normal information, and displaying the information to be processed.
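The decision logic of steps S105 to S109 can be sketched as follows, assuming a probability threshold of 0.8 (one of the example values above) and hypothetical action names:

```python
def handle_recognition(abnormal_prob, threshold=0.8):
    """Steps S105-S109: alarm when the abnormal probability exceeds the
    threshold, otherwise display the information normally."""
    if abnormal_prob > threshold:
        # Steps S106-S108: the information is abnormal; determine and
        # output an alarm message (e.g. vibration, voice, prompt box).
        return "alarm"
    # Step S109: the information is normal; display it.
    return "display"
```

Note that a probability exactly equal to the threshold is treated as normal, matching the "less than or equal to" branch of step S105.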
In some embodiments, the link knowledge base needs to be constructed in advance before step S103, and in actual implementation, the link knowledge base can be constructed through steps S801 to S803 shown in fig. 8, which are described below with reference to fig. 8.
In step S801, a first preset number of normal link data and a second preset number of abnormal link data are obtained.
Here, the first preset number may be 10,000, 15,000, 20,000, etc., and the second preset number may be 5,000, 5,500, 30,000, etc. The first preset number and the second preset number may be default values or custom values, and they may be equal or unequal, which is not limited in the embodiment of the present application.
In the embodiment of the application, the normal link data and the abnormal link data can be acquired from a public network.
Step S802, a link knowledge base is constructed based on the normal link data and the abnormal link data.
Here, "normal" tag information may be added to the normal link data, and "abnormal" tag information may be added to the abnormal link data, whereby the normal link data after adding the tag information and the abnormal link data after adding the tag information constitute the link knowledge base.
Step S803, when the preset interval duration is reached, acquiring the updated normal link data and the updated abnormal link data, updating the link knowledge base based on the updated normal link data and the updated abnormal link data, and constructing the updated link knowledge base.
Here, the preset interval duration may be 1 week, 2 weeks, 3 weeks, etc., and considering that the link data is updated, at every preset interval duration, the electronic device acquires the updated normal link data and the updated abnormal link data again from the public network, adds "normal" tag information to the updated normal link data, adds "abnormal" tag information to the updated abnormal link data, and finally forms an updated link knowledge base by the updated normal link data to which the tag information is added and the updated abnormal link data to which the tag information is added.
In the embodiment of the present application, through the above steps S801 to S803, the link knowledge base is configured by the abnormal link data and the normal link data, and the link knowledge base is periodically updated, so that the accuracy of the link knowledge base is ensured.
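Steps S801 to S803 can be sketched as follows, assuming the knowledge base is stored as a dictionary from link to tag information (an illustrative representation; the periodic trigger itself is omitted):

```python
def build_knowledge_base(normal_links, abnormal_links):
    """Steps S801-S802: add "normal"/"abnormal" tag information to the
    collected links and merge them into one knowledge base."""
    kb = {link: "normal" for link in normal_links}
    kb.update({link: "abnormal" for link in abnormal_links})
    return kb

def update_knowledge_base(kb, new_normal, new_abnormal):
    """Step S803: at each preset interval, merge freshly collected
    normal and abnormal links into the existing knowledge base."""
    kb.update(build_knowledge_base(new_normal, new_abnormal))
    return kb
```

A scheduler (e.g. a weekly job, matching the 1-week example interval) would call `update_knowledge_base` with links re-crawled from the public network.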
Based on the foregoing embodiments, an information identification method is further provided in the embodiments of the present application. In this embodiment, the information to be processed is taken as a short message for explanation. The information identification method provided by the embodiment of the present application can identify whether a short message is a fraud short message (corresponding to the abnormal information in other embodiments). Fig. 9 shows the implementation flow of the information identification method provided by the embodiment of the present application; as shown in fig. 9, the information identification method includes:
step S901, a short message to be processed is acquired.
Here, the short message to be processed includes text data and link data, and the short message to be processed may be information received by the terminal based on a telephone number, or may be information received by an instant messaging application program of the terminal based on a network communication link. In addition, the short message to be processed can also be information input by the user, and the source of the short message to be processed is not limited in the embodiment of the application.
And step S902, preprocessing.
Here, the short message to be processed is preprocessed, where the preprocessing may be separation processing, that is, separating the text data and the link data in the short message to be processed to obtain text data 9021 and link data 9022 respectively. In actual implementation, the short message to be processed containing a network link is read first, and then the short message text and the network link therein are extracted respectively; the short message text is the text data, and the network link is the link data. After the network link is acquired, if the network link is a short link, the short link needs to be converted into a link of normal length.
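The separation step can be sketched with a regular expression; the URL pattern below is a simplified assumption, and a real deployment would need a stricter URL grammar plus the short-link expansion mentioned above:

```python
import re

# Simplified, hypothetical URL pattern for illustration only.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def separate(message):
    """Split a short message to be processed into (text data, link data)."""
    links = URL_PATTERN.findall(message)
    text = URL_PATTERN.sub("", message).strip()
    return text, links

text, links = separate(
    "Your parcel is held, pay at http://examp1e.com/fee now")
# links -> ["http://examp1e.com/fee"]; text keeps the remaining words
```

The extracted network link would then go to the knowledge-base comparison (step S904) and the link feature extraction (step S906), while the text goes to step S907.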
In step S903, the link knowledge base 9032 is acquired from the public network 9031.
Here, a crawler may be used to crawl the normal links 9033 from the public network, including links to multiple categories of websites such as operators, banks, e-commerce, payment software, mobile phone vendors, social networking websites, game vendors, cloud disk vendors, securities, finance, video websites, travel websites, and so on. In this embodiment of the application, the number of normal links included in the link knowledge base may be 10,000, and the link knowledge base may also be updated periodically according to website popularity. The abnormal links 9034 in the link knowledge base may be composed of discovered abnormal links and some of the websites recorded in the phishing database (PhishTank).
Step S904, compare the network link with the link knowledge base.
Here, the domain name part of the network link in the short message to be processed extracted in step S902 is compared with the normal links and the abnormal links in the link knowledge base. If it is a hit, that is, the link knowledge base contains a link with the same domain name as the network link, the result 9041 corresponding to the link attribute information is directly output; if it is a miss, that is, the domain name part of the network link is not the same as any link in the link knowledge base, the process proceeds to step S905, and the similarity attribute information between the network link and the link knowledge base is determined.
Step S905 determines the similarity attribute information.
Here, the embodiment of the present application provides an improved similarity calculation formula. As shown in formula (1), the improved similarity is determined based on the edit distance, the character length of the network link, and the character length of each link in the link knowledge base. In step S905, the similarity between the network link domain name of the short message to be processed extracted in step S904 and each link in the link knowledge base is calculated using the following formula (1), the maximum value is output, and the maximum value is determined as the similarity attribute information. The similarity is recorded as S, and the calculation formula of the similarity S is as follows:
S = [max(len(url1), len(url2)) − lev(url1, url2)] / min(len(url1), len(url2))    (1)
In formula (1), url1 refers to the network link of the short message to be processed; url2 refers to a network link in the link knowledge base, and since the link knowledge base comprises a plurality of links, url2 ranges over that set; len(url1) refers to the character length of the network link of the short message to be processed, i.e. the number of characters contained in the network link; len(url2) refers to the character length of each link in the link knowledge base; lev(url1, url2) refers to the edit distance between url1 and url2. The range of S is [0, 1]: when url1 and url2 are completely the same or one contains the other, S is 1; when they are completely different, S is 0.
In this embodiment of the present application, the similarity attribute information is determined by the trained similarity attribute information determining module 1002 in fig. 10, where the trained similarity attribute information determining module may also be marked as a trained similarity processing sub-model.
And step S906, extracting the characteristics of the network link.
Here, the network link obtained in step S902 is input to a trained link feature extraction network 1001 shown in fig. 10, which may also be referred to as a trained link feature extraction submodel, to extract deep-level feature expression.
In the embodiment of the present application, the link feature extraction network performs feature extraction using a character-level convolutional neural network. For example, the convolution kernels of the convolutional neural network have three sizes, 3, 5, and 7, and the number of convolution kernels of each size is 64, so as to obtain the link feature vector.
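The idea of character-level convolution with several kernel sizes followed by max pooling can be illustrated in pure Python; this toy version uses a fixed averaging filter and a crude character embedding instead of the 64 learned kernels per size used by the actual network:

```python
def char_conv_max_pool(link, kernel_sizes=(3, 5, 7)):
    """Toy character-level convolution: slide a window of each kernel
    size over the character codes, apply a fixed averaging filter, then
    take the maximum over positions (global max pooling). Illustrative
    only; a trained network would use learned kernels and embeddings."""
    codes = [ord(c) / 128.0 for c in link]  # crude character embedding
    features = []
    for k in kernel_sizes:
        if len(codes) < k:
            features.append(0.0)  # link shorter than the window
            continue
        acts = [sum(codes[i:i + k]) / k
                for i in range(len(codes) - k + 1)]
        features.append(max(acts))  # global max pooling over positions
    return features

feats = char_conv_max_pool("http://examp1e-bank.com")
# one pooled feature per kernel size -> a 3-dimensional link vector here
```

With 64 learned kernels per size, the same sliding-window-plus-pooling structure would yield a 192-dimensional link feature vector instead of 3.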
Step S907, feature extraction is performed on the text data.
Here, the text data of the short message to be processed obtained in step S902 is subjected to normalization, word segmentation, word-vector conversion, and the like, and then the features of the short message text are extracted using the trained text feature extraction network 1003 in fig. 10, where the trained text feature extraction network may also be recorded as a trained text feature extraction submodel.
In the embodiment of the present application, the text feature extraction network performs feature extraction using a word-level convolutional neural network. For example, the sizes of the convolution kernels are 3, 5, and 7, respectively, and the number of convolution kernels of each size is 128, so as to obtain the text feature vector.
Step S908, performing fusion processing on the link feature vector, the similarity attribute information, and the text feature vector to obtain a processing result of the short message to be processed.
Here, if the domain name part of the network link of the short message to be processed is identical to a link in the link knowledge base, the outputs of step S904, step S906 and step S907 are taken as the input of the trained fusion feature network 1004; the last layer of the trained fusion network model is a normalization layer, which may be, for example, a softmax layer, and it outputs the probability that the short message is a fraud short message, and if the probability is greater than the threshold, the short message is a fraud short message. If the domain name part of the network link of the short message to be processed is not the same as any link in the link knowledge base, the outputs of steps S905, S906 and S907 are taken as the input of the fusion network model; likewise, the last layer of the fusion network model is a softmax layer, which outputs the probability that the short message is a fraud short message, and if the probability is greater than the threshold, the short message is a fraud short message.
In the embodiment of the application, the trained fusion feature network can be recorded as a trained fusion feature sub-model, the trained fusion feature network splices the short message link feature vector, the similarity feature and the text feature vector, the fraud short message multimode features are fused through two layers of fully-connected neural networks, and the probability value that the short message is a fraud short message is obtained through mapping by a softmax function.
In the embodiment of the application, through the steps S901 to S908, the method for identifying the short message to be processed adopts an end-to-end identification model, and identifies the short message to be processed by using the text feature vector, the link feature vector, and the similarity attribute information between the network link and the link knowledge base. Therefore, the method can identify not only fraud short messages whose text imitates official text, but also fraud short messages whose links imitate official links, and has stronger universality and higher accuracy. The link feature extraction network and the text feature extraction network both use convolutional neural networks with convolution kernels of different sizes to extract deep feature expressions; compared with time-sequence-related neural networks such as LSTM (Long Short-Term Memory) and pre-trained models such as BERT, the speed is higher, which is suitable for large-scale fraud short message data processing and better meets the requirements of 5G service scenarios.
Based on the foregoing embodiments, the embodiments of the present application provide an information identification apparatus, where each module included in the apparatus and each unit included in each module may be implemented by a processor in a computer device; of course, the implementation can also be realized through a specific logic circuit; in implementation, the processor may be a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 11 is a schematic view of a structure of the information recognition apparatus according to the embodiment of the present application, and as shown in fig. 11, the information recognition apparatus 1100 includes:
an obtaining module 1101, configured to obtain information to be processed, and separate the information to be processed to obtain text data and link data;
a feature extraction module 1102, configured to perform feature extraction on the text data and the link data respectively to obtain corresponding text features and link features;
a first determining module 1103, configured to determine similarity attribute information of the link data based on the link data and a pre-constructed link knowledge base;
a second determining module 1104, configured to determine an identification result of the to-be-processed information based on the text feature, the link feature, and the similarity attribute information.
In some embodiments, the obtaining module 1101 is further configured to obtain a trained recognition model, where the trained recognition model includes a trained text feature extraction submodel, a trained link feature extraction submodel, a trained similarity processing submodel, and a trained fusion feature submodel;
the feature extraction module 1102 includes:
the first extraction submodule is used for extracting the characteristics of the text data by utilizing the trained text characteristic extraction submodel to obtain the text characteristics;
the second extraction submodule is used for extracting the characteristics of the link data by utilizing the trained link characteristic extraction submodel to obtain the link characteristics;
the first determining module 1103 includes:
the similarity processing submodule is used for carrying out similarity processing on the link data and the link knowledge base by utilizing the trained similarity processing submodel to obtain similarity attribute information;
the second determining module 1104 includes:
and the fusion processing submodule is used for performing fusion processing on the text features, the link features and the similarity attribute information by using the trained fusion feature submodel to obtain an identification result of the information to be processed.
In some embodiments, the first feature extraction sub-module comprises:
the normalization unit is used for performing text normalization processing on the text data to obtain processed text data;
the vectorization unit is used for vectorizing the processed text data to obtain a text vector;
and the convolution pooling unit is used for performing convolution and pooling on the text vector to obtain the text features.
In some embodiments, the second feature extraction sub-module comprises:
the convolution unit is used for performing convolution processing on the link data to obtain a convolution result;
and the pooling unit is used for pooling the convolution result to obtain the link characteristic.
In some embodiments, the similarity processing submodule includes:
the first acquisition unit is used for acquiring first label information of target reference link data when the link knowledge base comprises the target reference link data meeting the matching condition with the link data;
the setting unit is used for setting the similarity value between the link data and the link knowledge base as a preset value;
a first determining unit, configured to determine the preset value and the first tag information as the similarity attribute information.
In some embodiments, the similarity processing sub-module further comprises:
a second determining unit, configured to determine, when target link data satisfying a matching condition with the link data is not included in the link repository, respective similarity values between the link data and respective reference link data of the link repository;
a third determining unit, configured to determine a maximum similarity value among the similarity values, and determine reference link data corresponding to the maximum similarity value as the target reference link data;
a second obtaining unit configured to obtain second tag information of the target reference link data;
a fourth determining unit, configured to determine the maximum similarity value and the second tag information as the similarity attribute information.
In some embodiments, the second determination unit comprises:
an obtaining subunit, configured to obtain a first data length of the link data and each second data length of each reference link data;
a first determining subunit configured to determine respective edit distances between the link data and the respective reference link data;
a second determining subunit, configured to determine the respective similarity values based on the first data length, the respective second data lengths, and the respective edit distances, where the respective edit distances are negatively correlated with the respective similarity values.
In some embodiments, the fusion processing sub-module comprises:
the full-connection processing unit is used for performing full-connection processing on the text features, the link features and the similarity attribute information to obtain a full-connection result;
and the normalization processing unit is used for performing normalization processing on the full connection result to obtain an identification result of the information to be processed.
In some embodiments, the identification result includes an abnormal probability that the information to be processed is abnormal information, and the information identification apparatus 1100 further includes:
the third determining module is used for determining that the information to be processed is abnormal information when the abnormal probability is greater than a probability threshold value;
the response module is used for responding to the information to be processed as abnormal information and determining an abnormal alarm message;
and the output module is used for outputting the abnormal alarm message.
In some embodiments, the obtaining module 1101 is further configured to obtain a first preset number of normal link data and a second preset number of abnormal link data; the information recognition apparatus 1100 further includes:
the construction module is used for constructing the link knowledge base based on the normal link data and the abnormal link data;
and the updating module is used for acquiring the updated normal link data and the updated abnormal link data when the preset interval duration is reached, updating the link knowledge base based on the updated normal link data and the updated abnormal link data, and constructing the updated link knowledge base.
In some embodiments, the obtaining module 1101 is further configured to obtain a preset identification model, sample information, and sample label information corresponding to the sample information; acquiring error information between the sample label information and the prediction identification result; the information recognition apparatus 1100 further includes:
the identification module is used for identifying the sample information by using the preset identification model to obtain a prediction identification result corresponding to the sample information;
and the training module is used for carrying out back propagation training on the preset recognition model based on the error information and the error threshold value to obtain the trained recognition model.
It should be noted that the description of the information identification apparatus in the embodiment of the present application is similar to the description of the method embodiment, and has similar beneficial effects to the method embodiment, and therefore, the description is omitted here for brevity. For technical details not disclosed in the embodiments of the apparatus, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be noted that, in the embodiment of the present application, if the information identification method is implemented in the form of a software functional module and sold or used as a standalone product, the information identification method may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the related art may be embodied in the form of a software product stored in a storage medium, and including several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Accordingly, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps in the information identification method provided in the above embodiments.
An electronic device is provided according to an embodiment of the present application. Fig. 12 is a schematic structural diagram of the electronic device according to the embodiment of the present application. As shown in fig. 12, the electronic device 1200 includes: a processor 1201, at least one communication bus 1202, a user interface 1203, at least one external communication interface 1204, and a memory 1205. The communication bus 1202 is configured to enable connection and communication between these components. The user interface 1203 may include a display screen, and the external communication interface 1204 may include a standard wired interface and a wireless interface. The processor 1201 is configured to execute the program of the information identification method stored in the memory, so as to implement the steps of the information identification method provided in the above-mentioned embodiments.
The above description of the electronic device and storage medium embodiments, similar to the description of the method embodiments above, has similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the electronic device and the storage medium of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should also be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not indicate an execution order; the execution order of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present application. The serial numbers of the above embodiments of the present application are for description only and do not imply that one embodiment is better or worse than another.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present application.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by program instructions running on related hardware; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
Alternatively, the integrated units described above in the present application, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. An information identification method, characterized in that the method comprises:
acquiring information to be processed, and performing separation processing on the information to be processed to obtain text data and link data;
respectively extracting the features of the text data and the link data to obtain corresponding text features and link features;
determining similarity attribute information of the link data based on the link data and a pre-constructed link knowledge base;
and determining the recognition result of the information to be processed based on the text feature, the link feature and the similarity attribute information.
2. The method of claim 1, wherein before the obtaining the information to be processed and the separating the information to be processed to obtain the text data and the link data, the method further comprises:
acquiring a trained recognition model, wherein the trained recognition model comprises a trained text feature extraction submodel, a trained link feature extraction submodel, a trained similarity processing submodel and a trained fusion feature submodel;
the respectively extracting the features of the text data and the link data to obtain corresponding text features and link features comprises: performing feature extraction on the text data by using the trained text feature extraction submodel to obtain the text features; extracting the characteristics of the link data by using the trained link characteristic extraction submodel to obtain the link characteristics;
the determining of the similarity attribute information of the link data based on the link data and a pre-constructed link knowledge base comprises the following steps: carrying out similarity processing on the link data and the link knowledge base by using the trained similarity processing submodel to obtain similarity attribute information;
the determining the recognition result of the information to be processed based on the text feature, the link feature and the similarity attribute information includes: and performing fusion processing on the text features, the link features and the similarity attribute information by using the trained fusion feature submodel to obtain an identification result of the information to be processed.
3. The method of claim 2, wherein the extracting the features of the text data by using the trained text feature extraction submodel to obtain the text features comprises:
performing text normalization processing on the text data to obtain processed text data;
vectorizing the processed text data to obtain a text vector;
and performing convolution and pooling on the text vector to obtain the text features.
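By way of illustration, the first two steps of claim 3 (text normalization and vectorization) could be realized as follows. This is a minimal sketch: the character-level vocabulary `VOCAB`, the normalization rules, and the padded length `MAX_LEN` are assumptions made for the example, not values disclosed in the patent.

```python
import re

# Character-level vocabulary and padded length -- illustrative assumptions,
# not parameters from the patent.
VOCAB = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz0123456789 ")}
MAX_LEN = 32

def normalize_text(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def vectorize(text: str) -> list:
    """Map characters to integer ids and zero-pad to MAX_LEN."""
    ids = [VOCAB.get(ch, 0) for ch in normalize_text(text)][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))
```

The resulting fixed-length vector would then be fed to the convolution and pooling stages of the text feature extraction sub-model.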
4. The method of claim 2, wherein the extracting the features of the link data by using the trained link feature extraction submodel to obtain the link features comprises:
performing convolution processing on the link data to obtain a convolution result;
and performing pooling processing on the convolution result to obtain the link features.
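The convolution-then-pooling pattern of claim 4 can be sketched with a hand-written valid 1-D convolution followed by non-overlapping max pooling. The kernel values and window size below are assumptions for the sketch; in the patented method the kernels would be learned parameters of the trained link feature extraction sub-model.

```python
def conv1d(seq, kernel):
    """Valid (no padding) 1-D convolution of seq with kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def max_pool(seq, window):
    """Non-overlapping max pooling over fixed-size windows."""
    return [max(seq[i:i + window]) for i in range(0, len(seq) - window + 1, window)]

# Toy link-data vector and averaging kernel (illustrative values only).
features = max_pool(conv1d([1.0, 2.0, 3.0, 4.0, 2.0, 1.0], [0.5, 0.5]), 2)
```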
5. The method of claim 2, wherein the link knowledge base includes a plurality of reference link data, and the performing similarity processing on the link data and the link knowledge base by using the trained similarity processing submodel to obtain the similarity attribute information includes:
when determining that the link knowledge base comprises target reference link data meeting matching conditions with the link data, acquiring first label information of the target reference link data;
setting the similarity value between the link data and the link knowledge base as a preset value;
and determining the preset value and the first label information as the similarity attribute information.
6. The method of claim 5, wherein the similarity processing is performed on the link data and the link knowledge base by using the trained similarity processing submodel to obtain the similarity attribute information, further comprising:
determining each similarity value between the link data and each reference link data of the link knowledge base when determining that the link knowledge base does not include target reference link data meeting matching conditions with the link data;
determining the maximum similarity value in the similarity values, and determining the reference link data corresponding to the maximum similarity value as the target reference link data;
acquiring second label information of the target reference link data;
determining the maximum similarity value and the second tag information as the similarity attribute information.
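The lookup logic of claims 5 and 6 could be sketched as below. The representation of the knowledge base (a dict mapping reference link data to label information), the `PRESET_VALUE` of 1.0, and the pluggable `similarity` function are all assumptions made for the example.

```python
PRESET_VALUE = 1.0  # similarity assigned on an exact match (assumed value)

def similarity_attributes(link, knowledge_base, similarity):
    """Return (similarity value, label information) for the given link."""
    if link in knowledge_base:
        # Claim 5: exact match -> preset value plus the first label information.
        return PRESET_VALUE, knowledge_base[link]
    # Claim 6: no match -> take the most similar reference link and its label.
    best_ref = max(knowledge_base, key=lambda ref: similarity(link, ref))
    return similarity(link, best_ref), knowledge_base[best_ref]
```

In practice the `similarity` argument could be the edit-distance-based measure of claim 7.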
7. The method of claim 6, wherein determining respective similarity values between the link data and respective reference link data of the link repository comprises:
acquiring a first data length of the link data and each second data length of each reference link data;
determining respective edit distances between the link data and the respective reference link data;
determining the respective similarity values based on the first data length, the respective second data lengths, and the respective edit distances, wherein the respective edit distances are inversely related to the respective similarity values.
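One possible realization of claim 7 is the classic Levenshtein edit distance, normalized by the longer of the two data lengths so that a larger edit distance yields a smaller similarity. The normalization formula is an assumption; the claim only requires that the edit distance and the similarity value be inversely related.

```python
def edit_distance(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def link_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] that decreases as the edit distance grows."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest
```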
8. The method according to claim 2, wherein the obtaining the recognition result of the information to be processed by fusing the text feature, the link feature and the similarity attribute information by using the trained fusion feature submodel comprises:
performing full-connection processing on the text features, the link features and the similarity attribute information to obtain a full-connection result;
and carrying out normalization processing on the full connection result to obtain an identification result of the information to be processed.
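The fusion step of claim 8 — concatenate the three inputs, apply a fully connected layer, then normalize — can be illustrated as follows. The random weights and the toy feature values are placeholders; in the patented method the weights come from the trained fusion feature sub-model, and softmax is assumed as the normalization.

```python
import math, random

random.seed(0)  # deterministic placeholder weights

def fully_connected(x, n_out):
    """One dense layer with random placeholder weights (stand-ins for the
    trained fusion sub-model's parameters)."""
    w = [[random.uniform(-0.1, 0.1) for _ in x] for _ in range(n_out)]
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def softmax(z):
    """Normalization: convert logits into probabilities summing to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Concatenate text features, link features, and similarity attribute
# information (toy values), then apply the two steps of claim 8.
fused = [0.2, 0.7] + [0.5, 0.1] + [0.9]
probs = softmax(fully_connected(fused, 2))  # e.g. [p_normal, p_abnormal]
```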
9. The method of claim 1, wherein the recognition result comprises an anomaly probability that the information to be processed is abnormal information, and wherein the method further comprises:
when the abnormal probability is determined to be larger than a probability threshold value, determining the information to be processed as abnormal information;
determining an abnormal alarm message in response to the information to be processed being abnormal information;
and outputting the abnormal alarm message.
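Claim 9 reduces to a threshold check on the anomaly probability. The threshold value and the alarm message format below are illustrative assumptions only.

```python
from typing import Optional

PROB_THRESHOLD = 0.5  # probability threshold -- an assumed value

def anomaly_alarm(anomaly_prob: float, info_id: str) -> Optional[str]:
    """Return an alarm message when the anomaly probability exceeds the
    threshold, otherwise None; the message format is illustrative."""
    if anomaly_prob > PROB_THRESHOLD:
        return f"abnormal information detected: {info_id} (p={anomaly_prob:.2f})"
    return None
```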
10. The method according to any one of claims 1 to 9, further comprising:
acquiring a first preset amount of normal link data and a second preset amount of abnormal link data;
constructing the link knowledge base based on the normal link data and the abnormal link data;
and when the preset interval duration is reached, acquiring updated normal link data and updated abnormal link data, updating the link knowledge base based on the updated normal link data and the updated abnormal link data, and constructing an updated link knowledge base.
11. The method according to any one of claims 1 to 9, further comprising:
acquiring a preset identification model, sample information and sample label information corresponding to the sample information;
identifying the sample information by using the preset identification model to obtain a prediction identification result corresponding to the sample information;
acquiring error information between the sample label information and the prediction identification result;
and carrying out back propagation training on the preset recognition model based on the error information and the error threshold value to obtain the trained recognition model.
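The training loop of claim 11 — predict, compare against the sample label, and update by back propagation until the error falls below the threshold — can be illustrated with a toy single-weight model trained by gradient descent. The model, learning rate, and loss are placeholders; the patent's preset recognition model would back-propagate through all four sub-models.

```python
def train(samples, labels, lr=0.1, error_threshold=1e-3, max_steps=1000):
    """Fit a one-weight linear model y = w * x by gradient descent,
    stopping once the mean squared error drops below error_threshold."""
    w = 0.0
    for _ in range(max_steps):
        preds = [w * x for x in samples]                     # prediction results
        errors = [p - y for p, y in zip(preds, labels)]      # error information
        mse = sum(e * e for e in errors) / len(errors)
        if mse < error_threshold:                            # error-threshold stop
            break
        grad = 2 * sum(e * x for e, x in zip(errors, samples)) / len(samples)
        w -= lr * grad                                       # back-propagation step
    return w
```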
12. An information recognition apparatus, characterized in that the information recognition apparatus comprises:
the acquisition module is used for acquiring information to be processed and separating the information to be processed to obtain text data and link data;
the feature extraction module is used for respectively extracting features of the text data and the link data to obtain corresponding text features and link features;
the first determination module is used for determining similarity attribute information of the link data based on the link data and a pre-constructed link knowledge base;
and the second determination module is used for determining the identification result of the information to be processed based on the text feature, the link feature and the similarity attribute information.
13. An electronic device, characterized in that the electronic device comprises:
a processor; and
a memory for storing a computer program operable on the processor;
wherein the computer program, when executed by a processor, implements the information identification method of any one of claims 1 to 11.
14. A computer-readable storage medium having computer-executable instructions stored therein, the computer-executable instructions being configured to perform the information identification method of any one of claims 1 to 11.
CN202110996527.4A 2021-08-27 2021-08-27 Information identification method, device, equipment and computer readable storage medium Pending CN115757764A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110996527.4A CN115757764A (en) 2021-08-27 2021-08-27 Information identification method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110996527.4A CN115757764A (en) 2021-08-27 2021-08-27 Information identification method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115757764A true CN115757764A (en) 2023-03-07

Family

ID=85331933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110996527.4A Pending CN115757764A (en) 2021-08-27 2021-08-27 Information identification method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115757764A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116991874A (en) * 2023-09-26 2023-11-03 海信集团控股股份有限公司 Text error correction and large model-based SQL sentence generation method and device
CN116991874B (en) * 2023-09-26 2024-03-01 海信集团控股股份有限公司 Text error correction and large model-based SQL sentence generation method and device

Similar Documents

Publication Publication Date Title
CN107491534A (en) Information processing method and device
CN107333071A (en) Video processing method and device, electronic equipment and storage medium
CN111061874A (en) Sensitive information detection method and device
CN108319888B (en) Video type identification method and device and computer terminal
CN110781407A (en) User label generation method and device and computer readable storage medium
CN111177367B (en) Case classification method, classification model training method and related products
CN114330966A (en) Risk prediction method, device, equipment and readable storage medium
CN104915399A (en) Recommended data processing method based on news headline and recommended data processing method system based on news headline
CN114386410A (en) Training method and text processing method of pre-training model
CN115757991A (en) Webpage identification method and device, electronic equipment and storage medium
CN113641797A (en) Data processing method, device, equipment, storage medium and computer program product
CN117558270B (en) Voice recognition method and device and keyword detection model training method and device
CN111460783A (en) Data processing method and device, computer equipment and storage medium
CN108090044B (en) Contact information identification method and device
CN115757764A (en) Information identification method, device, equipment and computer readable storage medium
CN112905787B (en) Text information processing method, short message processing method, electronic device and readable medium
CN112417874A (en) Named entity recognition method and device, storage medium and electronic device
CN114398973B (en) Media content tag identification method, device, equipment and storage medium
CN111477212A (en) Content recognition, model training and data processing method, system and equipment
CN113011875B (en) Text processing method, text processing device, computer equipment and storage medium
CN114492584A (en) Automatic content grading method for android Chinese application market
CN114359811A (en) Data authentication method and device, electronic equipment and storage medium
CN110163761B (en) Suspicious item member identification method and device based on image processing
CN112199954A (en) Disease entity matching method and device based on voice semantics and computer equipment
CN116911304B (en) Text recommendation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination