CN111259216B

CN111259216B - Information identification method, device and equipment

Info

Publication number: CN111259216B
Application number: CN201811466744.7A
Authority: CN
Inventors: 陈笠鸥; 赵向军
Original assignee: TCL Technology Group Co Ltd
Current assignee: TCL Technology Group Co Ltd
Priority date: 2018-12-03
Filing date: 2018-12-03
Publication date: 2024-05-24
Anticipated expiration: 2038-12-03
Also published as: CN111259216A

Abstract

An information identification method includes: filtering the received information through preset filtering characteristics; acquiring scene data of the filtered information, substituting the scene data into a pre-trained neural network model, and outputting a label of the information; and generating reminding information according to the filtering result and/or the label of the information output by the neural network model. The method can be used for globally and effectively identifying the fraud information, and the accuracy of identifying the fraud information is improved.

Description

Information identification method, device and equipment

Technical Field

The present application belongs to the field of information processing, and in particular, relates to an information identification method, apparatus and device.

Background

Along with the development of the mobile internet, people-to-people communication becomes more and more convenient, besides traditional face-to-face communication, intelligent equipment can be conveniently used for sending text, pictures, voice or video and other contents, so that people can not be limited by distance when communicating.

As communication technologies develop, smart mobile terminals have become popular, and more elderly people have begun to use mobile smart terminals, such as smartphones or smartphones. Because the old person is not strong to the prevention consciousness of risk, the lawbreaker can send the information of fraud to the intelligent mobile terminal that old person used for old person causes mental damage, property operation, even causes harm to health and life because of the information content is fraudulent in light of belief. If the information is identified with only fraud keywords, it may not be possible to identify the fraud information globally valid.

Disclosure of Invention

In view of the above, the embodiments of the present application provide an information identification method, apparatus and device, so as to solve the problem that in the prior art, fraud information cannot be identified globally and effectively.

A first aspect of an embodiment of the present application provides an information identifying method, including:

Filtering the received information through preset filtering characteristics;

Acquiring scene data of the filtered information, substituting the scene data into a pre-trained neural network model, and outputting a label of the information;

and generating reminding information according to the filtering result and/or the label of the information output by the neural network model.

With reference to the first aspect, in a first possible implementation manner of the first aspect, the step of filtering the received information through a preset filtering feature includes one or more of:

if the sending number of the detected information is a number in a preset number blacklist, filtering the information;

If the information is detected to comprise the website in the preset website blacklist, filtering the information;

Calculating the similarity between the text content in the information and the information in a preset false information base, and filtering out the information if the similarity is greater than a preset threshold value;

Identifying text content included in pictures or videos in the information, calculating similarity between the text content and information in a preset false information base, and filtering out the information if the similarity is greater than a preset threshold;

Matching the picture in the information or the image in the image frame of the video in the information with preset harmful image characteristics, and filtering out the information if the picture or the video is detected to contain the harmful image characteristics;

And acquiring a uniform resource locator (URI) of the picture or the video included in the information, and filtering the information if the uniform resource locator (URI) belongs to a website in a preset website blacklist.

With reference to the first aspect, in a second possible implementation manner of the first aspect, the step of acquiring scene data of the filtered information includes:

acquiring dialogue scene contents of a sender and a user of the filtered information;

If the dialogue scene content comprises voice, picture or video, converting the voice, picture or video into characters;

and generating a character sequence according to the time sequence according to the character content in the information and the converted characters.

With reference to the first aspect, in a third possible implementation manner of the first aspect, the step of substituting the scene data into a pre-trained neural network model, and outputting a label of the information includes:

removing stop words in text content in scene data by a text word segmentation method, and vectorizing text features;

And inputting the vectorized text features into a neural network model trained by positive and negative samples in advance, and outputting a label identification result of the information.

With reference to the first aspect, in a fourth possible implementation manner of the first aspect, before the step of obtaining scene data of the filtered information, substituting the scene data into a pre-trained neural network model, and outputting a label of the information, the method further includes:

whether the sensitive feature is triggered currently is judged, wherein the sensitive feature comprises a process for calling a financial application program and/or calling payment security.

With reference to the first aspect, in a fifth possible implementation manner of the first aspect, the step of generating the alert information according to the filtering result and/or the label of the information output by the neural network model includes:

when the information is filtered through preset filtering characteristics, outputting alarm information of a first level;

When the label of the false information is output through the neural network model, the alarm information of a second level is output, and the severity of the alarm information of the second level is higher than that of the alarm information of the first level.

With reference to the first aspect, in a sixth possible implementation manner of the first aspect, the step of generating the alert information according to the filtering result and/or the label of the information output by the neural network model includes: :

When the label of the false information is output through the neural network model, the current alarm information of the screen is locked, and the current scene data is sent to the bound child mobile terminal.

A second aspect of an embodiment of the present application provides an information identifying apparatus, including:

the filtering unit is used for filtering the received information through preset filtering characteristics;

The neural network model identification unit is used for acquiring scene data of the filtered information, substituting the scene data into a pre-trained neural network model and outputting a label of the information;

And the reminding unit is used for generating reminding information according to the filtering result and/or the label of the information output by the neural network model.

A third aspect of an embodiment of the present application provides an information identifying apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the information identifying method according to any one of the first aspects when the computer program is executed.

A fourth aspect of an embodiment of the present application provides a computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the information identification method according to any one of the first aspects.

Compared with the prior art, the embodiment of the application has the beneficial effects that: after the information is filtered through preset filtering characteristics, scene data of the filtered information are further obtained, the scene data are substituted into a pre-trained neural network model to be calculated, a label of an identification result of the information is output, and reminding information is generated according to the filtering and/or the label of the identification result, so that the fraud information can be globally and effectively identified, and the identification accuracy of the fraud information is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic implementation flow chart of an information identification method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an implementation of filtering received information according to an embodiment of the present application;

fig. 3 is a schematic flowchart of a label for outputting information according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a reminder interface according to an embodiment of the present application;

Fig. 5 is a schematic diagram of an information identifying apparatus according to an embodiment of the present application;

Fig. 6 is a schematic diagram of an information identifying apparatus provided in an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

In order to illustrate the technical scheme of the application, the following description is made by specific examples.

Fig. 1 is a schematic implementation flow chart of an information identification method according to an embodiment of the present application, which is described in detail below:

In step S101, the received information is filtered by a preset filtering feature;

In particular, the filtering characteristics can include one or more types, and different types of filtering characteristics can be matched according to different information. If the source of the information is a mobile phone number, selecting a number blacklist as a filtering feature according to the source of the information; if the information comprises the website, the website can be used as a filtering characteristic according to a website database; if the information is text information, the information in the false information base can be selected as a filtering feature; if the information comprises voice, picture or video, extracting words in voice, picture or video, adopting information in a false information base as filtering characteristics, or directly comparing the information with images in picture or video through image characteristics; or if the picture or video corresponds to a uniform resource locator URL, a web address database may also be employed as a filtering feature.

Thus, the step of filtering the received information by the preset filtering characteristics may be as shown in fig. 2, and include one or several of the following filtering modes:

201, if the sending number of the detected information is a number in a preset number blacklist, filtering the information;

The number blacklist is used for recording numbers for fraud, and can be continuously perfected according to the reporting information of the user. The sending number of the information can be a mobile phone number for sending mobile phone short messages, an account number for sending instant messaging messages and the like. And comparing the sending number of the information with the numbers in the number blacklist by setting the number blacklist, and determining whether the current information is legal or not.

202, If the information is detected to comprise websites in a preset website blacklist, filtering the information;

The part of the information may include a web address, for example, after purchasing the commodity, service information of the commodity may be transmitted to the user, and the service information transmitted to the user may include the web address. And the phishing websites are also included in part of the fraud information. Therefore, in order to effectively identify the fraud information, a website blacklist can be preset, and websites included in the information are filtered through the website blacklist, so that harmful information is effectively screened out.

203, Calculating the similarity between the text content in the information and the information in a preset false information base, and filtering out the information if the similarity is larger than a preset threshold value;

For text information, the similarity between the text information and information in the false information base can be calculated, so that whether the information is false information or not can be determined.

While the rumors on the internet are very tedious, most are rooted and many have had several words (e.g., change time, place, etc.) changed. According to the characteristics of rumors, a TF-IDF (word frequency-inverse text frequency index) method for judging the similarity of texts can be adopted, the TF-IDF value of each term in the texts is calculated through the method, then a vector model is built for each text by utilizing the values, the cosine similarity among vectors is calculated to determine the similarity between the texts displayed by the mobile terminal and a false information base, such as a Chinese rumor base (a rumor database collected and arranged by natural language processing of Qinghua university and a social human computing laboratory, and the aim is to collect rumor cases widely spread in a Chinese social media platform) at a server side. Once the text content is determined to contain rumor information, the reminder help module may be awakened.

204, Identifying text content included in pictures or videos in the information, calculating the similarity between the text content and information in a preset false information base, and filtering out the information if the similarity is greater than a preset threshold;

If the information includes multimedia information, such as a picture and a video, text content in the picture or the video can be extracted by means of OCR (optical character recognition), and when the extracted text content is detected, whether the content in the picture or the video includes rumors can be determined based on the text content detection method 203.

205, Matching the picture in the information or the image in the image frame of the video in the information with preset harmful image characteristics, and filtering out the information if the picture or the video is detected to contain the harmful image characteristics;

Or when the picture or the video in the information does not contain text content or the text content contained in the picture or the video is not rumor or fraud content, the image feature detection can be further carried out on the picture or the video to determine whether the picture or the video contains harmful image features or not.

Wherein, the harmful image characteristics can be preset, and the harmful image characteristics can be learned and perfected according to the determined fraud information. The images in the video can be acquired by randomly intercepting the video pictures.

And 206, acquiring a uniform resource locator (URI) of the picture or the video included in the information, and filtering the information if the uniform resource locator (URI) belongs to a website in a preset website blacklist.

When the information includes the uniform resource locator URL of the multimedia resource, the URL of the multimedia resource in the information can be detected according to a preset website blacklist to determine whether the source of the multimedia in the information is legal.

In step S102, acquiring scene data of the filtered information, substituting the scene data into a pre-trained neural network model, and outputting a label of the information;

After the information is intercepted or filtered preliminarily in step S101, the information may include obvious fraud or false information filtering, but if the fraudster is faked as a acquaintance, the information needs to be further identified in step S102.

In a preferred embodiment of the present application, before step S102, it may further be detected whether the triggering condition of step S102 is currently satisfied, where the triggering condition may be a call of an application program of a payment class, such as a call of an information transceiver application to a financial class application (bank APP) or a payment class application (WeChat payment, payment device payment), and may further include a key process called by payment, such as a process of inputting a password by a fingerprint sensor or a secret keyboard, and when a key process calling a preset payment call is detected, step S102 is triggered to further detect information.

In this step, the scene data may be obtained according to the filtered information, and the scene data may be substituted into the trained neural network model, and the label of the scene data where the information is located may be output, where the label may be a positive label or a negative label, or a label represented as normal information label or abnormal information label. As shown in fig. 3, the method specifically includes:

in step S301, acquiring dialogue scene contents of the sender and the user of the filtered information;

The information received by the user or the information received and transmitted by the user can be recorded in real time. When the information received by the user is filtered or further accords with the triggering condition, the scene data corresponding to the filtered information can be extracted.

The scene data comprises communication content of the user and the information sender, and the receiving and transmitting time of the information. In general, information transmission and reception contents of a user and one information transmitter can be used as scene data. When the information content includes the name and number of the third party, the communication content of the third party can be combined into one scene data.

In step S302, if the dialogue scene content includes voice, picture or video, converting the voice, picture or video into text;

After the scene data is acquired, the pictures, the voices and the videos in the scene data can be converted, the voices can be converted into characters through ASR (speech recognition technology), and the pictures or the videos can be converted into the characters through OCR (optical character recognition).

In step S303, a text sequence is generated from the text content and the converted text in the information according to the time sequence.

After converting the multimedia in the scene data into text, the original text and the converted text in the information can be generated into a text sequence according to time sequence, or when the scene data does not comprise the multimedia information, the text sequence can also be directly generated in the scene data according to time sequence, so that the extraction and preprocessing operation of the scene data are completed.

In step S304, removing stop words in text content in scene data by a text word segmentation method, and vectorizing text features;

after the scene data is extracted, the characters of the scene data can be further subjected to feature extraction, for example, a text word segmentation technology can be used for generating a word song, words are deactivated for the text, and the extracted text is subjected to vectorization processing.

In step S305, the vectorized text feature is input into a neural network model trained by positive and negative samples in advance, and a label recognition result of the information is output.

For example, the structure of the cyclic neural network model may adopt a bidirectional LSTM long-term memory network, two layers may be set, the unit size of each layer may be set to 128, and the model splices the output of the previous time and the hidden layer vector of the previous time, as the input of the next time. The model can construct Embedding layers to perform text preprocessing, a dictionary is generated by using a text word segmentation technology, words are deactivated on the text, a Embedding matrix is obtained by a text feature extraction technology, and input data text is vectorized according to the Embedding matrix and the dictionary; then the model makes an Attention focusing mechanism for the extracted hidden layer vector at each moment, wherein the Attention focusing mechanism is a similarity measure, so that the useful detail information is ensured to be reserved; the final model can be classified using softmax, outputting positive and negative samples of 1 or 0.

In step S103, according to the filtering result and/or the label of the information output by the neural network model, the reminding information is generated.

In the application, the alarm information can be generated only according to the input information of the neural network model, or can be generated according to the neural network model and the filtering result, for example:

As shown in fig. 4, when the alarm information of the first level is adopted, the alarm information can be a popup prompt in the left graph, the popup only informs the user that the browsed information has rumor content, the popup sets two buttons of 'i know' and 'one-key help seeking', and the user clicks 'i know' to close the popup and continue to normally use the mobile terminal; for the alarm information of the second level, a striking popup warning can be adopted, the striking popup can occupy the whole screen, the popup is provided with two buttons of "calling" and "one-key help" and locks the content displayed on the screen, the user cannot close, and the mobile terminal only keeps the function of dialing the telephone (comprising the address book and the nearest contact).

Under the second-level alarm information, the user can click a 'one-key help' button to package and send the content (chat dialogue and the like) related to the alarm event to a preset child mobile terminal, so that the child can check the event process and persuade the parent, and can also unlock the parent remotely. Therefore, the fraud information can be timely and effectively found, and fraud is prevented.

It should be understood that the sequence number of each step in the foregoing embodiment does not mean that the execution sequence of each process should be determined by the function and the internal logic, and should not limit the implementation process of the embodiment of the present application.

Fig. 5 is a schematic structural diagram of an information identifying apparatus according to an embodiment of the present application, where the information identifying apparatus includes:

a filtering unit 501, configured to filter the received information according to a preset filtering characteristic;

The neural network model recognition unit 502 is configured to obtain scene data of the filtered information, substitute the scene data into a neural network model trained in advance, and output a label of the information;

The reminding unit 503 is configured to generate reminding information according to the filtering result and/or the label of the information output by the neural network model.

The information identifying apparatus corresponds to the information identifying method shown in fig. 1.

Fig. 6 is a schematic diagram of an information identifying apparatus according to an embodiment of the present application. As shown in fig. 6, the information identifying apparatus 6 of this embodiment includes: a processor 60, a memory 61 and a computer program 62, such as an information recognition program, stored in the memory 61 and executable on the processor 60. The processor 60, when executing the computer program 62, implements the steps of the various information identification method embodiments described above, such as steps 101 through 103 shown in fig. 1. Or the processor 60, when executing the computer program 62, performs the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules 501-503 shown in fig. 5.

Illustratively, the computer program 62 may be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 60 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 62 in the information recognition device 6. For example, the computer program 62 may be partitioned into a filtering unit, a neural network model recognition unit, and a reminder unit, each of which functions specifically as follows:

The information identifying apparatus 6 may be a computing apparatus such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The information identifying apparatus may include, but is not limited to, a processor 60, a memory 61. It will be appreciated by those skilled in the art that fig. 6 is merely an example of the information identifying apparatus 6 and is not meant to be limiting as to the information identifying apparatus 6, and may include more or less components than illustrated, or may combine certain components, or different components, e.g., the information identifying apparatus may further include an input-output apparatus, a network access apparatus, a bus, etc.

The Processor 60 may be a central processing unit (Central Processing Unit, CPU), other general purpose Processor, digital signal Processor (DIGITAL SIGNAL Processor, DSP), application SPECIFIC INTEGRATED Circuit (ASIC), off-the-shelf Programmable gate array (Field-Programmable GATE ARRAY, FPGA) or other Programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 61 may be an internal storage unit of the information identifying apparatus 6, such as a hard disk or a memory of the information identifying apparatus 6. The memory 61 may also be an external storage device of the information identifying apparatus 6, such as a plug-in hard disk, a smart memory card (SMART MEDIA CARD, SMC), a Secure Digital (SD) card, a flash memory card (FLASH CARD), or the like, which are provided on the information identifying apparatus 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the information identifying apparatus 6. The memory 61 is used for storing the computer program and other programs and data required for the information identifying apparatus. The memory 61 may also be used for temporarily storing data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.

In the foregoing embodiments, the descriptions of the embodiments are emphasized, and in part, not described or illustrated in any particular embodiment, reference is made to the related descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. . Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the computer readable medium may include content that is subject to appropriate increases and decreases as required by jurisdictions in which such content is subject to legislation and patent practice, such as in certain jurisdictions in which such content is not included as electrical carrier signals and telecommunication signals.

The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims

1. An information identification method, characterized in that the information identification method comprises:

Filtering the received information through preset filtering characteristics;

Generating reminding information according to the filtering result and/or the label of the information output by the neural network model;

The step of substituting the scene data into a pre-trained neural network model and outputting the label of the information comprises the following steps:

Inputting the vectorized text features into a neural network model trained by positive and negative samples in advance, and outputting a label identification result of the information;

The structure of the neural network model adopts a bidirectional LSTM long-term memory network, and the neural network model takes the output of the previous moment and the hidden layer vector of the previous moment as the splice and takes the splice as the input of the next moment; the neural network model builds Embedding layers to perform text preprocessing, a dictionary is generated through a text word segmentation method, stop words of text content in scene data are removed, a Embedding matrix is obtained through a text feature extraction technology, and text features are vectorized according to the Embedding matrix and the dictionary; then the neural network model performs Attention focusing mechanism on the extracted hidden layer vector at each moment; finally the neural network model is classified using softmax.

2. The information recognition method according to claim 1, wherein the step of filtering the received information by a preset filtering feature includes one or more of:

3. The information recognition method according to claim 1, wherein the step of acquiring scene data of the filtered information includes:

4. The information recognition method according to claim 1, wherein, before the step of acquiring scene data of the filtered information, substituting the scene data into a neural network model trained in advance, and outputting a label of the information, the method further comprises:

5. The information recognition method according to claim 1, wherein the step of generating the reminder information according to the filtering result and/or the label of the information output by the neural network model comprises:

6. The information recognition method according to claim 5, wherein the step of generating the reminder information according to the filtering result and/or the label of the information output by the neural network model comprises:

7. An information identifying apparatus, characterized in that the information identifying apparatus includes:

the reminding unit is used for generating reminding information according to the filtering result and/or the label of the information output by the neural network model;

8. An information recognition device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the information recognition method according to any one of claims 1 to 6 when the computer program is executed by the processor.

9. A computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the information identification method according to any one of claims 1 to 6.