WO2016082695A1 - File identification method and apparatus - Google Patents

File identification method and apparatus

Info

Publication number
WO2016082695A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
link
dimensional code
information
code information
Prior art date
Application number
PCT/CN2015/094792
Other languages
English (en)
French (fr)
Inventor
吴志勇
Original Assignee
阿里巴巴集团控股有限公司
吴志勇
Application filed by 阿里巴巴集团控股有限公司 and 吴志勇
Publication of WO2016082695A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40: Network security protocols

Definitions

  • the present invention relates to the field of data transmission technologies, and in particular, to a file identification method and apparatus.
  • a link linkA is turned into a two-dimensional code picture, and user A uses the IM software to send the two-dimensional code picture file to user B, inducing user B to scan the picture with a mobile phone and enter user B's user name and password; user A can then obtain, at the back end of the linkA website, user B's user name and password for the online bank bankA.
  • user A then logs in with user B's online banking user name and password and transfers user B's balance to his or her own account.
  • the inventor of the present invention has found that the prior art cannot recognize whether a two-dimensional code picture contains a malicious link, which lowers the security of the data of IM software users.
  • embodiments of the present invention provide a file identification method and apparatus to solve the technical problem in the prior art that the security of user data is lowered because it cannot be recognized whether a two-dimensional code picture contains a malicious link.
  • a first aspect provides a file identification method, including:
  • if the file is a picture file, acquiring the two-dimensional code information in the picture file;
  • the acquiring of the two-dimensional code information in the picture file includes:
  • the determining of whether the two-dimensional code information includes a malicious link includes:
  • if a link is included, continuing to determine whether the link is a malicious link.
  • the determining of whether the two-dimensional code information includes a link includes: determining whether the text information includes a uniform resource locator (URL), so as to determine whether the file information includes a link;
  • the determining of whether the link is a malicious link includes: determining whether the URL in the text information is in a URL blacklist or whitelist; or determining, according to a URL scoring mechanism, whether the URL in the text information is a malicious link.
  • the method further includes:
  • when the file is received, determining whether the file is a picture file, and if it is a picture file, performing the step of acquiring the two-dimensional code information in the picture file.
  • a second aspect provides a file identification apparatus, including:
  • a receiving unit configured to receive a file;
  • an obtaining unit configured to acquire the two-dimensional code information in the picture file when the file received by the receiving unit is a picture file;
  • a first determining unit configured to determine whether the two-dimensional code information includes a malicious link;
  • a prompting unit configured to prompt the user to delete the file when the first determining unit determines that the two-dimensional code information includes a malicious link.
  • the obtaining unit includes:
  • a binarization processing unit configured to perform binarization processing on the picture file to obtain a black and white two-color picture file;
  • a second determining unit configured to determine whether the black and white two-color picture file includes two-dimensional code information;
  • an extracting unit configured to extract the data information in the two-dimensional code information when the second determining unit determines that the black and white two-color picture file includes two-dimensional code information;
  • a conversion unit configured to convert the data information into text information.
  • the first determining unit includes:
  • a link determining unit configured to determine whether the text information obtained by the conversion unit includes a link;
  • a malicious link determining unit configured to continue to determine whether the link is a malicious link when the link determining unit determines that the text information includes a link.
  • the link determining unit is specifically configured to determine whether the text information includes a uniform resource locator (URL), so as to determine whether the file information includes a link.
  • the malicious link determining unit is specifically configured to determine, according to a URL blacklist and whitelist or a URL scoring mechanism, whether the URL in the text information is a malicious link.
  • the apparatus further includes:
  • a third determining unit configured to determine, when the receiving unit receives the file, whether the file is a picture file;
  • the obtaining unit is further configured to acquire the two-dimensional code information in the picture file when the third determining unit determines that the file is a picture file.
  • in the embodiments of the present invention, a security check is performed on the received picture file: the two-dimensional code in the picture file is scanned and any malicious link in the two-dimensional code information is intercepted, which prevents the receiving user from seeing or opening the file and reduces leakage of user data or loss of property, thereby improving the security of user data.
  • FIG. 1 is a flowchart of a file identification method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of feature information included in a black and white two-color picture file according to an embodiment of the present invention
  • FIG. 3 is another flowchart of a file identification method according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a file identification apparatus according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of another structure of a file identification apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is another schematic structural diagram of a file identification apparatus according to an embodiment of the present invention.
  • FIG. 7 is another schematic structural diagram of a file identification apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • although the terms first, second, third, etc. may be used to describe various information in the embodiments of the present invention, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.
  • for example, first information may also be referred to as second information without departing from the scope of the embodiments of the invention, and this does not necessarily require or imply any such actual relationship or order between these entities or operations.
  • similarly, second information may also be referred to as first information.
  • depending on the context, the word “if” as used herein may be interpreted as “at the time of”, “when”, or “in response to a determination”.
  • the terms “comprise”, “include”, or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device.
  • FIG. 1 is a flowchart of a file identification method according to an embodiment of the present invention
  • Step 101 Receive a file
  • the file received by the user terminal may be a picture file, or may be a text file or the like.
  • the user terminal in this embodiment may also be a user terminal integrated with the IM software client software, or may be another terminal, which is not limited in this embodiment.
  • Step 102 If the file is a picture file, obtain the two-dimensional code information in the picture file;
  • the process of acquiring the two-dimensional code information in the image file includes:
  • the user terminal performs binarization processing on the picture file to obtain a black and white two-color picture file;
  • taking a QR two-dimensional code as an example:
  • the user terminal performs binarization processing on the picture file, that is, converts the original picture file into a picture file containing only black and white;
  • the black and white two-color picture file includes feature information such as position detection patterns, position detection pattern separators, timing patterns, alignment patterns, format information, version information, data and error correction codewords, and a quiet zone.
  • FIG. 2 is a schematic diagram of feature information included in a black and white two-color picture file according to an embodiment of the present invention.
  • the process of acquiring the data information in the two-dimensional code information includes:
  • the data information in the two-dimensional code is obtained by scanning the data segments in the two-dimensional code information, and the data information is then converted into text information; the specific conversion process is well known to those skilled in the art and is not repeated here.
  • Step 103 Determine whether the malicious link is included in the two-dimensional code information.
  • the specific process includes:
  • determining whether the text information in the two-dimensional code information includes a link is specifically: determining whether the file information includes a link according to whether the text information includes a uniform resource locator (URL); specifically:
  • one method is: determining whether the text information contains text beginning with http, https, or ftp; if so, determining that the text information includes a URL; otherwise, determining that it does not.
  • another method is: determining whether the text information contains text that matches the beginning of the URL standard definition, for example text of the form xxx.xx; if so, determining that the text information includes a URL; otherwise, determining that it does not.
  • determining whether the link is a malicious link is specifically:
  • one way is: determining whether the URL in the text information is in the URL blacklist or whitelist to determine whether the link is a malicious link;
  • if a URL is extracted from the text information, it is determined whether the extracted URL is in the URL blacklist or whitelist: if it is in the URL whitelist, the URL is determined to be a secure (that is, normal) link, and if it is in the URL blacklist, the URL is determined to be a malicious link.
  • if no URL is extracted from the text information, the two-dimensional code information is treated as secure.
  • another way is: determining whether the URL in the text information is a malicious link according to a URL scoring mechanism.
  • the URL scoring mechanism is: a character scoring mechanism is applied to URLs that are in neither the URL blacklist nor the URL whitelist, and a URL that reaches a certain score is classified as black, white, or grey.
  • taking a counterfeit of the Taobao site as an example, suppose a URL is:
  • http://actaobao-ina.com, which the scoring system finds to have a certain similarity to http://taobao.com; this URL is listed as a suspicious URL, and the link including it is therefore determined to be a malicious link.
  • Step 104 If a malicious link is included, the user is prompted to delete the file.
  • when the above step determines that the URL included in the file is a malicious link, the user who received the file is reminded that opening the file is risky and is advised to delete it; if the URL is determined to be a secure link, the user is not prompted to take any action. In other words, the check of whether the URL in the file is a malicious link is imperceptible to the user.
  • in this embodiment, a security check is performed on the received picture file: the two-dimensional code in the picture file is scanned and any malicious link in the two-dimensional code information is intercepted, which prevents the receiving user from seeing or opening the file and reduces leakage of user data or loss of property, thereby improving the security of user data.
  • FIG. 3 is another flowchart of a file identification method according to an embodiment of the present invention.
  • this embodiment differs from the foregoing embodiment in that, when the file is received, it is determined whether the file is a picture file, and if it is, the step of acquiring the two-dimensional code information in the picture file is performed; specifically:
  • Step 301 Receive a file.
  • Step 302 Determine whether the file is a picture file, if yes, go to step 303; otherwise, go to step 307;
  • the specific judgment process of the file is determined by the extension of the file or the special format of the image file header.
  • the specific judgment process is well known to those skilled in the art, and details are not described herein.
  • Step 303 Acquire two-dimensional code information in the picture file.
  • the obtaining process includes: performing binarization processing on the picture file to obtain a black and white two-color picture file; determining whether the black and white two-color picture file includes two-dimensional code information; if so, acquiring the data information in the two-dimensional code information; and converting the data information into text information.
  • the specific acquisition process is described in detail in the foregoing embodiment, and details are not described herein again.
  • Step 304 Determine whether the link is included in the two-dimensional code information; if yes, go to step 305; otherwise, go to step 307;
  • in this step, the two-dimensional code information is scanned to obtain the data information in it, the data information is extracted and converted into text information, and it is then determined whether the text information includes a uniform resource locator (URL); if so, it is determined that the two-dimensional code information includes a link; otherwise, it is determined that the two-dimensional code information includes no link, and the process ends.
  • Step 305 Determine whether the link is a malicious link, and if yes, go to step 306; otherwise, go to step 307;
  • a judgment manner is: determining whether the URL in the text information is in a URL black and white list, if in the white list, determining that the link is a secure link; if in the blacklist, determining The link is a malicious link;
  • Another way of determining is to determine whether the URL in the text information is a malicious link according to a URL scoring mechanism.
  • Step 306 Prompt the user to delete the file.
  • the user may be prompted to delete the file and intercept the malicious link, thereby preventing the user from being deceived.
  • Step 307 End this process.
  • that is, if the link in the picture file is not a malicious link, the picture file is a normal picture file, and the user can see or open the file.
  • in this embodiment, a security check is performed on the received picture file: the two-dimensional code in the picture file is scanned, and any malicious link in the two-dimensional code information is scanned for and intercepted, which prevents wrongdoers from using IM software to spread picture files that carry two-dimensional codes with malicious links in order to trick victims into clicking and so obtain users' sensitive information and money; this improves the security of user data.
  • based on the implementation process of the foregoing method, an embodiment of the present invention further provides a file identification apparatus.
  • its structure is shown in FIG. 4, and the apparatus includes: a receiving unit 41, an obtaining unit 42, a first determining unit 43, and a prompting unit 44, wherein
  • the receiving unit 41 is configured to receive a file
  • the obtaining unit 42 is configured to acquire the two-dimensional code information in the picture file when the file received by the receiving unit 41 is a picture file;
  • the first determining unit 43 is configured to determine whether the malicious link is included in the two-dimensional code information acquired by the acquiring unit 42.
  • the prompting unit 44 is configured to prompt the user to delete the file when the first determining unit 43 determines that the two-dimensional code information includes a malicious link.
  • optionally, in another embodiment based on the foregoing embodiment, the obtaining unit 42 includes: a binarization processing unit 51, a second determining unit 52, an extracting unit 53, and a converting unit 54; a schematic structural diagram is shown in FIG. 5, where
  • the binarization processing unit 51 is configured to perform binarization processing on the picture file to obtain a black and white two-color picture file;
  • the second determining unit 52 is configured to determine whether the black and white two-color picture file includes two-dimensional code information;
  • the extracting unit 53 is configured to extract the data information in the two-dimensional code information when the second determining unit 52 determines that the black and white two-color picture file includes two-dimensional code information;
  • the converting unit 54 is configured to convert the data information into text information.
  • optionally, in another embodiment based on the foregoing embodiment, the first determining unit 43 includes: a link determining unit 61 and a malicious link determining unit 62; a schematic structural diagram is shown in FIG. 6, where
  • the link determining unit 61 is configured to determine whether the text information obtained by the converting unit 54 includes a link.
  • the malicious link determining unit 62 is configured to continue to determine whether the link is a malicious link when the link determining unit 61 determines that the text information includes a link.
  • the link determining unit 61 is specifically configured to determine whether the file information includes a link according to whether the text information includes a uniform resource locator (URL).
  • the malicious link determining unit 62 is specifically configured to determine whether the URL in the text information is a malicious link according to a URL blacklist and whitelist or a URL scoring mechanism.
  • optionally, in another embodiment based on the foregoing embodiment, the apparatus further includes a third determining unit 71; a schematic structural diagram is shown in FIG. 7, where
  • the third determining unit 71 is configured to determine, when the receiving unit 41 receives the file, whether the file is a picture file;
  • the obtaining unit 42 is further configured to acquire the two-dimensional code information in the picture file when the third determining unit 71 determines that the file is a picture file.
  • specifically, the binarization processing unit 51 in the obtaining unit 42 may begin acquiring the two-dimensional code information in the picture file when the third determining unit 71 determines that the file is a picture file.
  • optionally, the apparatus can be integrated in a terminal, or in a terminal on which IM client software is installed, or can be deployed independently, which is not limited in this embodiment.
  • FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • the terminal 800 includes: a processor 810, a memory 820, a transceiver 830, and a bus 840.
  • the processor 810, the memory 820, and the transceiver 830 are connected to each other through a bus 840; the bus 840 may be an ISA bus, a PCI bus, or an EISA bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 8, but it does not mean that there is only one bus or one type of bus.
  • the memory 820 is configured to store a program.
  • the program can include program code, the program code including computer operating instructions.
  • the memory 820 may include a high speed RAM memory and may also include a non-volatile memory such as at least one disk memory.
  • the transceiver 830 is used to connect other devices and communicate with other devices. Specifically, the transceiver 830 can be configured to: receive a file;
  • the processor 810 is configured to execute the program code stored in the memory 820, to acquire the two-dimensional code information in the picture file when the file is a picture file, and to determine whether the two-dimensional code information includes a malicious link;
  • the transceiver 830 is further configured to prompt the user to delete the file when the processor determines that the two-dimensional code information includes a malicious link.
  • the acquiring, by the processor 810, of the two-dimensional code information in the picture file includes: performing binarization processing on the picture file to obtain a black and white two-color picture file; determining whether the black and white two-color picture file includes two-dimensional code information; if so, extracting the data information in the two-dimensional code information; and converting the data information into text information.
  • the determining, by the processor 810, of whether the two-dimensional code information includes a malicious link includes: determining whether the text information includes a link; and if a link is included, continuing to determine whether the link is a malicious link.
  • the determining, by the processor 810, of whether the two-dimensional code information includes a link includes: determining whether the text information includes a uniform resource locator (URL), so as to determine whether the file information includes a link;
  • the determining, by the processor 810, of whether the link is a malicious link includes: determining whether the URL in the text information is in a URL blacklist or whitelist; or determining, according to a URL scoring mechanism, whether the URL in the text information is a malicious link.
  • the processor 810 is configured to determine, when the transceiver receives the file, whether the file is a picture file, and, when the file is a picture file, to acquire the two-dimensional code information in the picture file and determine whether it includes a malicious link; if a malicious link is included, the user is prompted to delete the file.
  • the processor performs a security check on the picture file received by the transceiver, scans the two-dimensional code in the picture file, and scans for and intercepts any malicious link in the two-dimensional code information, which prevents wrongdoers from using IM software to spread picture files that carry two-dimensional codes with malicious links in order to trick victims into clicking and so obtain users' sensitive information and money; this improves the security of user data.
  • the techniques in the embodiments of the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments, or the part that contributes to the prior art, may be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in portions of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

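Embodiments of the present invention disclose a file identification method and apparatus. The method includes: receiving a file; if the file is a picture file, acquiring the two-dimensional code information in the picture file; determining whether the two-dimensional code information includes a malicious link; and, if a malicious link is included, prompting the user to delete the file. In the embodiments, a security check is performed on the received picture file: the two-dimensional code in the picture file is scanned and any malicious link in the two-dimensional code information is intercepted, which prevents the receiving user from seeing or opening the file and reduces leakage of user data or loss of property, thereby improving the security of user data.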

Description

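File identification method and apparatus
Technical Field
The present invention relates to the field of data transmission technologies, and in particular to a file identification method and apparatus.
Background
At present, instant messaging (IM) client software generally provides a file transfer function. As networks have spread rapidly, wrongdoers have begun to use file transfer to mount remote attacks, as follows:
Suppose user A has a link, linkA, designed to trick users into revealing passwords (for example, the link may resemble the website of an online bank, bankA; such a link is called a malicious link). User A generates a two-dimensional code picture from linkA and sends the picture file to user B through IM software, inducing user B to scan the picture with a mobile phone and enter user B's user name and password. At the back end of the linkA website, user A can then obtain user B's user name and password for bankA, log in with them, and transfer user B's balance into user A's own account.
It follows that existing IM client software is completely transparent to file transfer and performs no additional processing on transferred picture files. Any two-dimensional code picture containing a malicious link can therefore reach the recipient, and every user who receives such a picture is exposed to attack, for example theft of user information or financial loss.
The inventor of the present invention has therefore found that, because the prior art cannot recognize whether a two-dimensional code picture contains a malicious link, the security of the data of IM software users is lowered.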
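Summary
Embodiments of the present invention provide a file identification method and apparatus, to solve the technical problem in the prior art that the security of user data is lowered because it cannot be recognized whether a two-dimensional code picture contains a malicious link.
To solve the above technical problem, the embodiments of the present invention disclose the following technical solutions.
A first aspect provides a file identification method, including:
receiving a file;
if the file is a picture file, acquiring the two-dimensional code information in the picture file;
determining whether the two-dimensional code information includes a malicious link;
if a malicious link is included, prompting the user to delete the file.
Optionally, the acquiring of the two-dimensional code information in the picture file includes:
performing binarization processing on the picture file to obtain a black and white two-color picture file;
determining whether the black and white two-color picture file includes two-dimensional code information;
if two-dimensional code information is included, extracting the data information in the two-dimensional code information;
converting the data information into text information.
Optionally, the determining of whether the two-dimensional code information includes a malicious link includes:
determining whether the text information includes a link;
if a link is included, continuing to determine whether the link is a malicious link.
Optionally, the determining of whether the two-dimensional code information includes a link includes: determining whether the text information includes a uniform resource locator (URL), so as to determine whether the file information includes a link;
the determining of whether the link is a malicious link includes: determining whether the URL in the text information is in a URL blacklist or whitelist; or determining, according to a URL scoring mechanism, whether the URL in the text information is a malicious link.
Optionally, the method further includes:
when the file is received, determining whether the file is a picture file, and if it is a picture file, performing the step of acquiring the two-dimensional code information in the picture file.
A second aspect provides a file identification apparatus, including:
a receiving unit, configured to receive a file;
an obtaining unit, configured to acquire the two-dimensional code information in the picture file when the file received by the receiving unit is a picture file;
a first determining unit, configured to determine whether the two-dimensional code information includes a malicious link;
a prompting unit, configured to prompt the user to delete the file when the first determining unit determines that the two-dimensional code information includes a malicious link.
Optionally, the obtaining unit includes:
a binarization processing unit, configured to perform binarization processing on the picture file to obtain a black and white two-color picture file;
a second determining unit, configured to determine whether the black and white two-color picture file includes two-dimensional code information;
an extracting unit, configured to extract the data information in the two-dimensional code information when the second determining unit determines that the black and white two-color picture file includes two-dimensional code information;
a converting unit, configured to convert the data information into text information.
Optionally, the first determining unit includes:
a link determining unit, configured to determine whether the text information obtained by the converting unit includes a link;
a malicious link determining unit, configured to continue to determine whether the link is a malicious link when the link determining unit determines that the text information includes a link.
Optionally, the link determining unit is specifically configured to determine whether the text information includes a uniform resource locator (URL), so as to determine whether the file information includes a link;
the malicious link determining unit is specifically configured to determine, according to a URL blacklist and whitelist or a URL scoring mechanism, whether the URL in the text information is a malicious link.
Optionally, the apparatus further includes:
a third determining unit, configured to determine, when the receiving unit receives a file, whether the file is a picture file;
the obtaining unit is further configured to acquire the two-dimensional code information in the picture file when the third determining unit determines that the file is a picture file.
As can be seen from the above technical solutions, in the embodiments of the present invention a security check is performed on the received picture file: the two-dimensional code in the picture file is scanned and any malicious link in the two-dimensional code information is intercepted, which prevents the receiving user from seeing or opening the file and reduces leakage of user data or loss of property, thereby improving the security of user data.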
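Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a file identification method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the feature information included in a black and white two-color picture file according to an embodiment of the present invention;
FIG. 3 is another flowchart of a file identification method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a file identification apparatus according to an embodiment of the present invention;
FIG. 5 is another schematic structural diagram of a file identification apparatus according to an embodiment of the present invention;
FIG. 6 is another schematic structural diagram of a file identification apparatus according to an embodiment of the present invention;
FIG. 7 is another schematic structural diagram of a file identification apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention.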
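Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are merely some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention serve only to describe particular embodiments and are not intended to limit the invention. The singular forms “a”, “said”, and “the” used in the embodiments and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present invention to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the embodiments, first information may also be referred to as second information, and this does not necessarily require or imply any such actual relationship or order between these entities or operations. Similarly, second information may also be referred to as first information. Depending on the context, the word “if” as used herein may be interpreted as “at the time of”, “when”, or “in response to a determination”. Moreover, the terms “comprise”, “include”, or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device.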
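Referring to FIG. 1, FIG. 1 is a flowchart of a file identification method according to an embodiment of the present invention. The method includes:
Step 101: Receive a file.
In this step, the file received by the user terminal may be a picture file, a text file, or another type of file. The user terminal in this embodiment may be a user terminal with IM client software integrated, or may be another terminal, which is not limited in this embodiment.
Step 102: If the file is a picture file, acquire the two-dimensional code information in the picture file.
The process of acquiring the two-dimensional code information in the picture file includes:
the user terminal performs binarization processing on the picture file to obtain a black and white two-color picture file.
Taking a QR two-dimensional code as an example:
1) The user terminal performs binarization processing on the picture file, that is, converts the original picture file into a picture file containing only black and white.
The black and white two-color picture file includes feature information such as position detection patterns, position detection pattern separators, timing patterns, alignment patterns, format information, version information, data and error correction codewords, and a quiet zone. These features are shown in FIG. 2, which is a schematic diagram of the feature information included in a black and white two-color picture file according to an embodiment of the present invention.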
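The binarization in item 1) can be illustrated with a fixed global threshold. This is a minimal sketch under assumptions that are not in the original: the picture is taken to be already decoded into rows of 8-bit grayscale values, and the function name `binarize` and the cut-off of 128 are hypothetical. Practical decoders often choose the threshold adaptively.

```python
def binarize(gray_rows, threshold=128):
    """Map each 8-bit grayscale pixel to 1 (black) or 0 (white).

    `threshold` is an assumed fixed cut-off; the source does not say
    how the threshold is chosen.
    """
    return [[1 if pixel < threshold else 0 for pixel in row]
            for row in gray_rows]


# A tiny 2x4 grayscale image: dark pixels become 1, light pixels become 0.
sample = [[0, 100, 200, 255],
          [127, 128, 30, 250]]
black_white = binarize(sample)
```

Representing black as 1 keeps the result directly usable for run-length checks on individual scan lines.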
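2) Determine whether the black and white two-color picture file includes two-dimensional code information.
In this step, whether a two-dimensional code is present can be determined from the position detection patterns, position detection pattern separators, and timing patterns in the black and white two-color picture file. The specific determination process is well known to those skilled in the art and is not repeated here.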
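The source leaves the detection procedure to the skilled reader. As one hedged illustration, QR position detection patterns produce dark, light, dark, light, dark runs in a 1:1:3:1:1 ratio along any scan line through their center, a property of the QR symbology rather than something stated in this document. The sketch below checks a single binarized row for that ratio; the function names and the tolerance value are assumptions.

```python
def run_lengths(row):
    """Collapse a row of 0/1 pixels into [value, length] runs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def has_finder_ratio(row, tolerance=0.5):
    """Return True if the row contains dark-light-dark-light-dark runs
    whose lengths are close to 1:1:3:1:1 (1 = black, 0 = white)."""
    runs = run_lengths(row)
    expected = [1, 1, 3, 1, 1]
    for i in range(len(runs) - 4):
        window = runs[i:i + 5]
        if [value for value, _ in window] != [1, 0, 1, 0, 1]:
            continue
        lengths = [length for _, length in window]
        unit = sum(lengths) / 7.0  # the pattern is 7 modules wide
        if all(abs(length - e * unit) <= tolerance * unit
               for length, e in zip(lengths, expected)):
            return True
    return False


# One scan line through a position detection pattern at 2 pixels per module.
line = [0, 0] + [1] * 2 + [0] * 2 + [1] * 6 + [0] * 2 + [1] * 2 + [0, 0]
found = has_finder_ratio(line)
```

A full detector would repeat the check on columns and confirm three such patterns at the corners; this sketch covers only the row test.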
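3) If the picture file includes two-dimensional code information, acquire the data information in the two-dimensional code information.
The process of acquiring the data information in the two-dimensional code information includes:
scanning the data segments in the two-dimensional code information to obtain the data information in the two-dimensional code, and then converting the data information into text information. The specific conversion process is well known to those skilled in the art and is not repeated here.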
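The conversion from data information to text is not detailed in the source. A minimal sketch, assuming the extracted data is a byte-mode payload: UTF-8 is tried first and ISO-8859-1 is used as a fallback. The function name `payload_to_text` and the choice of encodings are assumptions for illustration.

```python
def payload_to_text(payload: bytes) -> str:
    """Convert raw two-dimensional code data bytes into text.

    Assumption: the payload is byte-mode data. UTF-8 is tried first;
    ISO-8859-1 is the fallback because every byte value is valid in it.
    """
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        return payload.decode("iso-8859-1")


text = payload_to_text(b"http://taobao.com")
```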
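Step 103: Determine whether the two-dimensional code information includes a malicious link.
In this step, it is first determined whether the text information in the two-dimensional code information includes a link; if a link is included, it is then determined whether the link is a malicious link. The specific process is as follows.
Determining whether the text information in the two-dimensional code information includes a link is specifically: determining whether the file information includes a link according to whether the text information includes a uniform resource locator (URL). This includes:
One way: determine whether the text information contains text beginning with http, https, or ftp. If so, the text information is determined to include a URL; otherwise, it is determined not to include a URL.
Another way: determine whether the text information contains text that matches the beginning of the URL standard definition, for example text of the form xxx.xx. If so, the text information is determined to include a URL; otherwise, it is determined not to include a URL.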
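Both ways of detecting a URL can be sketched with regular expressions. The patterns below are illustrative assumptions, not the patent's definition: the first matches text with an explicit http, https, or ftp scheme, and the second matches scheme-less text shaped like xxx.xx. The name `extract_urls` is hypothetical.

```python
import re

# First way: text that starts with an explicit scheme.
SCHEME_URL = re.compile(r"(?:https?|ftp)://[^\s]+", re.IGNORECASE)
# Second way: scheme-less text shaped like "xxx.xx" (a host name with a
# dot-separated suffix). This pattern is an illustrative assumption.
BARE_HOST = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s]*)?",
                       re.IGNORECASE)


def extract_urls(text):
    """Return URLs found in the text, preferring scheme-prefixed matches."""
    urls = SCHEME_URL.findall(text)
    if urls:
        return urls
    return BARE_HOST.findall(text)


with_scheme = extract_urls("pay at http://actaobao-ina.com now")
without_scheme = extract_urls("visit taobao.com/login today")
```

The scheme-less pattern is deliberately loose and will also match file names such as `report.pdf`; a stricter check would compare the suffix against known top-level domains.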
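Determining whether the link is a malicious link is specifically:
One way: determine whether the URL in the text information is in the URL blacklist or whitelist to determine whether the link is a malicious link.
In this approach, if a URL is extracted from the text information, it is determined whether the extracted URL is in the URL blacklist or whitelist. If it is in the URL whitelist, the URL is determined to be a secure (that is, normal) link; if it is in the URL blacklist, the URL is determined to be a malicious link.
If no URL is extracted from the text information, the two-dimensional code information is treated as secure.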
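The list lookup can be sketched as a set membership test on the host name. The list contents and the name `classify_by_list` are hypothetical; a third result, "unknown", is returned for URLs in neither list so that another mechanism can decide.

```python
from urllib.parse import urlparse

# Hypothetical list contents, for illustration only.
URL_WHITELIST = {"taobao.com"}
URL_BLACKLIST = {"actaobao-ina.com"}


def classify_by_list(url):
    """Return "safe", "malicious", or "unknown" for a URL."""
    # urlparse only fills in the host when a scheme or "//" is present.
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    if host in URL_WHITELIST:
        return "safe"
    if host in URL_BLACKLIST:
        return "malicious"
    return "unknown"  # left to another mechanism, such as scoring


result = classify_by_list("http://actaobao-ina.com/pay")
```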
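Another way: determine whether the URL in the text information is a malicious link according to a URL scoring mechanism.
The URL scoring mechanism is as follows: a character scoring mechanism is applied to URLs that are in neither the URL blacklist nor the URL whitelist, and a URL that reaches a certain score is classified as black, white, or grey. Taking a counterfeit of the Taobao site as an example, suppose a URL is:
http://actaobao-ina.com. The scoring system finds that it has a certain similarity to http://taobao.com and lists this URL as a suspicious URL, so the link including the URL is determined to be a malicious link.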
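The source does not specify how the character score is computed. As a stand-in, the sketch below uses the similarity ratio from Python's `difflib.SequenceMatcher` between a host name and a set of protected host names, and flags near matches that are not exact. The protected set, the threshold of 0.7, and the function names are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical set of well-known hosts that counterfeit sites imitate.
PROTECTED_HOSTS = {"taobao.com"}


def similarity_score(host):
    """Highest character-level similarity between host and any protected host."""
    return max(SequenceMatcher(None, host, known).ratio()
               for known in PROTECTED_HOSTS)


def is_suspicious(host, threshold=0.7):
    """Flag hosts that resemble, but are not, a protected host."""
    if host in PROTECTED_HOSTS:
        return False  # the genuine site itself is not a counterfeit
    return similarity_score(host) >= threshold


suspicious = is_suspicious("actaobao-ina.com")
```

A single threshold gives only a two-way split; the black, white, and grey classification described above would need a second cut-off.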
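Step 104: If a malicious link is included, prompt the user to delete the file.
When the above steps determine that the URL included in the file is a malicious link, the user who received the file is warned that opening the file is risky and is advised to delete it. If the URL included in the file is determined to be a secure link, the user is not prompted to take any action. In other words, the check of whether the URL in the file is a malicious link is imperceptible to the user.
In this embodiment, a security check is performed on the received picture file: the two-dimensional code in the picture file is scanned and any malicious link in the two-dimensional code information is intercepted, which prevents the receiving user from seeing or opening the file and reduces leakage of user data or loss of property, thereby improving the security of user data.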
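Referring also to FIG. 3, FIG. 3 is another flowchart of a file identification method according to an embodiment of the present invention. This embodiment differs from the foregoing embodiment in that, when a file is received, it is determined whether the file is a picture file, and if it is, the step of acquiring the two-dimensional code information in the picture file is performed. Specifically:
Step 301: Receive a file.
Step 302: Determine whether the file is a picture file; if so, go to step 303; otherwise, go to step 307.
The determination can be made from the file extension or from the special format of the picture file header. The specific determination process is well known to those skilled in the art and is not repeated here.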
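Both checks in step 302 can be sketched briefly. The extension set and the function name `looks_like_image` are assumptions; the byte signatures are the standard leading bytes of PNG, JPEG, GIF, and BMP files.

```python
import os

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"\xff\xd8\xff",        # JPEG
    b"GIF87a", b"GIF89a",   # GIF
    b"BM",                  # BMP
)


def looks_like_image(filename, head=b""):
    """Return True if the extension or the first bytes indicate a picture."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in IMAGE_EXTENSIONS:
        return True
    return head.startswith(IMAGE_SIGNATURES)


by_extension = looks_like_image("code.PNG")
by_header = looks_like_image("file.dat", b"\x89PNG\r\n\x1a\n\x00\x00")
```

Checking the header as well as the extension catches picture files that were renamed before sending.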
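Step 303: Acquire the two-dimensional code information in the picture file.
In this step, the acquisition process includes: performing binarization processing on the picture file to obtain a black and white two-color picture file; determining whether the black and white two-color picture file includes two-dimensional code information; if so, acquiring the data information in the two-dimensional code information; and converting the data information into text information. The specific acquisition process is described in the foregoing embodiment and is not repeated here.
Step 304: Determine whether the two-dimensional code information includes a link; if so, go to step 305; otherwise, go to step 307.
In this step, the two-dimensional code information is scanned to obtain the data information in it, the data information is extracted and converted into text information, and it is then determined whether the text information includes a uniform resource locator (URL). If so, it is determined that the two-dimensional code information includes a link; otherwise, it is determined that the two-dimensional code information includes no link, and the process ends.
Step 305: Determine whether the link is a malicious link; if so, go to step 306; otherwise, go to step 307.
In this step, one way of determining is: determine whether the URL in the text information is in the URL blacklist or whitelist. If it is in the whitelist, the link is determined to be a secure link; if it is in the blacklist, the link is determined to be a malicious link.
Another way of determining is to determine whether the URL in the text information is a malicious link according to a URL scoring mechanism. The specific determination process is described in the foregoing embodiment and is not repeated here.
Step 306: Prompt the user to delete the file.
In this step, when the link is determined to be a malicious link, the user may be prompted to delete the file and the malicious link is intercepted, which prevents the user from being deceived.
Step 307: End the process.
That is, if the link in the picture file is not a malicious link, the picture file is a normal picture file, and the user can see or open the file.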
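The control flow of steps 301 to 307 can be summarized in one function. This is a sketch under assumptions: the four checks are passed in as callables so the flow does not depend on any particular decoder or URL checker, and the function name, the return values, and the `demo_` stubs are all hypothetical.

```python
def identify_file(name, data, is_image, read_code_text, find_urls, is_malicious):
    """Run steps 301-307 on a received file and return a verdict string."""
    if not is_image(name, data):                # step 302
        return "end"                            # step 307
    text = read_code_text(data)                 # step 303
    if text is None:                            # no two-dimensional code found
        return "end"
    urls = find_urls(text)                      # step 304
    if not urls:
        return "end"
    if any(is_malicious(url) for url in urls):  # step 305
        return "prompt_delete"                  # step 306
    return "end"                                # step 307


# Stub callables, for demonstration only.
def demo_is_image(name, data):
    return name.lower().endswith(".png")


def demo_read_code_text(data):
    return data.decode("utf-8") if data else None


def demo_find_urls(text):
    return [word for word in text.split() if word.startswith("http")]


def demo_is_malicious(url):
    return "actaobao-ina.com" in url


verdict = identify_file("qr.png", b"http://actaobao-ina.com",
                        demo_is_image, demo_read_code_text,
                        demo_find_urls, demo_is_malicious)
```

Every branch other than a confirmed malicious link returns "end", which matches the point that the check stays invisible to the user unless something is found.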
本发明实施例中,通过对接收到的图片文件进行安全检查,扫描图片文件中二维码,并且对二维码信息中的恶意的链接进行扫描和拦截,防止不法分子通过IM软件传播带有恶意链接二维码的图片文件,骗取受害人点击,达到获取用户敏感信息以及钱财的行为。从而提高了用户数据的安全性。
基于上述方法的实现过程,本发明实施例还提供一种文件识别装置,其 结构示意图如图4所示,所述装置包括:接收单元41,获取单元42,第一判断单元43和提示单元44,其中,
所述接收单元41,用于接收文件;
所述获取单元42,用于在所述接收单元41接收到的所述文件为图片文件时,获取所述图片文件中的二维码信息;
所述第一判断单元43,用于判断所述获取单元42获取的所述二维码信息中是否包括恶意链接;
所述提示单元44,用于在所述第一判断单元43判断所述二维码信息中包括恶意链接时,提示用户删除所述文件。
可选的,在另一实施例中,该实施例在上述实施例的基础上,所述获取单元42包括:二值化处理单元51,第二判断单元52,提取单元53和转换单元54,其结构示意图如图5所示,其中,
所述二值化处理单元51,用于对所述图片文件进行二值化处理,得到黑白二色的图片文件;
所述第二判断单元52,用于判断所述黑白二色的图片文件中是否包括二维码信息;
所述提取单元53,用于在所述第二单元判断所述黑白二色的图片文件中包括二维码信息时,提取所述二维码信息中的数据信息;
所述转换单元54,用于将所述数据信息转换为文本信息。
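Optionally, in another embodiment based on the foregoing embodiment, the first determining unit 43 includes a link determining unit 61 and a malicious link determining unit 62, a schematic structural diagram of which is shown in FIG. 6, wherein:
The link determining unit 61 is configured to determine whether the text information obtained by the converting unit includes a link;
The malicious link determining unit 62 is configured to further determine whether the link is a malicious link when the link determining unit 61 determines that the text information includes a link.
The link determining unit 61 is specifically configured to determine whether the text information includes a link according to whether the text information includes a uniform resource locator (URL);
The malicious link determining unit 62 is specifically configured to determine, according to the URL blacklist and whitelist or the URL scoring mechanism, whether the URL in the text information is a malicious link.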
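Optionally, in another embodiment based on the foregoing embodiment, the apparatus further includes a third determining unit 71, a schematic structural diagram of which is shown in FIG. 7, wherein:
The third determining unit 71 is configured to determine whether a file is a picture file when the receiving unit 41 receives the file;
The acquiring unit 42 is further configured to acquire the two-dimensional code information in the picture file when the third determining unit 71 determines that the file is a picture file. Specifically, the binarization processing unit 51 in the acquiring unit 42 may acquire the two-dimensional code information in the picture file when the third determining unit 71 determines that the file is a picture file.
Optionally, the apparatus may be integrated in a terminal, integrated in a terminal on which IM client software is installed, or deployed independently; this embodiment imposes no limitation.
For the implementation of the functions and roles of the units in the apparatus, see the implementation of the corresponding steps in the foregoing method; details are not repeated here.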
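Referring to FIG. 8, which is a schematic structural diagram of a terminal according to an embodiment of the present invention, the terminal 800 includes a processor 810, a memory 820, a transceiver 830, and a bus 840;
The processor 810, the memory 820, and the transceiver 830 are connected to one another through the bus 840. The bus 840 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 8, but this does not mean that there is only one bus or only one type of bus.
The memory 820 is configured to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory 820 may include a high-speed RAM memory and may further include a non-volatile memory, for example, at least one disk memory.
The transceiver 830 is configured to connect to other devices and communicate with them. Specifically, the transceiver 830 may be configured to receive a file;
The processor 810 is configured to execute the program code stored in the memory 820 and, when the file is a picture file, acquire the two-dimensional code information in the picture file and determine whether the two-dimensional code information includes a malicious link;
The transceiver 830 is further configured to prompt the user to delete the file when the processor determines that the two-dimensional code information includes a malicious link.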
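Optionally, the processor 810 being configured to acquire the two-dimensional code information in the picture file includes: binarizing the picture file to obtain a black-and-white picture file; determining whether the black-and-white picture file includes two-dimensional code information; if so, extracting the data information in the two-dimensional code information; and converting the data information into text information.
Optionally, the processor 810 being configured to determine whether the two-dimensional code information includes a malicious link includes: determining whether the text information includes a link; and if a link is included, further determining whether the link is a malicious link.
Optionally, the processor 810 being configured to determine whether the two-dimensional code information includes a link includes: determining whether the text information includes a uniform resource locator (URL), so as to determine whether the text information includes a link;
The processor 810 being configured to determine whether the link is a malicious link includes: determining whether the URL in the text information is in the URL blacklist or the URL whitelist, so as to determine whether the link is a malicious link; or determining, according to the URL scoring mechanism, whether the URL in the text information is a malicious link.
Optionally, the processor 810 is configured to: when the transceiver 830 receives the file, determine whether the file is a picture file; when the file is determined to be a picture file, acquire the two-dimensional code information in the picture file and determine whether the two-dimensional code information includes a malicious link; and if a malicious link is included, prompt the user to delete the file.
In this embodiment of the present invention, the processor performs a security check on the picture file received by the transceiver: the two-dimensional code in the picture file is scanned, and any malicious link in the two-dimensional code information is scanned for and intercepted. This prevents lawbreakers from using IM software to spread picture files whose two-dimensional codes carry malicious links in order to trick victims into clicking and thereby obtain users' sensitive information and money, and thus improves the security of user data.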
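Those skilled in the art can clearly understand that the techniques in the embodiments of the present invention may be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for related details, see the corresponding description of the method embodiments.
The embodiments of the present invention described above do not limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.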

Claims (10)

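  1. A file identification method, characterized by comprising:
    receiving a file;
    if the file is a picture file, acquiring two-dimensional code information in the picture file;
    determining whether the two-dimensional code information includes a malicious link;
    if a malicious link is included, prompting a user to delete the file.
  2. The method according to claim 1, characterized in that the acquiring two-dimensional code information in the picture file comprises:
    binarizing the picture file to obtain a black-and-white picture file;
    determining whether the black-and-white picture file includes two-dimensional code information;
    if two-dimensional code information is included, extracting data information in the two-dimensional code information;
    converting the data information into text information.
  3. The method according to claim 2, characterized in that the determining whether the two-dimensional code information includes a malicious link comprises:
    determining whether the text information includes a link;
    if a link is included, further determining whether the link is a malicious link.
  4. The method according to claim 3, characterized in that
    the determining whether the two-dimensional code information includes a link comprises: determining whether the text information includes a uniform resource locator (URL), so as to determine whether the file information includes a link;
    the determining whether the link is a malicious link comprises: determining whether the URL in the text information is present in a URL blacklist or whitelist, so as to determine whether the link is a malicious link; or determining, according to a URL scoring mechanism, whether the URL in the text information is a malicious link.
  5. The method according to any one of claims 1 to 4, characterized by further comprising:
    when the file is received, determining whether the file is a picture file, and if it is a picture file, performing the step of acquiring two-dimensional code information in the picture file.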
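  6. A file identification apparatus, characterized by comprising:
    a receiving unit, configured to receive a file;
    an acquiring unit, configured to acquire two-dimensional code information in a picture file when the file received by the receiving unit is the picture file;
    a first determining unit, configured to determine whether the two-dimensional code information includes a malicious link;
    a prompting unit, configured to prompt a user to delete the file when the first determining unit determines that the two-dimensional code information includes a malicious link.
  7. The apparatus according to claim 6, characterized in that the acquiring unit comprises:
    a binarization processing unit, configured to binarize the picture file to obtain a black-and-white picture file;
    a second determining unit, configured to determine whether the black-and-white picture file includes two-dimensional code information;
    an extracting unit, configured to extract data information in the two-dimensional code information when the second determining unit determines that the black-and-white picture file includes two-dimensional code information;
    a converting unit, configured to convert the data information into text information.
  8. The apparatus according to claim 7, characterized in that the first determining unit comprises:
    a link determining unit, configured to determine whether the text information obtained by the converting unit includes a link;
    a malicious link determining unit, configured to further determine whether the link is a malicious link when the link determining unit determines that the text information includes a link.
  9. The apparatus according to claim 8, characterized in that
    the link determining unit is specifically configured to determine whether the text information includes a uniform resource locator (URL), so as to determine whether the file information includes a link;
    the malicious link determining unit is specifically configured to determine, according to a URL blacklist and whitelist or a URL scoring mechanism, whether the URL in the text information is a malicious link.
  10. The apparatus according to any one of claims 6 to 9, characterized by further comprising:
    a third determining unit, configured to determine whether the file is a picture file when the receiving unit receives the file;
    the acquiring unit being further configured to acquire the two-dimensional code information in the picture file when the third determining unit determines that the file is a picture file.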
PCT/CN2015/094792 2014-11-26 2015-11-17 File identification method and apparatus WO2016082695A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410697767.4 2014-11-26
CN201410697767.4A CN105704100A (zh) 2014-11-26 2014-11-26 File identification method and apparatus

Publications (1)

Publication Number Publication Date
WO2016082695A1 (zh) 2016-06-02

Family

ID=56073588

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/094792 WO2016082695A1 (zh) 2014-11-26 2015-11-17 File identification method and apparatus

Country Status (2)

Country Link
CN (1) CN105704100A (zh)
WO (1) WO2016082695A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230359728A1 (en) * 2022-05-05 2023-11-09 Bank Of America Corporation Data securement leveraging secure qr code scanner
US20230394151A1 (en) * 2022-06-07 2023-12-07 Bank Of America Corporation Protected qr code scanner using operational system override
US12008105B2 (en) * 2022-06-07 2024-06-11 Bank Of America Corporation Protected QR code scanner using operational system override

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090391B (zh) * 2017-12-29 2021-03-09 北京奇虎科技有限公司 二维码的识别方法及装置
CN112732895B (zh) * 2018-03-26 2024-01-19 广州虎牙信息科技有限公司 审核文本的方法、装置、电子设备和存储介质
CN116861412A (zh) * 2023-06-26 2023-10-10 深圳市赛凌伟业科技有限公司 一种基于大数据的信息安全分析方法和系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819723A (zh) * 2011-12-26 2012-12-12 哈尔滨安天科技股份有限公司 一种恶意二维码检测方法和系统
CN103226688A (zh) * 2013-02-28 2013-07-31 中国地质大学(武汉) 一种二维码防篡改和防伪造的认证方法
CN103295046A (zh) * 2013-06-13 2013-09-11 北京网秦天下科技有限公司 生成和使用安全二维码的方法和设备
CN103647779A (zh) * 2013-12-16 2014-03-19 北京奇虎科技有限公司 一种通过二维码检测钓鱼欺诈信息的方法及装置
CN104052722A (zh) * 2013-03-15 2014-09-17 腾讯科技(深圳)有限公司 网址安全性检测的方法、装置及系统

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4918174B1 (ja) * 2011-09-20 2012-04-18 株式会社Pijin 情報提供装置、情報提供方法、及びコンピュータプログラム

Also Published As

Publication number Publication date
CN105704100A (zh) 2016-06-22

Similar Documents

Publication Publication Date Title
US10601865B1 (en) Detection of credential spearphishing attacks using email analysis
US11330014B2 (en) Optically analyzing text strings such as domain names
US7870201B2 (en) Apparatus for executing an application function using a mail link and methods therefor
US9438575B2 (en) Smart phone login using QR code
US7870202B2 (en) Apparatus for executing an application function using a smart card and methods therefor
JP2021504860A (ja) トランザクション確認及び暗号通貨のためのセキュアな鍵記憶装置の拡張
JP2021510978A (ja) 検証可能なクレームをバインドするシステム及び方法
US10516567B2 (en) Identification of vulnerability to social phishing
US11165793B2 (en) Method and system for detecting credential stealing attacks
WO2016082695A1 (zh) 一种文件识别方法及装置
US11252176B2 (en) Optimal scanning parameters computation methods, devices and systems for malicious URL detection
KR20120037330A (ko) 이미지객체를 이용한 로그인 인증 방법 및 그 시스템
US8510817B1 (en) Two-factor anti-phishing authentication systems and methods
US10798068B2 (en) Wireless information passing and authentication
US20190356636A1 (en) Secure Message Inoculation
KR101940310B1 (ko) 웹 사이트 검증 장치 및 그 방법
Varshney et al. Push notification based login using BLE devices
JP6754971B2 (ja) 偽ウェブページ判別装置、偽ウェブページ判別システム、偽ウェブページ判別方法及び偽ウェブページ判別プログラム
EP3350973B1 (fr) Procédé d'authentification de site de la toile et de sécurisation d'accès à un site de la toile
Dhavale Advanced image-based spam detection and filtering techniques
US8503636B1 (en) Systems and methods for blocking an outgoing request associated with an outgoing telephone number
US20210064662A1 (en) Data collection system for effectively processing big data
KR102324825B1 (ko) 인증 처리를 위한 서버, 시스템 및 그 제어방법
US20240171609A1 (en) Generating a content signature of a textual communication using optical character recognition and text processing
US20210234891A1 (en) Artificial intelligence (ai) powered conversational system for identifying malicious messages

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15863901

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15863901

Country of ref document: EP

Kind code of ref document: A1