CN112214653A - Character string recognition method and device, storage medium and electronic equipment - Google Patents

Character string recognition method and device, storage medium and electronic equipment

Info

Publication number
CN112214653A
Authority
CN
China
Prior art keywords
character string
character
ciphertext
hidden
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011179927.8A
Other languages
Chinese (zh)
Other versions
CN112214653B (en)
Inventor
赵陶明
向波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202011179927.8A
Publication of CN112214653A
Application granted
Publication of CN112214653B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/53 Decompilation; Disassembly
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiments of the present application disclose a character string recognition method and apparatus, a storage medium, and an electronic device. The method includes: acquiring a decompiled file of a detection object and acquiring a first character string set corresponding to the decompiled file; performing ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set; and performing hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string. The embodiments of the present application can improve the accuracy of character recognition.

Description

Character string recognition method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for character string recognition, a storage medium, and an electronic device.
Background
With the development of communication technology, terminal-based application scenarios (on mobile phones, computers, and the like) are becoming increasingly rich. While a terminal is in use, character detection and recognition may need to be performed on a detection object such as an application program or an audio file.
In practice, a detection object (such as an application under detection) may be reverse-engineered by a malicious attacker in object operation scenarios such as downloading, transmission, forwarding, and sharing on the terminal, or risky characters may have been hidden in the source files when the detection object (such as an application program) was developed.
Disclosure of Invention
The embodiments of the present application provide a character string recognition method and apparatus, a storage medium, and an electronic device, which can improve the accuracy of character recognition. The technical solutions of the embodiments of the present application are as follows:
In a first aspect, an embodiment of the present application provides a character string recognition method, where the method includes:
acquiring a decompiled file of a detection object, and acquiring a first character string set corresponding to the decompiled file;
performing ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set; and
performing hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string.
In a second aspect, an embodiment of the present application provides a character string recognition apparatus, where the apparatus includes:
a character string set acquisition module, configured to acquire a decompiled file of a detection object and acquire a first character string set corresponding to the decompiled file;
a ciphertext character recognition module, configured to perform ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set; and
a hidden character recognition module, configured to perform hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides an electronic device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The beneficial effects brought by the technical solutions provided by some embodiments of the present application include at least the following:
In one or more embodiments of the present application, a terminal acquires a decompiled file of a detection object, acquires a first character string set corresponding to the decompiled file, performs ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set, and performs hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string. Ciphertext recognition on the decompiled file of the detection object first screens out the set of ciphertext character strings that carry hidden characters, and hidden character recognition is then performed on those ciphertext character strings. This addresses the low accuracy of character string recognition and improves the accuracy of character recognition. In addition, hidden character recognition is performed only on the screened ciphertext character strings rather than on every character string contained in the decompiled file one by one, which improves the efficiency of character recognition.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a character string recognition method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another character string recognition method provided in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a character string recognition apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a character string set acquisition module according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a ciphertext character recognition module according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a ciphertext character determining unit according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a hidden character recognition module according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of another character string recognition apparatus provided in an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an operating system and a user space provided in an embodiment of the present application;
Fig. 11 is an architectural diagram of the Android operating system of Fig. 9;
Fig. 12 is an architectural diagram of the iOS operating system of Fig. 9.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the present application, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Unless explicitly stated or limited otherwise, "including" and "having" and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the steps or elements listed, but may optionally include other steps or elements that are not listed or that are inherent to such a process, method, article, or apparatus. Those of ordinary skill in the art can understand the specific meaning of the above terms in the present application according to the specific case. Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.
The present application will be described in detail with reference to specific examples.
In one embodiment, as shown in Fig. 1, a character string recognition method is proposed. The method can be implemented by a computer program and can run on a character string recognition device based on the von Neumann architecture. The computer program may be integrated into an application or may run as a separate tool application.
The character string recognition device may be a terminal, and the terminal may be an electronic device with a character string recognition function, including but not limited to: wearable devices, handheld devices, personal computers, tablet computers, in-vehicle devices, smart phones, computing devices or other processing devices connected to a wireless modem, and the like. The terminal devices in different networks may be called different names, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote terminal, mobile device, user terminal, wireless communication device, user agent or user equipment, cellular telephone, cordless telephone, Personal Digital Assistant (PDA), terminal equipment in a 5G network or future evolution network, and the like.
In the related art, a detection object (such as an application under detection) is usually decompiled to obtain a decompiled file, and the characters of the decompiled file are then analyzed and identified one by one manually, so character recognition is inefficient. In addition, character analysis of the decompiled file is often performed by applying regular-expression matching directly to the decompiled file to find hidden characters, but this approach tends to have low recognition accuracy; for example, an ordinary English character string may be directly identified as a hidden character string.
Specifically, the character string recognition method includes:
Step S101: acquire a decompiled file of the detection object, and acquire a first character string set corresponding to the decompiled file.
The detection object in this embodiment focuses on an object related to user privacy or having a security risk, and in some embodiments, the type of the detection object may be any type of data to be detected, such as an application program, a multimedia file, cache data, and the like. It should be noted that the type of the detection object is determined based on the actual application environment, and is not limited in the embodiment of the present application.
In practice, a detection object (such as an application under detection) may be reverse-engineered by a malicious attacker in object operation scenarios such as downloading, transmission, forwarding, or sharing, or risky code or characters may have been hidden in the source files when the detection object was developed. When the terminal uses such a detection object, the hidden code can introduce risks that are difficult to predict and thus create a security hazard. Executing the character string recognition method of the present application identifies the hidden character strings in the detection object and thereby reduces the risk they pose; for the specific implementation, refer to the steps described below.
The decompiled file can be understood as the file obtained by reverse-compiling the data of the detection object through reverse engineering. In some embodiments, reverse-compiling the detection object means recovering the corresponding uncompiled files from the compiled files of the detection object. Taking a common application under detection as an example, reverse-compiling the application through reverse engineering yields decompiled files, which may specifically be smali code files, the various xml resource files, and an Android manifest file.
In a specific embodiment, the detection object may include an application under detection. To reduce the risk of running it, the terminal may reverse-compile the application in advance and then obtain its decompiled file from the result of the reverse compilation.
Optionally, in practice the application under detection acquired by the terminal is usually an application installation package, which can be unpacked first and then reverse-compiled. Taking an Android application package as an example, the installation package is usually a file in the APK format, which can be understood as a compressed file in ZIP format. The terminal may unpack the installation package to obtain classes.dex (the compiled code file), resources.arsc (the compiled resource file), and AndroidManifest.xml (the compiled layout file), and then decompile these compiled files with a reverse-compilation tool from reverse engineering to obtain the uncompiled files that existed before compilation, namely the smali code files, the xml resource files, the AndroidManifest.xml layout file, and the like. In some application scenarios, hidden character strings (which can be understood as code) exist in the smali code files. Further, the decompiled file that is usually acquired consists of a plurality of smali code files with different file names.
Specifically, after acquiring the decompiled file of the detection object (such as an application under detection), the terminal acquires the first character string set, which corresponds to the decompiled file and contains at least one character string.
The first character string set contains at least one character string, and the character strings differ in code function and code type; for example, they include but are not limited to strings for system function calls, data access, HTTP requests, system interface calls, and identification code acquisition. Such strings may be hidden, in which case a person skilled in the art cannot recognize or distinguish the function of the string from its hidden form. In addition, the decompiled file may contain various types of data: constants, characters, classes (for example, a class for a certain function method), variables, and so on. In practice it is usually character strings that are hidden, for example the string corresponding to a function call name or to a system interface call name. Based on this, in this embodiment, after acquiring the decompiled file the terminal may traverse its code line by line to acquire all the character strings it contains, which form the first character string set.
Step S102: perform ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set.
In practice, a character string in a file is usually hidden by encrypting it in some encryption mode, which produces an encrypted ciphertext character string. An electronic device (such as a terminal) performing file detection usually has difficulty identifying ciphertext character strings directly with conventional character matching (such as regular-expression matching).
The ciphertext character string set contains the ciphertext character strings determined by applying ciphertext recognition to each character string in the first character string set. It can be understood that the first character string set contains both ciphertext character strings, which have been encrypted and hidden, and original character strings, which have not.
Specifically, the terminal may obtain the static features of each character string, which may be its character structure feature, character length feature, and character application type feature (for example, whether the string is used for a function call, a script call, and so on), and then determine from the static features whether the character string is a ciphertext character string.
One way to make this determination is for the terminal to store a ciphertext feature library in advance, containing the ciphertext features of at least one ciphertext character string. For each character string, the terminal matches the string's static features against the ciphertext features in the library and, if a consistent ciphertext feature is found, determines that the string is a ciphertext character string. Consistency may be judged by computing the similarity between the static features and each ciphertext feature: if the similarity to any ciphertext feature exceeds a set similarity threshold, the two are considered consistent; otherwise they are not.
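The feature-library comparison can be sketched as follows. The description does not say how static features are encoded or which similarity measure is used, so this sketch assumes that features are numeric vectors and that similarity is cosine similarity; the function names and the default threshold of 0.95 are likewise illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_ciphertext_library(static_feature, library, threshold=0.95):
    """Return True if the feature is similar enough to ANY library entry.

    `library` is a list of ciphertext feature vectors stored in advance;
    `threshold` is the set similarity threshold (an assumed value here).
    """
    return any(
        cosine_similarity(static_feature, ciphertext_feature) > threshold
        for ciphertext_feature in library
    )
```

With a library of [[1.0, 0.0], [0.0, 1.0]], the vector [0.99, 0.05] is judged consistent with the first entry, while [1.0, 1.0] matches neither entry at this threshold.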
Optionally, a ciphertext recognition model may be trained in advance. The model may be created by obtaining sample data containing a large number of character strings from the actual application environment, extracting feature information (that is, the static features of the character strings), and performing ciphertext labeling on the sample data. The ciphertext recognition model may be implemented based on at least one of a Convolutional Neural Network (CNN) model, a Deep Neural Network (DNN) model, a Recurrent Neural Network (RNN) model, an embedding model, a Gradient Boosting Decision Tree (GBDT) model, and a Logistic Regression (LR) model, and the trained ciphertext recognition model is obtained by training on the labeled sample data.
Further, in this embodiment, an initial ciphertext recognition model is created using a DNN-HMM model that introduces the error back propagation algorithm. After extraction, the feature information is input into the neural network model in the form of feature vectors. Training a neural network model generally consists of a forward propagation process and a back propagation process. In forward propagation, the feature information of the sample data input by the terminal passes from the input layer of the neural network model through the transfer functions (also called activation functions or conversion functions) of the hidden-layer neurons (also called nodes) to the output layer, with the state of each layer of neurons affecting the state of the next layer; the actual output value is computed at the output layer, and the error between the actual output value and the expected output value is calculated. In back propagation, the parameters of the neural network model, including the weight and threshold of each layer, are adjusted based on this error. After training is completed, the ciphertext recognition model is generated.
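The forward and back propagation passes described above can be illustrated with a deliberately small sketch. It is not the DNN-HMM model of this embodiment: it swaps in a plain feed-forward network with one hidden layer, sigmoid transfer functions, and a squared-error loss, and it omits the HMM part entirely. The function names, layer size, learning rate, and the two-dimensional toy features are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, params):
    """Forward propagation: input layer -> hidden layer -> output layer."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output


def train(samples, labels, n_hidden=4, epochs=2000, lr=0.5, seed=0):
    """Train with per-sample gradient descent; returns (w1, b1, w2, b2).

    The biases b1 and b2 play the role of the per-layer "threshold".
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            hidden, out = forward(x, (w1, b1, w2, b2))
            # Output error term for the loss 0.5 * (out - y) ** 2.
            d_out = (out - y) * out * (1.0 - out)
            # Propagate the error back to the hidden layer before w2 changes.
            d_hidden = [d_out * w2[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(n_hidden)]
            for j in range(n_hidden):
                w2[j] -= lr * d_out * hidden[j]
                b1[j] -= lr * d_hidden[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_hidden[j] * x[i]
            b2 -= lr * d_out
    return w1, b1, w2, b2


def predict(x, params):
    """Return the network output in (0, 1); above 0.5 is read as 'ciphertext'."""
    return forward(x, params)[1]
```

As a usage example, training on four hand-made feature vectors such as [0.9, 0.0] and [0.8, 0.1] labeled 1.0 and [0.1, 0.6] and [0.2, 0.5] labeled 0.0 yields parameters for which predict separates the two groups around 0.5. Real training would use the labeled static features of many sample strings.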
After the training is completed, in practical application, the terminal can directly input each character string in the first character string set to the ciphertext recognition model for ciphertext character recognition, and output a ciphertext character string set including at least one ciphertext character string.
Step S103: perform hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string.
Specifically, after obtaining the ciphertext character string set, the terminal performs hidden character recognition on each ciphertext character string in it. The aim of this recognition is to restore each ciphertext character string to the hidden character string, so that character risk assessment can then be performed on the hidden character string.
Further, in practice a hidden character string is usually turned into a ciphertext character string with an encryption mode based on fixed characters; for example, the BASE64 encryption mode is used to quickly encrypt and hide the string. In the embodiments of the present application, hidden character recognition may be performed on the ciphertext character strings by regular-expression matching: a common regular expression constructed in advance for the BASE64 encryption mode is used for matching and searching, so that the hidden character string corresponding to each ciphertext character string can be obtained.
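A minimal sketch of this step, assuming the hidden strings were encoded with standard Base64 and decode to printable UTF-8 text, is given below. The regular expression, the function name recover_hidden_string, and the printable-text check are illustrative assumptions; the application does not give the actual expression it uses.

```python
import base64
import binascii
import re

# Matches a complete standard Base64 token: groups of four alphabet
# characters, optionally ending in a padded group ("xx==" or "xxx=").
BASE64_PATTERN = re.compile(
    r"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
)


def recover_hidden_string(ciphertext):
    """Return the decoded hidden string, or None if decoding is not plausible.

    A candidate must match the Base64 pattern, decode without error, and
    yield printable UTF-8 text (the last check is an assumption of this sketch).
    """
    if not ciphertext or not BASE64_PATTERN.match(ciphertext):
        return None
    try:
        decoded = base64.b64decode(ciphertext, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    return decoded if decoded.isprintable() else None
```

For example, recover_hidden_string("aGVsbG8=") returns "hello", while a string containing characters outside the Base64 alphabet returns None.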
In the embodiments of the present application, a terminal acquires a decompiled file of a detection object, acquires a first character string set corresponding to the decompiled file, performs ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set, and performs hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string. Ciphertext recognition on the decompiled file of the detection object first screens out the set of ciphertext character strings that carry hidden characters, and hidden character recognition is then performed on those ciphertext character strings. This addresses the low accuracy of character string recognition and improves the accuracy of character recognition. In addition, hidden character recognition is performed only on the screened ciphertext character strings rather than on every character string contained in the decompiled file one by one, which improves the efficiency of character recognition.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of another embodiment of the character string recognition method of the present application. Specifically:
Step S201: acquire a decompiled file of the detection object, and determine, in the decompiled file, a reference byte of the target type and the character offset value corresponding to the reference byte.
The reference byte can be understood as the identification bit that marks where a character string is located in a piece of code containing the string. In some embodiments it can also be understood as the basic bytecode used when a piece of code contains a character string, for example const, const-string, const-wide, and the like.
For example, a decompiled file contains a large number of code fragments, illustrated by the following reference code fragments:
For example: const-string vAA, string@BBBB.
In the above code, the reference byte is const-string, and the character string corresponding to the code fragment is the one designated by the character string index that follows the reference byte.
For example: const-string/jumbo vAA, string@BBBBBBBB.
In the above code, the reference byte is const-string, and the character string corresponding to the code fragment is the one constructed from the character string index that follows the reference byte.
Further, the code containing the reference byte may carry a character offset value, such as "V0.." or "v1.", and the address of the next character string can be determined from the current character offset value; for example, the starting address of the next register can be determined from the character offset value.
Step S202: acquire, based on the reference byte and the character offset value, a first character string set containing at least one character string in the decompiled file.
For example: const-string vAA, string@BBBB.
In the above code, the reference byte is const-string. The character string corresponding to the code fragment (that is, the string designated by the character string index) is determined from the character string index that follows the reference byte. Once the string indicated by this fragment has been obtained, the address of the next string to fetch is determined from the character offset value (such as "V0.." or "v1.") in the code containing the reference byte; for example, the starting address of the next register can be determined from the character offset value. Starting from that address, the position of the reference byte in the next piece of code is determined, the character string indicated after the reference byte is acquired, and the character offset value is acquired again; these steps are repeated to collect the character strings.
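The extraction in steps S201 and S202 can be approximated on smali text as sketched below. This sketch makes several assumptions that the description does not confirm: that the decompiled file is smali text in which the reverse-compilation tool has already resolved each string index to a quoted literal, that the reference byte corresponds to the const-string or const-string/jumbo opcode, and that the register name (for example v0) stands in for the character offset value. It scans line by line rather than following addresses, and the names CONST_STRING and extract_strings are hypothetical.

```python
import re

# opcode, destination register, then a double-quoted literal that may
# contain backslash escapes.
CONST_STRING = re.compile(
    r'^\s*(const-string(?:/jumbo)?)\s+([vp]\d+)\s*,\s*"((?:[^"\\]|\\.)*)"'
)


def extract_strings(smali_text):
    """Return a list of (register, literal) pairs for every const-string line."""
    results = []
    for line in smali_text.splitlines():
        match = CONST_STRING.match(line)
        if match:
            results.append((match.group(2), match.group(3)))
    return results


SAMPLE_SMALI = '''
.method public foo()V
    const-string v0, "aGVsbG8="
    invoke-static {v0}, Lcom/example/A;->b(Ljava/lang/String;)V
    const-string/jumbo v1, "plain text"
    const/4 v2, 0x1
.end method
'''
```

Running extract_strings(SAMPLE_SMALI) collects the two string literals together with their registers and ignores the other instructions; the literals alone would form the first character string set.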
Step S203: determine the character features corresponding to each character string in the first character string set.
The character features include, but are not limited to: the individual characters, the length of the character string, the number of lowercase characters, the number of uppercase characters, and the number of characters in the string that are neither lowercase nor uppercase.
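The character features listed above can be computed in a few lines. The CharFeatures container and the char_features function are names chosen for this sketch, and lowercase and uppercase are decided with Python's str.islower and str.isupper, which is an assumption about how the embodiment classifies characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharFeatures:
    length: int  # total number of characters
    lower: int   # number of lowercase characters
    upper: int   # number of uppercase characters
    other: int   # characters that are neither lowercase nor uppercase


def char_features(text):
    """Count the character classes of a string."""
    lower = sum(1 for ch in text if ch.islower())
    upper = sum(1 for ch in text if ch.isupper())
    return CharFeatures(len(text), lower, upper, len(text) - lower - upper)
```

For the string "aGVsbG8=", this gives a length of 8 with 3 lowercase characters, 3 uppercase characters, and 2 other characters (the digit and the padding sign).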
Step S204: perform ciphertext character recognition based on the character features corresponding to each character string, and determine a ciphertext character string set.
Specifically, the terminal may analyze each character string in the first character string set to determine its character features, such as each character in the character string, the length of the character string, the number of lower case characters, the number of upper case characters, and the like. Ciphertext character recognition is then performed on each character string based on these character features, the purpose being to determine whether the character string is a ciphertext character string. A specific judgment method is as follows:
Step S2041: when the number of lower case characters of a first character string among the character strings is greater than a first threshold and the target character number of the first character string is less than a second threshold, determine that the first character string is a ciphertext character string in the ciphertext character string set.
The first threshold and the second threshold are associated with the total number of characters of the character string, and the target character number is the number of characters in the character string other than lower case and upper case characters.
A threshold may be understood as the critical value of a certain field, state or system. In this embodiment, the first threshold is the critical value for the number of lower case characters, and the second threshold is the critical value for the target character number.
In this embodiment, whether a character string is a ciphertext character string is determined based on the first threshold and the second threshold: when the number of lower case characters is greater than the first threshold and the target character number is less than the second threshold, the character string (that is, the first character string) is considered to be a ciphertext character string.
Specifically, after determining the character features corresponding to each character string, the terminal judges each character string, that is, it judges whether the number of lower case characters of the character string is greater than the first threshold and whether the target character number is less than the second threshold. If a first character string satisfying both conditions exists among the character strings, the first character string is determined to be a ciphertext character string in the ciphertext character string set.
In a possible implementation manner, the above-mentioned decision process of the ciphertext character string may be represented by the following decision formula, that is:
f(string, count, other, low) = 1, if low > a*count && other < b*count
where "&&" represents the logical operator "and", string represents the character string, low represents the number of lower case characters of the character string, up represents the number of upper case characters of the character string, other represents the target character number (that is, the number of characters in the character string other than lower case and upper case characters), and count represents the total number of characters of the character string. a represents a first weight value applied to the total number of characters, and a*count is the first threshold; b represents a second weight value applied to the total number of characters, and b*count is the second threshold.
The first threshold and the second threshold are determined based on the actual application. In one determination approach, a large amount of sample data of the same type as the detection object is collected in advance, the ciphertext character strings of each sample are analyzed and compared one by one, and the first threshold and the second threshold are determined by mathematical statistics. In a specific application, this is done by determining the first weight value corresponding to the first threshold and the second weight value corresponding to the second threshold.
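The first judgment condition can be sketched as below. The weight values a = 0.5 and b = 0.2 are hypothetical placeholders chosen only for illustration, since the text leaves them to be determined statistically from sample data; the function name is likewise an assumption.

```python
def is_ciphertext_rule1(s, a=0.5, b=0.2):
    """First condition: low > a*count and other < b*count.

    a and b are hypothetical weights; real values would come from sample data.
    """
    count = len(s)
    if count == 0:
        return False
    low = sum(1 for ch in s if "a" <= ch <= "z")
    up = sum(1 for ch in s if "A" <= ch <= "Z")
    other = count - low - up
    return low > a * count and other < b * count

mostly_lower = is_ciphertext_rule1("abcdefghXY")   # low = 8, other = 0, count = 10
mostly_digits = is_ciphertext_rule1("12345abc")    # low = 3, other = 5, count = 8
```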
Step S2042: when the number of lower case characters of a second character string among the character strings is smaller than the first threshold and the reference character number of the second character string is smaller than a third threshold, determine that the second character string is a ciphertext character string in the ciphertext character string set.
The reference character number is the difference between the target character number and the number of upper case characters, and the third threshold is associated with the total number of characters of the character string.
In this embodiment, whether a character string is a ciphertext character string is determined based on the first threshold and the third threshold: when the number of lower case characters is smaller than the first threshold and the reference character number, that is, the difference between the target character number and the number of upper case characters, is smaller than the third threshold, the character string (that is, the second character string) is considered to be a ciphertext character string.
Specifically, after determining the character features corresponding to each character string, the terminal judges each character string, that is, it judges whether the number of lower case characters of the character string is smaller than the first threshold and whether the reference character number is smaller than the third threshold. If a second character string satisfying both conditions exists among the character strings, the second character string is determined to be a ciphertext character string in the ciphertext character string set.
In a possible implementation manner, the above-mentioned decision process of the ciphertext character string may be represented by the following decision formula, that is:
f(string, count, other, low) = 1, if low < a*count && (other - up) < c*count
where string represents the character string, low represents the number of lower case characters of the character string, up represents the number of upper case characters of the character string, other represents the target character number (that is, the number of characters in the character string other than lower case and upper case characters), and count represents the total number of characters of the character string. "other - up" is the reference character number.
a represents the first weight value applied to the total number of characters, and a*count is the first threshold; c represents a third weight value applied to the total number of characters, and c*count is the third threshold.
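The second judgment condition can be sketched in the same way. The weight values a = 0.5 and c = 0.3 are again hypothetical placeholders, not values given in the text.

```python
def is_ciphertext_rule2(s, a=0.5, c=0.3):
    """Second condition: low < a*count and (other - up) < c*count.

    a and c are hypothetical weights; real values would come from sample data.
    """
    count = len(s)
    if count == 0:
        return False
    low = sum(1 for ch in s if "a" <= ch <= "z")
    up = sum(1 for ch in s if "A" <= ch <= "Z")
    other = count - low - up
    reference = other - up          # reference character number
    return low < a * count and reference < c * count

base64_like = is_ciphertext_rule2("Z2V0aW1laQ==")      # low = 3, up = 4, other = 5
digits_only = is_ciphertext_rule2("192.168.001.001")   # low = 0, up = 0, other = 15
```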
Step S2043: when a character string is a common character string in a common language dictionary, determine that the character string is not a ciphertext character string.
The common language dictionary is preset. It contains common character strings that, in the actual application environment of the detection object, may satisfy the above judgment conditions without being ciphertext, such as character strings corresponding to commercial names, geographic names, common phrases and the like.
In addition, step S2043 has no fixed execution order relative to step S2041 and step S2042: step S2043 may be executed before step S2041 and step S2042, or after them.
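The order independence noted above can be illustrated with the following sketch, in which the dictionary check is applied either before or after the feature-based judgment. The dictionary contents, the function names and the stand-in predicate (a simple lower case majority test) are all hypothetical.

```python
def filter_dictionary_first(strings, is_candidate, common):
    """Drop common strings first, then apply the feature-based judgment."""
    return [s for s in strings if s.lower() not in common and is_candidate(s)]

def filter_dictionary_last(strings, is_candidate, common):
    """Apply the feature-based judgment first, then drop common strings."""
    candidates = [s for s in strings if is_candidate(s)]
    return [s for s in candidates if s.lower() not in common]

# Hypothetical common language dictionary and stand-in judgment condition.
common_words = {"password", "username"}
lower_majority = lambda s: sum(ch.islower() for ch in s) > 0.5 * len(s)

strings = ["password", "qzxvkjwp", "12345"]
result_first = filter_dictionary_first(strings, lower_majority, common_words)
result_last = filter_dictionary_last(strings, lower_majority, common_words)
```

Both orders yield the same set; running the dictionary check first merely avoids evaluating the judgment conditions on strings that will be excluded anyway.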
Step S205: extract interface calling features from the decompiled file, and determine the encryption mode of the detection object based on the interface calling features.
The interface calling feature can be understood as the encryption method, software interface calling class and the like that the detection object calls when it encrypts data. By analyzing the interface calling features used for encryption in the decompiled file, the encryption mode adopted for the hidden character string can be determined.
In a possible implementation manner, the interface calling feature may be a software call interface class, for example, the interface class used for encryption in the corresponding programming language (e.g., the Java language). In the Java language, the interface call class may be the Cipher class. In implementation, the terminal may extract the software call interface class used for encryption from the decompiled file, for example, the Cipher class. One possible implementation is that the terminal obtains the interface call class in the decompiled file and then determines the encryption mode based on the class object of that software call interface class. When encrypting the hidden character string, the detection object creates a class object through the software call interface class, and the class object contains the name of the encryption algorithm (for example, DES).
For example, the Cipher class in Java, which is located in the javax.crypto package and is declared as public class Cipher extends Object, provides cryptographic functions for encryption and decryption. It forms the core of the Java Cryptographic Extension (JCE) framework. To create a Cipher object, an application calls Cipher's getInstance method and passes it the name of the requested transformation. The name of a provider may optionally also be specified. A transformation is a character string that describes the operation (or set of operations) to be performed on a given input to produce some output. A transformation always includes the name of the encryption algorithm (e.g., DES), possibly followed by a feedback mode and a padding scheme. Based on the encryption algorithm name in the class object (i.e., the Cipher object), the terminal can determine which encryption mode is used by the detection object.
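As a rough sketch of how the algorithm name might be read from the interface calling feature, the following code searches Java-like decompiled source for Cipher.getInstance calls with a literal argument and splits the transformation string. It assumes the decompiler output contains the call in this textual form; in smali output the call would instead appear as an invoke on Ljavax/crypto/Cipher;->getInstance. The function name and the sample source are hypothetical.

```python
import re

# Captures the literal transformation passed to Cipher.getInstance("...").
GET_INSTANCE = re.compile(r'Cipher\.getInstance\(\s*"([^"]+)"')

def find_transformations(java_source):
    """Return algorithm / mode / padding for each literal transformation found."""
    results = []
    for transformation in GET_INSTANCE.findall(java_source):
        parts = transformation.split("/")
        results.append({
            "algorithm": parts[0],
            "mode": parts[1] if len(parts) > 1 else None,
            "padding": parts[2] if len(parts) > 2 else None,
        })
    return results

# Hypothetical decompiled fragment used only for illustration.
sample = '''
Cipher c = Cipher.getInstance("DES/CBC/PKCS5Padding");
Cipher d = Cipher.getInstance("AES");
'''
found = find_transformations(sample)
```

Calls whose transformation is built at run time rather than passed as a literal would not be found by this pattern and would need additional analysis.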
Step S206: determine at least one hidden character string from the ciphertext character strings of the ciphertext character string set according to the decryption mode corresponding to the encryption mode.
Specifically, after the encryption mode adopted by the detection object is determined, the corresponding decryption mode can be applied. For example, if the encryption mode is the RSA encryption algorithm, the corresponding private key can be obtained to decrypt the ciphertext character string and obtain at least one hidden character string; if the encryption mode is based on BASE64, the ciphertext character string can be decoded by a decryption method based on regular matching.
In a specific implementation scenario, if the encryption mode is a fixed character-based target encryption mode, that is, an encoding scheme based on BASE64 in which 64 fixed characters are used to represent arbitrary binary data, the decryption mode corresponding to the target encryption mode is regular matching processing, namely first regular matching processing. Specifically, first regular matching processing is performed on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string.
In addition, in practical applications, when a hidden character string is hidden by BASE64 encoding, it is usually padded first: specified characters are added before or after the hidden character string, and the result is then encoded with BASE64.
Illustratively, during encryption, some content is added before and after the original character string, and Base64 encoding is then carried out. For example, a certain Thunder download tool adds "AA" before and "ZZ" after an address, and a certain FlashGet tool adds "[FLASHGET]" before and after the address.
The first regular matching in this embodiment restores the reference character string corresponding to the ciphertext character string by means of regular query matching, and then removes the padding characters from the reference character string to finally obtain the hidden character string.
The following are examples:
1. Z2V0aW1laQ== is a ciphertext character string after Base64 encoding.
After the first regular matching processing, the corresponding decoded hidden character string is getimei.
2. aHR0cHM6Ly93d3cuYmFpZHUuY29tLw== is a ciphertext character string after Base64 encoding.
After the first regular matching processing, the corresponding decoded hidden character string is https://www.baidu.com/.
3. MTI3LjAuMC4x is a ciphertext character string after Base64 encoding.
After the first regular matching processing, the corresponding decoded hidden character string is 127.0.0.1.
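The decoding described above can be sketched as follows: a regular expression checks that the candidate uses only the Base64 alphabet, the string is decoded, and a known padding prefix and suffix is stripped if present. This is a minimal illustration, not the patented procedure itself; the affix list only repeats the two examples mentioned in the text, and the function name recover_hidden is hypothetical.

```python
import base64
import binascii
import re

BASE64_RE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")
# (prefix, suffix) pairs taken from the examples in the text.
AFFIXES = [("AA", "ZZ"), ("[FLASHGET]", "[FLASHGET]")]

def recover_hidden(ciphertext):
    """Decode a Base64 candidate and strip known padding; None if not decodable."""
    if not BASE64_RE.match(ciphertext):
        return None
    # Restore trailing '=' characters that may have been dropped.
    padded = ciphertext + "=" * (-len(ciphertext) % 4)
    try:
        text = base64.b64decode(padded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    for prefix, suffix in AFFIXES:
        if (len(text) >= len(prefix) + len(suffix)
                and text.startswith(prefix) and text.endswith(suffix)):
            return text[len(prefix):len(text) - len(suffix)]
    return text

hidden = [recover_hidden(s) for s in (
    "Z2V0aW1laQ==",
    "aHR0cHM6Ly93d3cuYmFpZHUuY29tLw==",
    "MTI3LjAuMC4x",
)]
```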
Step S207: determine a character application type of each of the hidden character strings.
One way to determine the character application type may be: performing second regular matching processing on each hidden character string, and determining a first character application type corresponding to the hidden character string; and/or,
The second regular matching performs regular feature judgment on the hidden character string based on common regular expressions, so that whether the character string belongs to a first character application type, such as a domain name or a network address, can be identified from its regular features. Specifically, it is judged whether the regular features are consistent with the character features of the first character application type: for example, a network address contains specific characters such as http, and a string such as 123.0.0.1 exhibits character features of the form "digits + decimal point". That is, the first character application type includes multiple subtypes, such as a domain name application subtype, a website address application subtype and the like. The first character application type corresponding to the hidden character string, such as the domain name application subtype, can be judged based on the common regular expressions.
The common regular expression corresponding to the second regular matching processing is determined based on the actual application environment, which is not limited here.
For example, a common regular expression for identifying an Email address may be: "\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
a common regular expression for identifying a domain name may be: "[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?"
A common regular expression for identifying a subnet mask may be: ((.
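The second regular matching can be sketched as an ordered list of patterns, where the first matching pattern gives the application subtype. The patterns and subtype names below are illustrative assumptions chosen for this sketch and would in practice be determined by the actual application environment.

```python
import re

# Ordered (subtype, pattern) pairs; more specific patterns come first.
# All patterns here are hypothetical examples.
PATTERNS = [
    ("url", re.compile(r"^https?://\S+$", re.IGNORECASE)),
    ("ipv4", re.compile(
        r"^(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)$")),
    ("email", re.compile(r"^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$")),
    ("domain", re.compile(
        r"^[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?$")),
]

def first_application_type(hidden):
    """Return the first matching subtype name, or None if nothing matches."""
    for name, pattern in PATTERNS:
        if pattern.match(hidden):
            return name
    return None

types = [first_application_type(s) for s in (
    "https://www.baidu.com/", "127.0.0.1", "user@example.com",
    "www.baidu.com", "getimei",
)]
```

The IPv4 pattern is placed before the domain pattern because a dotted numeric address would otherwise also satisfy the domain pattern.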
Another way to determine the character application type may be: performing common character matching processing between each hidden character string and a preset interface character dictionary, and determining a second character application type corresponding to the hidden character string.
The interface character dictionary can be understood as containing common system interface call names, such as the sensitive interfaces getImei, getMacAddress and the like. The purpose of matching the hidden character string against the interface character dictionary is to determine whether the hidden character string calls a corresponding sensitive interface, for example, a software interface related to user privacy.
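The dictionary matching can be sketched as a case-insensitive lookup. The dictionary entries repeat the two interface names mentioned above; the category labels attached to them and the function name are hypothetical.

```python
# Hypothetical interface character dictionary: lower-cased name -> category label.
SENSITIVE_INTERFACES = {
    "getimei": "device identifier",
    "getmacaddress": "network hardware address",
}

def second_application_type(hidden):
    """Return the category of a sensitive interface name, or None if not listed."""
    return SENSITIVE_INTERFACES.get(hidden.strip().lower())

matches = [second_application_type(s) for s in ("getimei", "getMacAddress", "hello")]
```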
In the embodiment of the application, a decompiled file of a detection object is obtained, a first character string set corresponding to the decompiled file is obtained, ciphertext character recognition is performed on each character string in the first character string set to obtain a ciphertext character string set, and hidden character recognition is performed on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string. By performing ciphertext recognition on the decompiled file of the detection object, the ciphertext character string set that may contain hidden characters is screened out first, and hidden character recognition is then performed only on those ciphertext character strings. This addresses the problem of low accuracy in character string recognition and improves recognition accuracy. At the same time, hidden character recognition does not need to be performed on every character string contained in the decompiled file, but only on the screened ciphertext character strings, which improves recognition efficiency.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 3, a schematic structural diagram of a character string recognition apparatus according to an exemplary embodiment of the present application is shown. The string recognition means may be implemented as all or part of the apparatus in software, hardware or a combination of both. The device 1 comprises a character string set acquisition module 11, a ciphertext character recognition module 12 and a hidden character recognition module 13.
The character string set acquisition module 11 is configured to acquire a decompiled file of a detection object, and acquire a first character string set corresponding to the decompiled file;
a ciphertext character recognition module 12, configured to perform ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set;
and the hidden character recognition module 13 is configured to perform hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string.
Optionally, the detection object includes a detection application, and the character string set obtaining module 11 is specifically configured to:
decompile the detection application to obtain a decompiled file of the detection application.
Optionally, as shown in fig. 4, the character string set obtaining module 11 includes:
a byte information determining unit 111, configured to determine, in the decompiled file, a reference byte of a target type and a character offset value corresponding to the reference byte;
a string set obtaining unit 112, configured to obtain, based on the reference byte and the character offset value, a first character string set including at least one character string from the decompiled file.
Optionally, as shown in fig. 5, the ciphertext character recognition module 12 includes:
a character feature determining unit 121, configured to determine a character feature corresponding to each character string in the first character string set;
and the ciphertext character determining unit 122 is configured to perform ciphertext character recognition based on the character features corresponding to each character string, and determine a ciphertext character string set.
Optionally, as shown in fig. 6, the ciphertext character determining unit 122 includes:
a first ciphertext determining subunit 1221, configured to determine, when a lower case character amount of a first character string in each character string is greater than a first threshold and a target character amount of the first character string is less than a second threshold, that the first character string is a ciphertext character string in a ciphertext character string set, where the first threshold and the second threshold are associated with a total character amount of each character string, and the target character amount is a number corresponding to a character of each character string except for the lower case character and the upper case character;
a second ciphertext determining subunit 1222, configured to determine that the second character string is the ciphertext character string in the ciphertext character string set when the lower case character amount of the second character string is smaller than the first threshold and the reference character number of the second character string is smaller than a third threshold, where the reference character number is a difference between the target character number and the upper case character number, and the third threshold is associated with a total character amount of each character string.
Optionally, as shown in fig. 6, the ciphertext character determining unit 122 further includes:
a third ciphertext determining subunit 1223, configured to determine that the character string is not the ciphertext character string when the character string belongs to a common character string in a common language dictionary.
Optionally, as shown in fig. 7, the hidden character recognition module 13 includes:
an encryption mode determining unit 131, configured to extract an interface calling feature in the decompiled file, and determine an encryption mode of the detection object based on the interface calling feature;
a hidden character string determining unit 132, configured to determine at least one hidden character string in each ciphertext character string of the ciphertext character string set according to the decryption manner corresponding to the encryption manner.
Optionally, the hidden character string determining unit 132 is specifically configured to:
and if the encryption mode is a fixed character-based target encryption mode, performing first regular matching processing on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string.
Optionally, as shown in fig. 8, the apparatus 1 further includes:
a character application type determining module 14, configured to determine a character application type of each hidden character string.
Optionally, the character application type determining module 14 is specifically configured to:
perform second regular matching processing on each hidden character string, and determine a first character application type corresponding to the hidden character string; and/or,
and performing common character matching processing on each hidden character string and a preset interface character dictionary, and determining a second character application type corresponding to the hidden character string.
It should be noted that, when the character string recognition apparatus provided in the foregoing embodiment executes the character string recognition method, only the division of the above function modules is taken as an example, and in practical applications, the above functions may be distributed to different function modules according to needs, that is, the internal structure of the device may be divided into different function modules to complete all or part of the above described functions. In addition, the character string recognition device and the character string recognition method provided by the above embodiments belong to the same concept, and the details of the implementation process are referred to in the method embodiments, which are not described herein again.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiment of the application, a terminal acquires a decompiled file of a detection object, acquires a first character string set corresponding to the decompiled file, performs ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set, and performs hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string. By performing ciphertext recognition on the decompiled file of the detection object, the ciphertext character string set that may contain hidden characters is screened out first, and hidden character recognition is then performed only on those ciphertext character strings. This addresses the problem of low accuracy in character string recognition and improves recognition accuracy; because hidden character recognition is performed only on the screened ciphertext character strings rather than on every character string in the decompiled file, recognition efficiency is also improved. The whole character recognition of the detection object can be carried out without manual auditing, so the hidden character recognition process runs automatically and the degree of automation of character string recognition is improved. In the recognition process, ciphertext recognition can be performed based on character features of the character strings in the decompiled file (such as the numbers of upper case and lower case characters) to preliminarily determine the ciphertext character string set that may contain hidden characters, which enriches the means of character recognition and allows hidden character strings to be recognized conveniently and quickly.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the character string identification method according to the embodiment shown in fig. 1 to fig. 2, and a specific execution process may refer to specific descriptions of the embodiment shown in fig. 1 to fig. 2, which is not described herein again.
The present application further provides a computer program product, where at least one instruction is stored, and the at least one instruction is loaded by the processor and executes the character string identification method according to the embodiment shown in fig. 1 to fig. 2, where a specific execution process may refer to specific descriptions of the embodiment shown in fig. 1 to fig. 2, and is not described herein again.
Referring to fig. 9, a block diagram of an electronic device according to an exemplary embodiment of the present application is shown. The electronic device in the present application may comprise one or more of the following components: a processor 110, a memory 120, an input device 130, an output device 140, and a bus 150. The processor 110, memory 120, input device 130, and output device 140 may be connected by a bus 150.
Processor 110 may include one or more processing cores. The processor 110 connects the various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling data stored in the memory 120. Alternatively, the processor 110 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is used for rendering and drawing display content; the modem is used to handle wireless communications. It is understood that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable medium. The memory 120 may be used to store instructions, programs, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments, and the like. The operating system may be an Android system (including systems deeply customized on the basis of Android), an iOS system developed by Apple (including systems deeply customized on the basis of iOS), or another system. The data storage area may also store data created by the electronic device during use, such as phone books, audio and video data, chat log data, and the like.
Referring to fig. 10, the memory 120 may be divided into an operating system space, where an operating system is run, and a user space, where native and third-party applications are run. In order to ensure that different third-party application programs can achieve a better operation effect, the operating system allocates corresponding system resources for the different third-party application programs. However, the requirements of different application scenarios in the same third-party application program on system resources are different, for example, in a local resource loading scenario, the third-party application program has a higher requirement on the disk reading speed; in the animation rendering scene, the third-party application program has a high requirement on the performance of the GPU. The operating system and the third-party application program are independent from each other, and the operating system cannot sense the current application scene of the third-party application program in time, so that the operating system cannot perform targeted system resource adaptation according to the specific application scene of the third-party application program.
In order to enable the operating system to distinguish a specific application scenario of the third-party application program, data communication between the third-party application program and the operating system needs to be opened, so that the operating system can acquire current scenario information of the third-party application program at any time, and further perform targeted system resource adaptation based on the current scenario.
Taking the Android system as an example of the operating system, the programs and data stored in the memory 120 are as shown in fig. 11. A Linux kernel layer 320, a system runtime library layer 340, an application framework layer 360, and an application layer 380 may be stored in the memory 120, where the Linux kernel layer 320, the system runtime library layer 340, and the application framework layer 360 belong to the operating system space, and the application layer 380 belongs to the user space. The Linux kernel layer 320 provides underlying drivers for the various hardware of the electronic device, such as a display driver, an audio driver, a camera driver, a Bluetooth driver, a Wi-Fi driver, power management, and the like. The system runtime library layer 340 provides the main feature support for the Android system through a number of C/C++ libraries. For example, the SQLite library provides database support, the OpenGL/ES library provides 3D drawing support, the Webkit library provides browser kernel support, and the like. The system runtime library layer 340 also provides the Android runtime library (Android runtime), which mainly provides core libraries that allow developers to write Android applications in the Java language. The application framework layer 360 provides various APIs that may be used in building applications, such as activity management, window management, view management, notification management, content provider, package management, session management, resource management, and location management; developers may build their own applications by using these APIs.
At least one application program runs in the application layer 380, and the application programs may be native application programs carried by the operating system, such as a contact program, a short message program, a clock program, a camera application, and the like; or a third-party application developed by a third-party developer, such as a game application, an instant messaging program, a photo beautification program, a character string recognition program, and the like.
Taking the iOS system as an example of the operating system, the programs and data stored in the memory 120 are shown in fig. 12. The iOS system includes: a core operating system layer 420 (Core OS Layer), a core services layer 440 (Core Services Layer), a media layer 460 (Media Layer), and a touchable layer 480 (Cocoa Touch Layer). The core operating system layer 420 includes an operating system kernel, drivers, and underlying program frameworks that provide functionality closer to hardware for use by program frameworks located in the core services layer 440. The core services layer 440 provides system services and/or program frameworks required by applications, such as a Foundation framework, an account framework, an advertisement framework, a data storage framework, a network connection framework, a geographic location framework, a motion framework, and so forth. The media layer 460 provides audiovisual interfaces for applications, such as graphics and image interfaces, audio technology interfaces, video technology interfaces, AirPlay interfaces for wireless audio and video transmission, and the like. The touchable layer 480 provides various common interface-related frameworks for application development, such as a local notification service, a remote push service, an advertising framework, a game tool framework, a messaging User Interface (UI) framework, the UIKit user interface framework, a map framework, and so forth, and is responsible for user touch interaction operations on the electronic device.
In the architecture illustrated in fig. 12, the frameworks associated with most applications include, but are not limited to: the Foundation framework in the core services layer 440 and the UIKit framework in the touchable layer 480. The Foundation framework provides many basic object classes and data types, provides the most basic system services for all applications, and is UI independent. The classes provided by the UIKit framework form a basic library of UI classes for creating touch-based user interfaces. iOS applications can provide UIs based on the UIKit framework, so it provides the application infrastructure for building user interfaces, drawing, handling user interaction events, responding to gestures, and the like.
For the manner and principle of implementing data communication between the third-party application program and the operating system in the iOS system, reference may be made to the Android system, and details are not repeated herein.
The input device 130 is used for receiving input instructions or data, and includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 140 is used for outputting instructions or data, and includes, but is not limited to, a display device, a speaker, and the like. In one example, the input device 130 and the output device 140 may be combined into a touch display screen, which receives touch operations performed by a user on or near it with any suitable object such as a finger or a stylus, and displays the user interfaces of various applications. The touch display screen is typically provided on the front panel of the electronic device. The touch display screen may be designed as a full screen, a curved screen, or a special-shaped screen. It may also be designed as a combination of a full screen and a curved screen, or a combination of a special-shaped screen and a curved screen, which is not limited in the embodiment of the present application.
In addition, those skilled in the art will appreciate that the configurations of the electronic devices illustrated in the above-described figures do not constitute limitations on the electronic devices, which may include more or fewer components than illustrated, combine some components, or use a different arrangement of components. For example, the electronic device may further include a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (Wi-Fi) module, a power supply, a Bluetooth module, and other components, which are not described herein again.
In the embodiment of the present application, the execution subject of each step may be the electronic device described above. Optionally, the execution subject of each step is an operating system of the electronic device. The operating system may be an Android system, an iOS system, or another operating system, which is not limited in this embodiment of the present application.
The electronic device of the embodiment of the application may also be provided with a display device, which may be any device capable of realizing a display function, for example: a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an electronic ink panel, a liquid crystal display (LCD), a plasma display panel (PDP), and the like. A user may utilize a display device on the electronic device 101 to view information such as displayed text, images, video, and the like. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (augmented reality) device, an automobile, a data storage device, an audio playback device, a video playback device, a notebook, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, an electronic garment, or the like.
In the electronic device shown in fig. 9, where the electronic device may be a terminal, the processor 110 may be configured to call the character string recognition application stored in the memory 120, and specifically perform the following operations:
acquiring a decompiled file of a detection object, and acquiring a first character string set corresponding to the decompiled file;
performing ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set;
and carrying out hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string.
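The three operations above form a filter-then-reveal pipeline, which can be sketched as follows. This is a minimal illustration only: the function name `recognize_hidden_strings` and the two callables `is_ciphertext` and `reveal` are hypothetical placeholders for the ciphertext recognition and hidden character recognition steps, not names used in the application.

```python
from typing import Callable, Iterable, List, Optional


def recognize_hidden_strings(
    strings: Iterable[str],
    is_ciphertext: Callable[[str], bool],
    reveal: Callable[[str], Optional[str]],
) -> List[str]:
    """Screen candidate ciphertext strings, then try to reveal hidden text.

    `strings` stands for the first character string set taken from the
    decompiled file. Both callables are assumptions of this sketch.
    """
    # Step 2: keep only the strings that look like ciphertext.
    ciphertext_set = [s for s in strings if is_ciphertext(s)]

    # Step 3: run hidden character recognition on the screened strings only.
    hidden: List[str] = []
    for s in ciphertext_set:
        plain = reveal(s)
        if plain is not None:
            hidden.append(plain)
    return hidden
```

Because the second stage only sees the screened set, any expensive decoding work is skipped for strings that never looked like ciphertext.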
In an embodiment, the detection object includes a detection application, and when obtaining the decompiled file of the detection object, the processor 110 specifically performs the following operations:
decompiling the detection application to obtain a decompiled file of the detection application.
In an embodiment, when the obtaining of the first character string set corresponding to the decompiled file is executed, the processor 110 specifically executes the following operations:
determining a reference byte of a target type and a character offset value corresponding to the reference byte in the decompiled file;
and acquiring a first character string set containing at least one character string in the decompiled file based on the reference byte and the character offset value.
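One way to picture the reference byte and character offset value is a toy binary layout, sketched below. The layout is an assumption made purely for illustration and is not taken from the application: a code area of fixed 3-byte records, where a record starting with the hypothetical marker `REF_BYTE` carries a 2-byte little-endian offset that points at a NUL-terminated UTF-8 string stored after the code area.

```python
import struct
from typing import List

REF_BYTE = 0x1A  # hypothetical marker meaning "this record references a string"


def extract_strings(data: bytes, code_end: int) -> List[str]:
    """Collect strings referenced from the code area data[:code_end].

    Assumed layout (illustrative only): 3-byte records of the form
    <marker byte><uint16 little-endian offset>, followed by a string area.
    """
    strings: List[str] = []
    for pos in range(0, code_end - 2, 3):
        if data[pos] != REF_BYTE:
            continue  # not a string-reference record
        (offset,) = struct.unpack_from("<H", data, pos + 1)
        end = data.find(b"\x00", offset)
        if offset < code_end or end == -1:
            continue  # offset does not point at a terminated string
        strings.append(data[offset:end].decode("utf-8", errors="replace"))
    return strings
```

A real decompiled file has its own format-specific opcodes and string tables, so the marker value, record size, and offset width here would need to be replaced with those of the actual format.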
In an embodiment, when performing the ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set, the processor 110 specifically performs the following operations:
determining character features corresponding to each character string in the first character string set;
and performing ciphertext character recognition based on the character features corresponding to each character string, and determining a ciphertext character string set.
In an embodiment, when performing the ciphertext character recognition based on the character features corresponding to each character string to determine the ciphertext character string set, the processor 110 specifically performs the following operations:
when the lower case character quantity of a first character string in each character string is larger than a first threshold value, and the target character quantity of the first character string is smaller than a second threshold value, determining that the first character string is a ciphertext character string in a ciphertext character string set, wherein the first threshold value and the second threshold value are associated with the total character quantity of each character string, and the target character quantity is the number corresponding to characters of each character string except the lower case character and the upper case character;
when the lower case character quantity of a second character string in each character string is smaller than the first threshold value and the reference character quantity of the second character string is smaller than a third threshold value, determining that the second character string is the ciphertext character string in the ciphertext character string set, wherein the reference character quantity is a difference value between the target character quantity and the upper case character quantity, and the third threshold value is associated with the total character quantity of each character string.
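The two threshold conditions above can be sketched as a single predicate. The text only states that the thresholds are associated with the total character quantity, so the ratios `r1`, `r2`, and `r3` below are hypothetical values chosen for illustration, and the function name `is_ciphertext` is likewise an assumption.

```python
def is_ciphertext(s: str, r1: float = 0.3, r2: float = 0.4, r3: float = 0.2) -> bool:
    """Apply the two character-count conditions to one string.

    r1, r2, r3 are assumed ratios that turn the total character quantity
    into the first, second, and third thresholds.
    """
    total = len(s)
    if total == 0:
        return False

    lower = sum(1 for c in s if c.islower())
    upper = sum(1 for c in s if c.isupper())
    target = total - lower - upper   # characters that are neither lower nor upper case
    reference = target - upper       # difference between target and upper case quantities

    first, second, third = r1 * total, r2 * total, r3 * total

    # Condition 1: many lower case characters, few other characters.
    if lower > first and target < second:
        return True
    # Condition 2: few lower case characters, small reference quantity.
    if lower < first and reference < third:
        return True
    return False
```

With these assumed ratios, a Base64-looking string such as `aGVsbG8gd29ybGQ=` passes condition 1, while a digit-and-dash string such as `1234-5678-9012` fails both conditions. Ordinary lower case words also pass condition 1, which is why a further dictionary-based exclusion is useful.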
In one embodiment, the processor 110, when executing the character string recognition method, further performs the following:
when a character string belongs to the common character strings in a common language dictionary, determining that the character string is not a ciphertext character string.
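The dictionary exclusion can be sketched as a set lookup applied to the candidate ciphertext strings. The dictionary contents and the names `COMMON_WORDS` and `drop_common_strings` are hypothetical; a practical dictionary would be far larger and the case-insensitive comparison is an assumption of this sketch.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical stand-in for a common language dictionary.
COMMON_WORDS: FrozenSet[str] = frozenset(
    {"activity", "button", "settings", "username"}
)


def drop_common_strings(
    candidates: Iterable[str],
    dictionary: FrozenSet[str] = COMMON_WORDS,
) -> List[str]:
    """Remove candidates that are ordinary dictionary words (case-insensitive)."""
    return [s for s in candidates if s.lower() not in dictionary]
```

For example, `drop_common_strings(["Settings", "aGVsbG8="])` keeps only the second string.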
In an embodiment, when performing hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string, the processor 110 specifically performs the following operations:
extracting interface calling characteristics in the decompiled file, and determining the encryption mode of the detection object based on the interface calling characteristics;
and determining at least one hidden character string in each ciphertext character string of the ciphertext character string set according to a decryption mode corresponding to the encryption mode.
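The two operations above can be sketched as a lookup from API references found in the decompiled text to a decoding routine. Base64 is used here only as a simple stand-in for "an encryption mode with a known inverse"; it is an encoding rather than encryption, and the application does not name it. The table `CALL_SIGNATURES` and both function names are hypothetical.

```python
import base64
import binascii
from typing import List, Optional

# Hypothetical mapping from class references seen in decompiled code
# to the name of the scheme they suggest.
CALL_SIGNATURES = {
    "Landroid/util/Base64;": "base64",
    "Ljava/util/Base64;": "base64",
}


def detect_scheme(decompiled_text: str) -> Optional[str]:
    """Return the scheme suggested by interface-call features, if any."""
    for signature, scheme in CALL_SIGNATURES.items():
        if signature in decompiled_text:
            return scheme
    return None


def reveal_hidden(ciphertexts: List[str], scheme: Optional[str]) -> List[str]:
    """Apply the inverse of the detected scheme to each ciphertext string."""
    if scheme != "base64":
        return []  # this sketch only knows one scheme
    hidden: List[str] = []
    for s in ciphertexts:
        try:
            raw = base64.b64decode(s, validate=True)
            hidden.append(raw.decode("utf-8"))
        except (binascii.Error, ValueError):
            continue  # not valid Base64, or not valid UTF-8 after decoding
    return hidden
```

Strings that fail to decode are skipped rather than reported, so the result contains only strings for which the detected scheme actually produced readable text.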
In one embodiment, after the obtaining of the at least one hidden character string, the processor 110 further performs the following steps:
determining a character application type of each of the hidden character strings.
In an embodiment, when the processor 110 determines the character application type of each hidden character string, the following steps are specifically performed:
performing second regular matching processing on each hidden character string, and determining a first character application type corresponding to the hidden character string; and/or
and performing common character matching processing on each hidden character string and a preset interface character dictionary, and determining a second character application type corresponding to the hidden character string.
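The two classification routes above, regular matching and dictionary matching, can be sketched together. The patterns, the dictionary entries, and the type labels are all hypothetical examples chosen for this sketch; the application does not list the concrete patterns or the contents of the preset interface character dictionary.

```python
import re
from typing import List

# Hypothetical regular expressions for the "second regular matching" route.
TYPE_PATTERNS = {
    "url": re.compile(r"^https?://\S+$"),
    "ipv4": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
}

# Hypothetical preset interface character dictionary: name -> type label.
INTERFACE_WORDS = {
    "getDeviceId": "device-identifier interface",
    "loadLibrary": "native-library interface",
}


def classify(hidden: str) -> List[str]:
    """Return every character application type that matches the hidden string."""
    types = [name for name, pattern in TYPE_PATTERNS.items() if pattern.match(hidden)]
    types += [label for word, label in INTERFACE_WORDS.items() if word in hidden]
    return types
```

A string may match both routes or neither, so the function returns a list of labels; an empty list means the type could not be determined by this sketch.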
In the embodiment of the application, a terminal acquires a decompiled file of a detection object, acquires a first character string set corresponding to the decompiled file, performs ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set, and performs hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string. Ciphertext recognition on the decompiled file of the detection object screens out the ciphertext character string set that may contain hidden characters, and hidden character recognition is then performed on those ciphertext character strings. This addresses the problem of low accuracy in character string recognition and improves the accuracy of character recognition. The character strings contained in the decompiled file do not all need to be examined one by one; only the screened ciphertext character strings need hidden character recognition, which improves the efficiency of character recognition. The character recognition of the detection object can be carried out without manual audit, and the hidden character recognition process can run automatically, which improves the degree of automation of character string recognition. In the character recognition process, ciphertext recognition can be performed based on character features of the character strings in the decompiled file (such as the numbers of upper case and lower case characters) to preliminarily determine the ciphertext character string set that may contain hidden characters, which enriches the means of character recognition and allows hidden character strings to be recognized conveniently and quickly.
It is clear to a person skilled in the art that the solution of the present application can be implemented by means of software and/or hardware. The "unit" and "module" in this specification refer to software and/or hardware that can perform a specific function independently or in cooperation with other components, where the hardware may be, for example, a Field-Programmable Gate Array (FPGA), an Integrated Circuit (IC), or the like.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only a division of logical functions, and other divisions are possible in actual implementation; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some service interfaces, devices, or units, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program, which is stored in a computer-readable memory, and the memory may include: flash disks, Read-only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (12)

1. A method for character string recognition, the method comprising:
acquiring a decompiled file of a detection object, and acquiring a first character string set corresponding to the decompiled file;
performing ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set;
and carrying out hidden character recognition on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string.
2. The method of claim 1, wherein the detection object comprises a detection application, and wherein obtaining a decompiled file of the detection object comprises:
decompiling the detection application to obtain a decompiled file of the detection application.
3. The method of claim 1, wherein obtaining the first set of strings corresponding to the decompiled file comprises:
determining a reference byte of a target type and a character offset value corresponding to the reference byte in the decompiled file;
and acquiring a first character string set containing at least one character string in the decompiled file based on the reference byte and the character offset value.
4. The method according to claim 1, wherein the performing ciphertext character recognition on each character string in the first character string set to obtain a ciphertext character string set comprises:
determining character features corresponding to each character string in the first character string set;
and performing ciphertext character recognition based on the character features corresponding to each character string, and determining a ciphertext character string set.
5. The method according to claim 4, wherein the performing ciphertext character recognition based on the character feature corresponding to each character string to determine a ciphertext character string set comprises:
when the lower case character quantity of a first character string in each character string is larger than a first threshold value, and the target character quantity of the first character string is smaller than a second threshold value, determining that the first character string is a ciphertext character string in a ciphertext character string set, wherein the first threshold value and the second threshold value are associated with the total character quantity of each character string, and the target character quantity is the number corresponding to characters of each character string except the lower case character and the upper case character;
when the lower case character quantity of a second character string in each character string is smaller than the first threshold value and the reference character quantity of the second character string is smaller than a third threshold value, determining that the second character string is the ciphertext character string in the ciphertext character string set, wherein the reference character quantity is a difference value between the target character quantity and the upper case character quantity, and the third threshold value is associated with the total character quantity of each character string.
6. The method of claim 5, further comprising:
and when the character string belongs to a common character string in a common language dictionary, determining that the character string is not the ciphertext character string.
7. The method according to claim 1, wherein the hidden character recognition for each ciphertext character string in the set of ciphertext character strings to obtain at least one hidden character string comprises:
extracting interface calling characteristics in the decompiled file, and determining the encryption mode of the detection object based on the interface calling characteristics;
and determining at least one hidden character string in each ciphertext character string of the ciphertext character string set according to a decryption mode corresponding to the encryption mode.
8. The method according to claim 7, wherein the determining at least one hidden string in each ciphertext string of the set of ciphertext strings according to the decryption manner corresponding to the encryption manner comprises:
and if the encryption mode is a fixed character-based target encryption mode, performing first regular matching processing on each ciphertext character string in the ciphertext character string set to obtain at least one hidden character string.
9. The method according to claim 1, wherein after obtaining the at least one hidden string, further comprising:
determining a character application type of each of the hidden character strings.
10. The method of claim 9, wherein determining the character application type for each hidden character string comprises:
performing second regular matching processing on each hidden character string, and determining a first character application type corresponding to the hidden character string; and/or
and performing common character matching processing on each hidden character string and a preset interface character dictionary, and determining a second character application type corresponding to the hidden character string.
11. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1 to 10.
12. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1 to 10.
CN202011179927.8A 2020-10-29 2020-10-29 Character string recognition method and device, storage medium and electronic equipment Active CN112214653B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011179927.8A CN112214653B (en) 2020-10-29 2020-10-29 Character string recognition method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112214653A true CN112214653A (en) 2021-01-12
CN112214653B CN112214653B (en) 2024-06-18

Family

ID=74057414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011179927.8A Active CN112214653B (en) 2020-10-29 2020-10-29 Character string recognition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112214653B (en)

Also Published As

Publication number Publication date
CN112214653B (en) 2024-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant