CN112507182A - Application screening method and device - Google Patents

Application screening method and device

Info

Publication number
CN112507182A
Authority
CN
China
Prior art keywords
name information
screening
program
package name
processing
Legal status
Pending
Application number
CN202011495312.6A
Other languages
Chinese (zh)
Inventor
李亚卿
许文龙
Current Assignee
Shanghai Shangxiang Network Technology Co.,Ltd.
Original Assignee
Shanghai Lianshang Network Technology Co Ltd
Application filed by Shanghai Lianshang Network Technology Co Ltd
Priority to CN202011495312.6A
Publication of CN112507182A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the present application disclose a method and an apparatus for screening application programs. One embodiment of the method comprises: acquiring accurate package name information of a determined program and approximate package name information of a program to be screened; processing the accurate package name information and the approximate package name information with each of several different screening algorithms in a screening algorithm set to obtain a processing result set, wherein the screening algorithm set comprises at least a sequence alignment algorithm and a cosine similarity algorithm, and the processing result set contains the processing result of each selected screening algorithm; and inputting the processing result set into a pre-trained screening matching model to generate a screening result for the program to be screened. By combining several different screening algorithms, this implementation can accurately determine whether the program to be screened and the determined program are the same program, and it addresses the system misjudgments caused by differing package names for the same program.

Description

Application screening method and device
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for screening application programs.
Background
With the development of computer and Internet technologies, service providers often develop multiple application programs for the devices their users use, in order to serve those users better. The package name (Package Name) serves as the identifier of an application and represents that application throughout the software development process.
In the prior art, development specifications for application programs are not uniform. To adapt to the various devices that users commonly use, package names are often set according to the applicable development specification when a program is developed, so the same program may end up with different package names.
Disclosure of Invention
The embodiments of the present application provide a method and an apparatus for screening application programs.
In a first aspect, an embodiment of the present application provides a method for screening application programs, including: acquiring accurate package name information of a determined program and approximate package name information of a program to be screened;
processing the accurate package name information and the approximate package name information with each of different screening algorithms in a screening algorithm set to obtain a processing result set, wherein the screening algorithm set comprises at least a sequence alignment algorithm and a cosine similarity algorithm, and the processing result set contains the processing results corresponding to the selected screening algorithms; and inputting the processing result set into a pre-trained screening matching model to generate a screening result for the program to be screened.
In some embodiments, processing the accurate package name information and the approximate package name information with the sequence alignment algorithm comprises: generating a cost matrix between the accurate package name information and the approximate package name information; generating, from the cost matrix and by dynamic programming, a minimum-cost path over the difference elements between the accurate package name information and the approximate package name information; calculating total cost information of the difference elements between the accurate package name information and the approximate package name information; and determining the total cost information as the processing result corresponding to the sequence alignment algorithm.
In some embodiments, determining the total cost information as the processing result corresponding to the sequence alignment algorithm comprises: acquiring the total cost information and generating a first similarity evaluation result according to where the total cost falls relative to a first evaluation interval; and taking the first similarity evaluation result as the processing result corresponding to the sequence alignment algorithm.
In some embodiments, processing the accurate package name information and the approximate package name information with the cosine similarity algorithm comprises: generating an accurate sequence vector from the accurate package name information and an approximate sequence vector from the approximate package name information; calculating cosine value information between the accurate sequence vector and the approximate sequence vector; and taking the cosine value information as the processing result corresponding to the cosine similarity algorithm.
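The cosine step can be sketched as follows. The source does not say how a package name is turned into a sequence vector, so the character-bigram count vector used here is an assumption chosen only for illustration, and the function names `sequence_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def sequence_vector(package_name: str, n: int = 2) -> Counter:
    """Hypothetical vectorisation: counts of the character n-grams (bigrams by default)."""
    s = package_name.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    exact = sequence_vector("com.android.phone")
    for name in ("com.android.phone.x", "com.motorola.blue.switching"):
        print(name, round(cosine(exact, sequence_vector(name)), 3))
```

Sparse `Counter` vectors avoid building a fixed vocabulary; any other vectorisation (token counts, embeddings) could be swapped in without changing `cosine`.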
In some embodiments, taking the cosine value information as the processing result corresponding to the cosine similarity algorithm comprises: acquiring the cosine value information and generating a second similarity evaluation result according to where the cosine value falls relative to a second evaluation interval; and taking the second similarity evaluation result as the processing result corresponding to the cosine similarity algorithm.
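Both the first and the second similarity evaluation results map a raw score onto an evaluation interval. A minimal sketch, assuming each evaluation interval is a list of half-open ranges mapped to an integer level (2 = high, 1 = medium, 0 = low similarity). The boundaries in `FIRST_INTERVALS` and `SECOND_INTERVALS` and the helper `evaluate` are hypothetical; the source does not disclose its intervals.

```python
from typing import List, Tuple

# (lower, upper, level): a value v falls in the interval when lower <= v < upper.
Interval = Tuple[float, float, int]

# Hypothetical boundaries; levels are 2 = high, 1 = medium, 0 = low similarity.
FIRST_INTERVALS: List[Interval] = [    # total alignment cost: lower cost means more similar
    (0.0, 2.0, 2),
    (2.0, 6.0, 1),
]
SECOND_INTERVALS: List[Interval] = [   # cosine value: higher value means more similar
    (0.8, float("inf"), 2),
    (0.5, 0.8, 1),
]


def evaluate(value: float, intervals: List[Interval], default: int = 0) -> int:
    """Return the level of the first interval containing value, else the default level."""
    for lower, upper, level in intervals:
        if lower <= value < upper:
            return level
    return default


if __name__ == "__main__":
    first_result = evaluate(2, FIRST_INTERVALS)       # e.g. a total cost of 2
    second_result = evaluate(0.94, SECOND_INTERVALS)  # e.g. a cosine value of 0.94
    print([first_result, second_result])  # [1, 2]
```

Mapping both scores onto the same small set of levels puts them on a common scale before they are collected into the processing result set.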
In some embodiments, the screening method further comprises: performing word segmentation on the accurate package name information and the approximate package name information respectively to obtain corresponding word segment sets; and, in response to determining that a word segment set contains a segment whose occurrence frequency exceeds a preset threshold, removing that segment from the set. In this case, processing the accurate package name information and the approximate package name information with the different screening algorithms in the screening algorithm set to obtain the processing result set comprises: processing the word segment sets corresponding to the accurate package name information and the approximate package name information with the different screening algorithms in the screening algorithm set to obtain the processing result set.
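A minimal sketch of the segmentation and high-frequency removal step. Two assumptions are made that the source leaves open: package names are split on separators and camelCase boundaries, and "occurrence frequency" is the share of package names in a reference corpus that contain a segment (so ubiquitous segments such as "com" are dropped). The names `segment`, `frequent_segments`, `filtered_segments` and the example corpus are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def segment(package_name: str) -> List[str]:
    """Hypothetical segmentation: split on '.', '_', '-', whitespace and camelCase boundaries."""
    tokens: List[str] = []
    for part in re.split(r"[._\-\s]+", package_name):
        # "EntryActivity" -> ["Entry", "Activity"], "HTTPServer" -> ["HTTP", "Server"]
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def frequent_segments(corpus: Iterable[str], max_share: float) -> Set[str]:
    """Segments occurring in more than max_share of the package names in the corpus."""
    names = list(corpus)
    doc_freq = Counter(seg for name in names for seg in set(segment(name)))
    return {seg for seg, count in doc_freq.items() if count / len(names) > max_share}


def filtered_segments(package_name: str, stop: Set[str]) -> List[str]:
    """Word segment set of a package name with the over-frequent segments removed."""
    return [seg for seg in segment(package_name) if seg not in stop]


if __name__ == "__main__":
    corpus = ["com.android.phone", "com.android.settings",
              "com.motorola.blue.switching", "com.example.demo"]
    stop = frequent_segments(corpus, max_share=0.5)  # {'com'}
    print(filtered_segments("com.android.phone.digital contacts EntryActivity", stop))
    # ['android', 'phone', 'digital', 'contacts', 'entry', 'activity']
```

Removing near-universal segments keeps them from inflating the similarity of unrelated programs that merely share a common prefix.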
In some embodiments, inputting the processing result set into the screening matching model to generate the screening result for the program to be screened comprises: inputting the processing result set into the preset screening matching model to generate matching characteristic parameters corresponding to the processing result set; and, in response to determining that the characteristic parameters meet a preset threshold requirement, determining that the determined program and the program to be screened are the same program.
In some embodiments, training the screening matching model comprises: acquiring several different package names of the same program; processing the different package names with the different screening algorithms in the screening algorithm set to obtain processing result sets; and training an original model with the processing result sets as input and, as the target output, the fact that these different package names point to the same program, thereby obtaining the screening matching model.
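A minimal sketch of training and applying a screening matching model, including the threshold decision described in the preceding embodiment. The source does not name the original model, so logistic regression fitted by batch gradient descent is substituted here as an assumption. Because a binary classifier also needs examples of different programs, the sketch assumes negative pairs are added alongside the same-program pairs the source describes. The two-element feature vectors are toy processing result sets, not data from the source, and `train`, `matching_parameter`, `same_program` and the 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matching_parameter(w: List[float], x: Sequence[float]) -> float:
    """Matching characteristic parameter: modelled probability that both names are one program."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])  # w[-1] is the bias


def train(samples: List[Sequence[float]], labels: List[int],
          lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent; the last weight is the bias."""
    w = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(samples, labels):
            err = matching_parameter(w, x) - y  # prediction error for this sample
            for k, xk in enumerate(x):
                grad[k] += err * xk
            grad[-1] += err
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad)]
    return w


def same_program(w: List[float], x: Sequence[float], threshold: float = 0.5) -> bool:
    """Apply the (hypothetical) preset threshold requirement to the matching parameter."""
    return matching_parameter(w, x) >= threshold


if __name__ == "__main__":
    # Toy processing result sets [alignment similarity, cosine value]; label 1 = same program.
    X = [[0.9, 0.95], [0.8, 0.9], [0.85, 0.8], [0.2, 0.1], [0.3, 0.25], [0.1, 0.3]]
    y = [1, 1, 1, 0, 0, 0]
    w = train(X, y)
    print(same_program(w, [0.95, 0.95]), same_program(w, [0.1, 0.1]))
```

In practice any classifier that outputs a score per processing result set could play the role of the original model; logistic regression is used only because it is small and easy to inspect.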
In a second aspect, an embodiment of the present application provides an apparatus for screening application programs, including: a package name information acquisition unit configured to acquire accurate package name information of a determined program and approximate package name information of a program to be screened; a package name information processing unit configured to process the accurate package name information and the approximate package name information with each of different screening algorithms in a screening algorithm set to obtain a processing result set, wherein the screening algorithm set comprises at least a sequence alignment algorithm and a cosine similarity algorithm, and the processing result set contains the processing results corresponding to the selected screening algorithms; and a screening result generation unit configured to input the processing result set into a pre-trained screening matching model and generate a screening result for the program to be screened.
In some embodiments, the package name information processing unit includes a sequence alignment algorithm subunit configured to process the accurate package name information and the approximate package name information with the sequence alignment algorithm, the sequence alignment algorithm subunit comprising: a cost matrix calculation module configured to generate a cost matrix between the accurate package name information and the approximate package name information;
a cost path calculation module configured to generate, from the cost matrix and by dynamic programming, a minimum-cost path over the difference elements between the accurate package name information and the approximate package name information; a total cost calculation module configured to calculate total cost information of the difference elements between the accurate package name information and the approximate package name information; and a first processing result generation module configured to determine the total cost information as the processing result corresponding to the sequence alignment algorithm.
In some embodiments, the first processing result generation module comprises a first similarity evaluation submodule configured to acquire the total cost information, generate a first similarity evaluation result according to where the total cost falls relative to the first evaluation interval, and use the first similarity evaluation result as the processing result corresponding to the sequence alignment algorithm.
In some embodiments, the package name information processing unit includes a cosine similarity algorithm subunit configured to process the accurate package name information and the approximate package name information with the cosine similarity algorithm, the cosine similarity algorithm subunit comprising:
a sequence vector generation module configured to generate an accurate sequence vector of the accurate package name information and an approximate sequence vector of the approximate package name information; a cosine information calculation module configured to calculate cosine value information between the accurate sequence vector and the approximate sequence vector; and a second processing result generation module configured to take the cosine value information as the processing result corresponding to the cosine similarity algorithm.
In some embodiments, the second processing result generation module comprises a second similarity evaluation submodule configured to acquire the cosine value information, generate a second similarity evaluation result according to where the cosine value falls relative to a second evaluation interval, and take the second similarity evaluation result as the processing result corresponding to the cosine similarity algorithm.
In some embodiments, the screening apparatus further includes: a package name word segmentation unit configured to perform word segmentation on the accurate package name information and the approximate package name information respectively to obtain corresponding word segment sets;
a weight screening unit configured to remove, in response to determining that a word segment set contains a segment whose occurrence frequency exceeds a preset threshold, that segment from the set; and the package name information processing unit is further configured to process the word segment sets corresponding to the accurate package name information and the approximate package name information with the different screening algorithms in the screening algorithm set to obtain a processing result set.
In some embodiments, the screening result generation unit includes: a characteristic parameter generation subunit configured to input the processing result set into a preset screening matching model and generate matching characteristic parameters corresponding to the processing result set; and a same-program determination subunit configured to determine that the determined program and the program to be screened are the same program in response to determining that the characteristic parameters meet the preset threshold requirement.
In some embodiments, the screening apparatus further includes a screening matching model training unit configured to train an original model to obtain the screening matching model, the training unit comprising: a sample acquisition subunit configured to acquire several different package names of the same program and process them with the different screening algorithms in the screening algorithm set to obtain a sample set of processing result sets; and a matching model generation subunit configured to train the original model with the sample set of processing result sets as input and, as output, the fact that the different package names point to the same program, thereby obtaining the screening matching model.
In a third aspect, an embodiment of the present application provides a computer device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect.
In the application screening method and apparatus provided by the embodiments of the present application, accurate package name information of a determined program and approximate package name information of a program to be screened are acquired; the accurate package name information and the approximate package name information are then processed with each of different screening algorithms in a screening algorithm set to obtain a processing result set, wherein the screening algorithm set comprises at least a sequence alignment algorithm and a cosine similarity algorithm and the processing result set contains the processing results corresponding to the selected screening algorithms; finally, the processing result set is input into a pre-trained screening matching model to generate a screening result for the program to be screened. By combining several different screening algorithms, this implementation can accurately determine whether the program to be screened and the determined program are the same program, and it addresses the system misjudgments caused by differing package names for the same program.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture to which some embodiments of the present application may be applied;
FIG. 2 is a flow chart of a first embodiment of the application screening method according to the present application;
FIG. 3 is a flow chart of the processing steps when the screening algorithm is a sequence alignment algorithm, in one implementation of the application screening method according to the present application;
FIG. 4 is a flow chart of the processing steps when the screening algorithm is a cosine similarity algorithm, in one implementation of the application screening method according to the present application;
FIG. 5 is a flow chart of a second embodiment of the application screening method according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use with the computer device of some embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments of the present application and the features in them may be combined with each other provided there is no conflict. The present application is described in detail below through the embodiments and with reference to the accompanying drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the application screening method, apparatus, electronic device, and computer-readable storage medium of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various applications for realizing data communication between the terminal devices 101, 102, 103 and the server 105, such as a plug-in screening application, a remote debugging application, a data installation application, etc., may be installed on the terminal devices 101, 102, 103 and the server 105.
The terminal devices 101, 102, 103 and the server 105 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers; when they are software, they may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which is not limited herein. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server; when it is software, it may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which is not limited herein.
The server 105 may provide various services through built-in applications. For example, when running a plug-in screening application that provides application screening, the server 105 may: first, acquire the accurate package name information of the determined program and the approximate package name information of the program to be screened from the terminal devices 101, 102, 103 through the network 104; then, process the accurate package name information and the approximate package name information with each of the different screening algorithms in the screening algorithm set to obtain a processing result set, where the screening algorithm set comprises at least a sequence alignment algorithm and a cosine similarity algorithm and the processing result set contains the processing results corresponding to the selected screening algorithms; and finally, input the processing result set into a pre-trained screening matching model to generate a screening result for the program to be screened.
It should be noted that the accurate package name information of the determined program and the approximate package name information of the program to be screened are generally obtained from different terminal devices, so that the same program can be screened across different development specifications and development environments. They may, of course, also be obtained from a single device that supports different development specifications and development environments.
The accurate package name information of the determined program and the approximate package name information of the program to be screened may be acquired from the terminal devices 101, 102, 103 through the network 104, or may be stored locally on the server 105 in advance in various ways. Thus, when the server 105 detects that such data is already stored locally (e.g., accurate package name information of the determined program kept as a backup, together with approximate package name information of the program to be screened), it may read the data directly from local storage; in that case, the exemplary system architecture 100 need not include the terminal devices 101, 102, 103 or the network 104.
Because application screening usually requires considerable computing resources, computing capability, and application support, the screening method provided in the following embodiments is generally executed by the server 105, which has stronger computing capability, more computing resources, and stronger application support; accordingly, the screening apparatus is generally also disposed in the server 105. However, when the terminal devices 101, 102, 103 also have sufficient computing capability and resources, they may perform the operations described above for the server 105 through the plug-in screening application installed on them, and output the same result as the server 105. In particular, when several terminal devices with different computing capabilities are present and the plug-in screening application determines that a given terminal device has strong computing capability and ample spare resources, that terminal device may perform the computation, which appropriately reduces the load on the server 105; accordingly, the screening apparatus may also be installed in the terminal devices 101, 102, 103. In that case, the exemplary system architecture 100 need not include the server 105 or the network 104.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to Fig. 2, a flow 200 of a first embodiment of the application screening method according to the present application is shown. The method may comprise the following steps:
Step 201, acquiring the accurate package name information of the determined program and the approximate package name information of the program to be screened.
In this embodiment, the executing body of the screening method (for example, the server 105 shown in Fig. 1) may acquire the accurate package name information of the determined program and the approximate package name information of the program to be screened from local storage or from a non-local storage device (for example, the terminal devices 101, 102, 103 shown in Fig. 1). To meet special storage requirements that may arise in practical application scenarios, this information may also be stored across other devices in a distributed manner, and the stored copy may be either an original file or a backup file; this is not specifically limited herein.
It should be understood that the accurate package name information of the determined program is usually the verified, known package name information of the target program. The executing body screens the program to be screened by comparing this accurate package name information with the approximate package name information, to determine whether the program to be screened and the determined program are the same program.
In practice, the approximate package name information of a program to be screened usually overlaps substantially with the accurate package name information. Programs to be screened can therefore be pre-screened on this basis, so that large numbers of low-quality candidates are not imported at once and do not degrade screening efficiency. For example, if the package name of the determined program is "com.android.phone", a program whose package name is "com.android.phone.digital contacts EntryActivity" can be kept as a program to be screened, whereas a program whose package name is "com.motorola.blue.switching" has low similarity to the accurate package name and can be removed in advance.
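A minimal sketch of this pre-screening idea, assuming that "overlap" is measured as the share of the determined program's dot-separated segments that also appear in a candidate name. The functions `segments`, `overlap_ratio`, `prefilter` and the 0.5 cut-off are hypothetical. On the two example names above it keeps the first candidate and drops the second.

```python
from typing import List, Set


def segments(package_name: str) -> Set[str]:
    """Split a package name on '.' into a set of lower-cased, non-empty segments."""
    return {seg.strip().lower() for seg in package_name.split(".") if seg.strip()}


def overlap_ratio(exact: str, candidate: str) -> float:
    """Share of the accurate name's segments that also appear in the candidate name."""
    exact_segs = segments(exact)
    if not exact_segs:
        return 0.0
    return len(exact_segs & segments(candidate)) / len(exact_segs)


def prefilter(exact: str, candidates: List[str], min_overlap: float = 0.5) -> List[str]:
    """Keep only candidates whose segment overlap reaches the hypothetical cut-off."""
    return [c for c in candidates if overlap_ratio(exact, c) >= min_overlap]


if __name__ == "__main__":
    exact = "com.android.phone"
    candidates = [
        "com.android.phone.digital contacts EntryActivity",  # overlap 3/3
        "com.motorola.blue.switching",                       # overlap 1/3
    ]
    print(prefilter(exact, candidates))
    # ['com.android.phone.digital contacts EntryActivity']
```

This check is deliberately cheap; it only narrows the candidate list before the more expensive screening algorithms run.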
Step 202, processing the accurate package name information and the approximate package name information with each of the different screening algorithms in the screening algorithm set to obtain a processing result set.
In this embodiment, different screening algorithms are selected from the screening algorithm set, each is applied to the accurate package name information and the approximate package name information to obtain its processing result, and the results of the different algorithms are then collected into a processing result set.
The screening algorithm set usually includes several algorithms for comparing whether package names are the same. It comprises at least a sequence alignment algorithm (Needleman-Wunsch) and a cosine similarity algorithm, and may further include other screening algorithms such as semantic similarity and source code file comparison, so that whether the determined program and the program to be screened are the same program can be decided from the accurate package name information and the approximate package name information.
In some embodiments, the step of processing the accurate package name information and the approximate package name information with the sequence alignment algorithm follows flow 300 shown in Fig. 3, which specifically includes:
Step 301, generating a cost matrix between the accurate package name information and the approximate package name information.
Specifically, a cost matrix is constructed from the elements of the accurate package name information and the approximate package name information. Each element of the matrix represents the cost of editing (inserting, deleting, or substituting) elements to reach the target element, that is, the cost of transforming the accurate package name information into the approximate package name information.
Step 302, generating, according to the cost matrix, a minimum-cost path of difference elements between the accurate package name information and the approximate package name information by dynamic programming.
In the present application, starting from the position in the matrix that corresponds to the accurate package name information, the minimum-cost path to the corresponding approximate package name information is calculated by dynamic programming.
Step 303, calculating the total cost information of the difference elements between the accurate package name information and the approximate package name information.
Specifically, the total cost of changing the accurate package name information into the approximate package name information is determined from the minimum-cost path found in step 302. The lower the total cost, the closer the accurate package name information and the approximate package name information are.
Step 304, determining the total cost information as the processing result corresponding to the sequence alignment algorithm.
Specifically, the total cost information calculated in step 303 is output as the processing result corresponding to the sequence alignment algorithm.
Through these steps, the sequence alignment algorithm compares the complete accurate package name information directly with the complete approximate package name information, without requiring semantic recognition processing of either.
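Steps 301 to 304 can be sketched in Python as follows. This is a minimal illustration under stated assumptions. Each package name is treated as a character sequence, and insertion, deletion and substitution each cost 1 (the hypothetical parameters `gap_cost` and `sub_cost`). The Needleman-Wunsch recurrence is written in its cost-minimizing (edit-distance) form rather than its score-maximizing form. The function name `alignment_cost` is hypothetical.

```python
from typing import List, Tuple


def alignment_cost(source: str, target: str,
                   sub_cost: int = 1, gap_cost: int = 1) -> Tuple[int, List[str]]:
    """Global alignment cost between two package names.

    Returns the total minimum cost and one minimum-cost edit path
    that turns `source` into `target`.
    """
    n, m = len(source), len(target)
    # Step 301: cost[i][j] is the minimum cost of turning source[:i] into target[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost  # delete every character of source[:i]
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost  # insert every character of target[:j]

    # Step 302: fill the matrix by dynamic programming.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if source[i - 1] == target[j - 1] else sub_cost
            cost[i][j] = min(
                cost[i - 1][j - 1] + match,  # keep or substitute
                cost[i - 1][j] + gap_cost,   # delete source[i - 1]
                cost[i][j - 1] + gap_cost,   # insert target[j - 1]
            )

    # Trace one minimum-cost path back from the bottom-right cell.
    path: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            match = 0 if source[i - 1] == target[j - 1] else sub_cost
            if cost[i][j] == cost[i - 1][j - 1] + match:
                path.append("keep" if match == 0 else f"sub {source[i - 1]}->{target[j - 1]}")
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            path.append(f"del {source[i - 1]}")
            i -= 1
        else:
            path.append(f"ins {target[j - 1]}")
            j -= 1
    path.reverse()

    # Steps 303-304: the bottom-right cell holds the total cost.
    return cost[n][m], path


if __name__ == "__main__":
    total, _ = alignment_cost("com.android.phone", "com.android.phone.dialer")
    print(total)  # 7
```

The second package name above is a hypothetical example. The seven inserted characters of ".dialer" make up the whole cost.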
Further, to present the comparison result between the determined program and the program to be screened more intuitively once the total cost information has been generated, in some embodiments, determining the total cost information as the processing result corresponding to the sequence alignment algorithm includes: acquiring the total cost information and generating a first similarity evaluation result according to the numerical relationship between the total cost information and a first evaluation interval; and taking the first similarity evaluation result as the processing result corresponding to the sequence alignment algorithm.
Specifically, after the total cost information is obtained, a first similarity evaluation result may be generated according to the numerical relationship between the total cost information and a preset first evaluation interval, so as to evaluate the similarity between the determined program and the program to be screened. For example, the first evaluation interval may be set to [a, b]. Because a lower total cost indicates closer package names, if the total cost information is lower than a, the determined program and the program to be screened may be considered identical. If it falls within [a, b], they may be considered similar, and if it is greater than b, they may be considered different. In this way, the total cost information can be further evaluated to obtain more useful information.
In practice, the first evaluation interval can be divided into multiple sub-intervals so that the total cost information is graded more finely and a more accurate evaluation result is obtained.
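The first evaluation interval can be illustrated with a small Python mapping from total cost to a label. This sketch assumes hypothetical bounds a = 2.0 and b = 8.0 and the labels "identical", "similar" and "different". It follows the rule from step 303 that a lower total cost means closer package names. Finer sub-intervals would simply add more branches.

```python
from typing import Tuple


def evaluate_cost(total_cost: float,
                  interval: Tuple[float, float] = (2.0, 8.0)) -> str:
    """Map a total alignment cost to a first similarity evaluation result.

    A lower cost means the package names are closer. A cost below the lower
    bound a counts as identical, a cost inside [a, b] as similar, and a cost
    above the upper bound b as different.
    """
    a, b = interval
    if total_cost < a:
        return "identical"
    if total_cost <= b:
        return "similar"
    return "different"
```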
In some embodiments, the step of processing the accurate package name information and the approximate package name information with the cosine similarity algorithm follows flow 400 shown in Fig. 4, which specifically includes:
Step 401, generating an accurate sequence vector of the accurate package name information and an approximate sequence vector of the approximate package name information.
Specifically, after the vector dimension is determined according to the lengths of the accurate package name information and the approximate package name information, the accurate sequence vector and the approximate sequence vector are generated from the contents of the accurate package name information and the approximate package name information, respectively.
Step 402, calculating cosine value information between the accurate sequence vector and the approximate sequence vector.
Specifically, as in geometry, the cosine of the angle between the accurate sequence vector and the approximate sequence vector is calculated to obtain the corresponding cosine value information.
Step 403, taking the cosine value information as the processing result corresponding to the cosine similarity algorithm.
Specifically, the cosine value information calculated in step 402 is output as the processing result corresponding to the cosine similarity algorithm.
Through these steps, the cosine similarity algorithm reduces the comparison between the determined program and the program to be screened to a simple, fast vector comparison. This improves screening efficiency while maintaining high screening precision.
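Steps 401 to 403 can be sketched in Python as shown below. The application does not specify how the sequence vectors are built. This sketch assumes one dimension per character occurring in either package name, with each coordinate being that character's count. The function name is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(exact_name: str, approx_name: str) -> float:
    """Cosine of the angle between character-count vectors of two package names."""
    exact_vec = Counter(exact_name)    # step 401: accurate sequence vector
    approx_vec = Counter(approx_name)  # step 401: approximate sequence vector
    # One dimension per character appearing in either name; missing counts are 0.
    dims = set(exact_vec) | set(approx_vec)
    dot = sum(exact_vec[d] * approx_vec[d] for d in dims)  # step 402
    norm_e = math.sqrt(sum(v * v for v in exact_vec.values()))
    norm_a = math.sqrt(sum(v * v for v in approx_vec.values()))
    if norm_e == 0 or norm_a == 0:
        return 0.0  # an empty name has no direction; treat it as dissimilar
    return dot / (norm_e * norm_a)  # step 403: processing result
```

Character-count vectors ignore character order, so names that are anagrams of each other score 1.0. This is one reason a cosine result is more useful alongside an order-sensitive measure such as the sequence alignment cost.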
Further, to present the cosine-based comparison result between the determined program and the program to be screened more intuitively as well, in some embodiments, taking the cosine value information as the processing result corresponding to the cosine similarity algorithm includes: acquiring the cosine value information and generating a second similarity evaluation result according to the numerical relationship between the cosine value information and a second evaluation interval; and taking the second similarity evaluation result as the processing result corresponding to the cosine similarity algorithm.
Specifically, in the same way as for the total cost information above, a second evaluation interval may be set for the cosine value information, so that the determined program and the program to be screened can be evaluated according to the obtained cosine value information.
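The second evaluation interval can be sketched the same way in Python. The bounds c = 0.80 and d = 0.95 and the labels are hypothetical. Unlike the total cost, a higher cosine value means higher similarity, so the comparisons run in the opposite direction.

```python
from typing import Tuple


def evaluate_cosine(cosine: float,
                    interval: Tuple[float, float] = (0.80, 0.95)) -> str:
    """Map cosine value information to a second similarity evaluation result.

    A higher cosine means closer vectors, so the ordering is the reverse of
    the cost-based evaluation.
    """
    c, d = interval
    if cosine > d:
        return "identical"
    if cosine >= c:
        return "similar"
    return "different"
```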
Step 203, inputting the processing result set into a pre-trained screening matching model for processing, and generating a screening result corresponding to the program to be screened.
In this embodiment, the processing result set obtained in step 202 is processed by the pre-trained screening matching model. The model jointly considers the processing results of the different screening algorithms in the processing result set to determine whether the determined program and the program to be screened are the same program.
In some embodiments, to analyze accurately a processing result set that contains the processing results of several different screening algorithms, the training step of the screening matching model includes: acquiring multiple different pieces of package name information of the same program, and processing them with the different screening algorithms in the screening algorithm set, respectively, to obtain a sample set of processing result sets; and training an original model with the sample set of processing result sets as input and the information that the multiple different package names point to the same program as output, so as to obtain the screening matching model.
Specifically, the screening matching model can be trained with a predetermined sample set that records how the processing results of the different screening algorithms should be interpreted. To build the training sample set, multiple different pieces of package name information of the same program are acquired, and each screening algorithm in the screening algorithm set is applied to them to generate the corresponding processing results. The screening matching model obtained by training the original model on this sample set can therefore evaluate the outputs of all the different screening algorithms accurately. This improves the reliability of the evaluation and avoids the situation in which the model cannot handle processing results produced by a screening algorithm that was not used in training.
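The training step can be illustrated with a minimal Python sketch. The application does not name the original model, so logistic regression trained by batch gradient descent is used here as a stand-in. Several further assumptions apply. Each processing result set is encoded as the hypothetical feature vector [1 - normalized alignment cost, cosine value]. Label 1 marks package names of the same program. Label 0 marks pairs from different programs, which a binary classifier needs even though the text only mentions positive samples. All names and data are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_matching_model(samples: Sequence[Sequence[float]],
                         labels: Sequence[int],
                         lr: float = 0.5,
                         epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic-regression weights and a bias by batch gradient descent."""
    n_samples, n_features = len(samples), len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Prediction error, which is the gradient of the log-loss w.r.t. the score.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for k in range(n_features):
                grad_w[k] += err * x[k]
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict_same(weights: Sequence[float], bias: float,
                 x: Sequence[float], threshold: float = 0.5) -> bool:
    """Return True if the model judges the pair to be the same program."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= threshold


if __name__ == "__main__":
    # Hypothetical training data: [1 - normalized cost, cosine value].
    X = [[0.90, 0.95], [0.80, 0.90], [0.85, 0.97],  # same program
         [0.20, 0.40], [0.30, 0.50], [0.10, 0.30]]  # different programs
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_matching_model(X, y)
    print(predict_same(w, b, [0.88, 0.93]))
```

Because every algorithm contributes one feature, a result from an algorithm that was absent during training has no learned weight. This matches the motivation given above for building the sample set with all algorithms in the set.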
On this basis, weights for the processing results of the different screening algorithms can be set when the screening matching model is trained. For example, when only the sequence alignment algorithm and the cosine similarity algorithm are used, the reference weight of the sequence alignment algorithm can be set to 60% and that of the cosine similarity algorithm to 40%. The application screening method can thus be adjusted dynamically according to factors such as specific requirements and the degree of trust in each algorithm.
The application screening method provided by this embodiment of the present application first acquires the accurate package name information of a determined program and the approximate package name information of a program to be screened. It then processes the accurate package name information and the approximate package name information with different screening algorithms in a screening algorithm set, respectively, to obtain a processing result set. The screening algorithm set includes at least a sequence alignment algorithm and a cosine similarity algorithm, and the processing result set includes the processing results corresponding to each of the selected screening algorithms. Finally, the processing result set is input into a pre-trained screening matching model to generate the screening result corresponding to the program to be screened. By combining several different screening algorithms to screen the program to be screened at the same time, this implementation can accurately determine whether the program to be screened and the determined program are the same program. It solves the problem of the system misjudging the same program because of differences in package names.
With continued reference to FIG. 5, a flow 500 of a second embodiment of a screening method of an application according to the present application is shown. The screening method of the application program can comprise the following steps:
Step 501, acquiring the accurate package name information of a determined program and the approximate package name information of a program to be screened, respectively.
Step 502, performing word segmentation processing on the accurate package name information and the approximate package name information respectively to obtain corresponding word segmentation sets.
In the embodiment, word segmentation processing is performed on the accurate package name information and the approximate package name information to obtain a word segmentation set corresponding to the accurate package name information and a word segmentation set corresponding to the approximate package name information.
Step 503, in response to determining that a word segmentation set contains a token whose occurrence frequency exceeds a preset threshold, removing that token from the word segmentation set.
In this embodiment, the word segmentation sets obtained in step 502 are filtered separately. These are the set corresponding to the accurate package name information and the set corresponding to the approximate package name information. Any token whose occurrence frequency exceeds the preset threshold is removed from each set.
In some embodiments, to better determine whether the occurrence frequency of a token in the word segmentation set exceeds the preset threshold, the tokens can also be screened by calculating their inverse document frequency (IDF). IDF is used to evaluate how important a word is to one document in a document collection or corpus. In TF-IDF weighting, a word's importance increases in proportion to how often it appears in the document but decreases as it appears in more documents of the corpus. After the inverse document frequency of each token in the word segmentation set is obtained, a corresponding inverse document frequency threshold is determined from the preset occurrence-frequency threshold, and tokens whose inverse document frequency does not meet the requirement are removed.
It should be understood that, besides calculating the inverse document frequency of the tokens, other methods, such as a ranking method based on joint analysis, may also be used to weight the results of the word segmentation processing.
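Steps 502 and 503 can be sketched in Python as follows. Several choices here are assumptions. Package names are segmented on dots, underscores, whitespace and camelCase boundaries. IDF is computed as log(N / document frequency) over a hypothetical corpus of known package names. Tokens never seen in the corpus are treated as rare and kept. A token is dropped when its IDF falls below a hypothetical `min_idf`, which corresponds to its occurrence frequency exceeding the preset threshold.

```python
import math
import re
from typing import Dict, Iterable, List


def segment(package_name: str) -> List[str]:
    """Split a package name on '.', '_', whitespace and camelCase boundaries."""
    tokens: List[str] = []
    for part in re.split(r"[._\s]+", package_name):
        # "EntryActivity" -> ["Entry", "Activity"]; "HTTPServer" -> ["HTTP", "Server"]
        tokens.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part))
    return [t.lower() for t in tokens if t]


def inverse_document_frequency(corpus: Iterable[List[str]]) -> Dict[str, float]:
    """IDF of each token, treating each tokenized package name as one document."""
    docs = [set(tokens) for tokens in corpus]
    df: Dict[str, int] = {}
    for doc in docs:
        for tok in doc:
            df[tok] = df.get(tok, 0) + 1
    return {tok: math.log(len(docs) / count) for tok, count in df.items()}


def drop_common_tokens(tokens: List[str], idf: Dict[str, float], min_idf: float) -> List[str]:
    """Remove tokens that occur too often (low IDF); unseen tokens are kept."""
    return [t for t in tokens if idf.get(t, float("inf")) >= min_idf]


if __name__ == "__main__":
    corpus = [segment(n) for n in ["com.android.phone", "com.android.contacts",
                                   "com.android.settings", "com.motorola.camera"]]
    idf = inverse_document_frequency(corpus)
    print(drop_common_tokens(segment("com.android.phone.EntryActivity"), idf, 0.5))
    # ['phone', 'entry', 'activity']
```

In this example "com" appears in every name (IDF 0) and "android" in three of four (IDF about 0.29), so both are removed as low-value tokens.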
Step 504, processing the word segmentation sets corresponding to the accurate package name information and the approximate package name information with different screening algorithms in the screening algorithm set, respectively, to obtain a processing result set.
Step 505, inputting the processing result set into a pre-trained screening matching model for processing, and generating a screening result corresponding to the program to be screened.
In the application screening method provided in this embodiment, steps 501, 504, and 505 are similar to steps 201 to 203 in the embodiment shown in Fig. 2, and the common content is not repeated. Building on the embodiment shown in Fig. 2, the method described in this embodiment removes frequently occurring, low-value tokens from the accurate package name information and the approximate package name information after they are acquired. This improves the screening efficiency of the application.
On the basis of any of the above embodiments, a user operating the executing entity of the application screening method can be helped to understand the comparison process more intuitively. In this case, inputting the processing result set into the screening matching model for processing and generating the screening result corresponding to the program to be screened includes: inputting the processing result set into a preset screening matching model for processing to generate matching feature parameters corresponding to the processing result set; and in response to determining that the matching feature parameters meet a preset threshold requirement, determining that the determined program and the program to be screened are the same program.
Specifically, after the processing result set is input into the screening matching model for processing, the matching feature parameters corresponding to the processing result set are obtained. For example, the matching feature parameters may include a total evaluation value together with the evaluation values of the processing results of each screening algorithm. The user can then see not only the total but also how each part contributes to it. This provides more reference information and supports diversified evaluations based on the user's own needs and experience.
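The matching feature parameters can be illustrated with a minimal Python sketch. A trained model would learn its own weighting. Here, for illustration only, fixed weights of 0.6 and 0.4 (the example reference weights given earlier for sequence alignment and cosine similarity) are applied to per-algorithm scores. The scores are assumed to be already scaled to 0 to 100. The decision uses a threshold of 60, as in the scenario below. The function and key names are hypothetical.

```python
from typing import Dict, Tuple


def matching_parameters(scores: Dict[str, float],
                        weights: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return the total evaluation value and each algorithm's contribution.

    `scores` holds per-algorithm similarity scores on a 0-100 scale.
    `weights` holds the reference weight of each algorithm.
    """
    contributions = {name: scores[name] * weights[name] for name in weights}
    return sum(contributions.values()), contributions


def is_same_program(total: float, threshold: float = 60.0) -> bool:
    """Judge the two programs identical when the total meets the threshold."""
    return total >= threshold


if __name__ == "__main__":
    total, parts = matching_parameters(
        {"sequence_alignment": 70.0, "cosine_similarity": 95.0},
        {"sequence_alignment": 0.6, "cosine_similarity": 0.4},
    )
    print(total, parts, is_same_program(total))
```

Returning the contributions alongside the total reflects the purpose described above: the user sees both the overall value and how much each screening algorithm contributed to it.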
For ease of understanding, an application scenario of the application screening method is provided below. In this scenario, the accurate package name information of the determined program is "com. The approximate package name information of the program to be screened is "com.
The accurate package name information and the approximate package name information are processed with the sequence alignment algorithm and the cosine similarity algorithm in the screening algorithm set, respectively. Two numerical relationships determine the processing result set: the total cost information from the sequence alignment algorithm against the first evaluation interval, and the cosine value information from the cosine similarity algorithm against the second evaluation interval. The resulting processing result set is "sequence alignment: similar; cosine similarity: identical".
The processing result set is input into the pre-trained screening matching model, which outputs an evaluation score of 90 (about 40 points from sequence alignment and about 50 points from cosine similarity). Because the evaluation score exceeds the preset threshold of 60, the determined program and the program to be screened are determined to be the same program.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use in implementing computer devices (e.g., terminal devices 101, 102, 103, server 105 shown in FIG. 1) of embodiments of the present application is shown. The computer device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as including a package name information acquisition unit, a package name information processing unit, and a screening result generation unit. The names of these units do not in some cases limit the units themselves; for example, the package name information acquisition unit may also be described as "a unit that acquires the accurate package name information of the determined program and the approximate package name information of the program to be screened, respectively".
As another aspect, the present application also provides a computer-readable medium, which may be included in the computer device described in the above embodiments or may exist separately without being assembled into the computer device. The computer-readable medium carries one or more programs which, when executed by the computer device, cause the computer device to perform the following. First, acquire the accurate package name information of a determined program and the approximate package name information of a program to be screened, respectively. Next, process the accurate package name information and the approximate package name information with different screening algorithms in a screening algorithm set, respectively, to obtain a processing result set. The screening algorithm set includes at least a sequence alignment algorithm and a cosine similarity algorithm, and the processing result set includes the processing results corresponding to each of the selected screening algorithms. Finally, input the processing result set into a pre-trained screening matching model for processing to generate a screening result corresponding to the program to be screened.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A method of screening applications, comprising:
respectively acquiring accurate package name information of a determined program and approximate package name information of a program to be screened;
processing the accurate package name information and the approximate package name information by respectively adopting different screening algorithms in a screening algorithm set to obtain a processing result set; the screening algorithm set at least comprises a sequence comparison algorithm and a cosine similarity algorithm, and the processing result set comprises processing results corresponding to the selected different screening algorithms;
and inputting the processing result set into a pre-trained screening matching model for processing to generate a screening result corresponding to the program to be screened.
2. The method of claim 1, wherein processing the accurate package name information and the approximate package name information using the sequence alignment algorithm comprises:
generating a cost matrix between the accurate package name information and the approximate package name information;
generating, according to the cost matrix, a minimum cost path of difference elements between the accurate package name information and the approximate package name information by dynamic programming;
calculating the total cost information of the difference elements between the accurate package name information and the approximate package name information;
and determining the total cost information as a processing result corresponding to the sequence alignment algorithm.
3. The method of claim 2, wherein the determining the total cost information as the processing result corresponding to the sequence alignment algorithm comprises:
acquiring the total cost information, and generating a first similarity evaluation result according to the numerical relationship between the total cost information and a first evaluation interval;
and taking the first similarity evaluation result as a processing result corresponding to the sequence alignment algorithm.
4. The method of claim 1, wherein processing the accurate package name information and the approximate package name information using the cosine similarity algorithm comprises:
generating an accurate sequence vector of the accurate package name information and an approximate sequence vector of the approximate package name information;
calculating cosine value information between the accurate sequence vector and the approximate sequence vector;
and taking the cosine value information as a processing result corresponding to the cosine similarity algorithm.
5. The method according to claim 4, wherein the using the cosine value information as a processing result corresponding to the cosine similarity algorithm comprises:
acquiring the cosine value information, and generating a second similarity evaluation result according to the numerical relationship between the cosine value information and a second evaluation interval;
and taking the second similarity evaluation result as a processing result corresponding to the cosine similarity algorithm.
6. The method of claim 1, further comprising:
respectively carrying out word segmentation processing on the accurate package name information and the approximate package name information to obtain corresponding word segmentation sets;
in response to determining that a word segmentation set contains a token whose occurrence frequency exceeds a preset threshold, removing that token from the word segmentation set;
and the processing of the accurate package name information and the approximate package name information by respectively adopting different screening algorithms in a screening algorithm set to obtain a processing result set comprises:
processing the word segmentation sets corresponding to the accurate package name information and the approximate package name information, respectively, by adopting different screening algorithms in the screening algorithm set to obtain the processing result set.
7. The method of claim 1, wherein inputting the processing result set into the screening matching model for processing to generate the screening result corresponding to the program to be screened comprises:
inputting the processing result set into a preset screening matching model for processing to generate matching characteristic parameters corresponding to the processing result set;
and in response to determining that the matching characteristic parameters meet a preset threshold requirement, determining that the determined program and the program to be screened are the same program.
8. The method of claim 1, wherein training the screening matching model comprises:
acquiring a plurality of different pieces of package name information of the same program, and processing them with different screening algorithms in the screening algorithm set respectively to obtain a sample set of processing result sets;
and training an original model with the sample set of processing result sets as input and the fact that the different pieces of package name information point to the same program as output, so as to obtain the screening matching model.
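Claim 8 does not say what the "original model" is, and it describes only positive samples (different package names of the same program). A binary classifier also needs negative samples, so the sketch below assumes, beyond what the claim states, that pairs of package names from different programs are added with label 0. It fits logistic regression by batch gradient descent; the function names, learning rate, epoch count and sample values are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_matching_model(samples, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log loss.

    samples: processing result sets (one feature per screening algorithm).
    labels:  1 if the package names point to the same program, 0 otherwise.
    """
    n_features = len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / m
    return weights, bias


def predict(weights, bias, x):
    """Matching characteristic parameter for one processing result set."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


if __name__ == "__main__":
    # Hypothetical processing result sets: [edit-distance similarity, cosine value].
    positives = [[0.9, 0.95], [0.8, 0.85], [0.85, 0.9]]  # same program
    negatives = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.05]]   # different programs (assumed)
    w, b = train_matching_model(positives + negatives, [1, 1, 1, 0, 0, 0])
    print(predict(w, b, [0.9, 0.9]) > 0.5, predict(w, b, [0.1, 0.1]) > 0.5)  # prints: True False
```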
9. A computer device, comprising:
one or more processors;
a storage device on which one or more programs are stored;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method according to any one of claims 1-8.
CN202011495312.6A 2020-12-17 2020-12-17 Application screening method and device Pending CN112507182A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011495312.6A CN112507182A (en) 2020-12-17 2020-12-17 Application screening method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011495312.6A CN112507182A (en) 2020-12-17 2020-12-17 Application screening method and device

Publications (1)

Publication Number Publication Date
CN112507182A 2021-03-16

Family

ID=74922119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011495312.6A Pending CN112507182A (en) 2020-12-17 2020-12-17 Application screening method and device

Country Status (1)

Country Link
CN (1) CN112507182A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229131A (en) * 2016-12-14 2018-06-29 China Mobile Group Design Institute Co., Ltd. Counterfeit APP recognition methods and device
CN111553140A (en) * 2020-05-13 2020-08-18 Kingdee Software (China) Co., Ltd. Data processing method, data processing apparatus, and computer storage medium

Similar Documents

Publication Publication Date Title
CN111061956B (en) Method and apparatus for generating information
CN109359194B (en) Method and apparatus for predicting information categories
CN112684968A (en) Page display method and device, electronic equipment and computer readable medium
CN111461967B (en) Picture processing method, device, equipment and computer readable medium
CN111459364A (en) Icon updating method and device and electronic equipment
CN110245684B (en) Data processing method, electronic device, and medium
CN109885564B (en) Method and apparatus for transmitting information
CN109902726B (en) Resume information processing method and device
CN111581431B (en) Data exploration method and device based on dynamic evaluation
CN113535577A (en) Application testing method and device based on knowledge graph, electronic equipment and medium
CN111046393B (en) Vulnerability information uploading method and device, terminal equipment and storage medium
CN112685799A (en) Device fingerprint generation method and device, electronic device and computer readable medium
CN109542743B (en) Log checking method and device, electronic equipment and computer readable storage medium
CN110020166B (en) Data analysis method and related equipment
CN112507182A (en) Application screening method and device
CN112379967B (en) Simulator detection method, device, equipment and medium
CN114338846A (en) Message testing method and device
CN112084114A (en) Method and apparatus for testing an interface
CN112579428A (en) Interface testing method and device, electronic equipment and storage medium
CN113626301A (en) Method and device for generating test script
CN114428823B (en) Data linkage method, device, equipment and medium based on multidimensional variable expression
CN111460273B (en) Information pushing method and device
CN111125501A (en) Method and apparatus for processing information
CN115328811B (en) Program statement testing method and device for industrial control network simulation and electronic equipment
CN111104626B (en) Information storage method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211217

Address after: 200131 Zone E, 9th floor, No. 1, Lane 666, Zhangheng Road, Pudong New Area Pilot Free Trade Zone, Shanghai

Applicant after: Shanghai Shangxiang Network Technology Co.,Ltd.

Address before: 201306 N2025 room 24, 2 New Town Road, Nicheng Town, Pudong New Area, Shanghai

Applicant before: SHANGHAI LIANSHANG NETWORK TECHNOLOGY Co.,Ltd.