JP7291919B1 - Computer program reliability determination system, computer program reliability determination method, and computer program reliability determination program

Info

Publication number: JP7291919B1
Application number: JP2021214691A
Authority: JP
Inventors: 望宮川
Original assignee: 株式会社Ｆｆｒｉセキュリティ
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2023-06-16
Anticipated expiration: 2041-12-28
Also published as: JP2023098131A

Abstract

[Problem] To provide a computer program reliability determination system, a computer program reliability determination method, and a computer program reliability determination program for determining the reliability of a computer program. [Solution] A computer program reliability determination system includes an acquisition unit that acquires a computer program, an extraction unit that extracts a plurality of identifiers from the acquired computer program, an obfuscation rate generation unit that generates at least one obfuscation rate based on the extracted identifiers, and a reliability determination unit that determines the reliability of the computer program based on the generated obfuscation rate. [Selected drawing] FIG. 1

Description

The present invention relates to a computer program reliability determination system, a computer program reliability determination method, and a computer program reliability determination program.

Malicious software and malicious code created with the intent of performing unauthorized and harmful actions are collectively called malware (malicious software). Some malware samples are implemented in a scripting language and attached to document files such as Word documents and Excel workbooks. In a computer program, the variable names, function names, class names, procedure names, macro names, and so on that are used to identify variables, functions, classes, and the like are called "identifiers". Character strings contained in the source code from which a computer program was created, and character strings contained in debug information associated with a computer program, also count as "identifiers". Some malware written in a scripting language uses, as identifiers in its code, character strings that cannot be interpreted as natural language. This is one obfuscation technique. Such identifiers lower the readability of the source code while preserving the behavior obtained when the code is executed. For this reason, such identifiers are sometimes used to conceal the purpose of a computer program.

Identifiers in source code are normally lost at compilation and are not included in the resulting executable file. Consequently, for executable files developed in compiled languages in general, no determination can be made using features of the identifiers in the original source code. For programs developed in an interpreted scripting language, on the other hand, the source code and the executable file are one and the same, so features of the identifiers contained in the source code can be used to make a determination.

Pattern matching is one known anti-malware technique (see Non-Patent Documents 1 to 3). Most typically, pattern matching determines the reliability of a computer program based on whether the target program contains a specific byte sequence extracted from known malware. Its advantage is that it is less likely than other techniques to mistakenly judge malware to be highly reliable.

Behavior detection is another known anti-malware technique (see Non-Patent Documents 4 and 5). It executes the target computer program and determines its reliability from its behavior, that is, the dynamic features obtained by running the program virtually or actually. Unlike pattern matching, behavior detection has the advantage that it can also handle variants of known malware.

Bewar Neamat Taha and Cihan Varol, "Pattern Matching Based Malware Identification", International Journal of Scientific and Engineering Research 11(8):1375-1381, [online], [searched on October 11, 2021], Internet <URL: https://www.researchgate.net/publication/345141000_Pattern_Matching_Based_Malware_Identification>
Keisuke Sugahara, Yasushi Sengoku, Hitoshi Yashima, Masashi Jige, Hiroyuki Nishikawa, Eiji Okamoto, "A Study of Unknown Computer-Virus Detection Methods", The 2004 Symposium on Cryptography and Information Security, Sendai, Japan, Jan. 27-30, 2004, The Institute of Electronics, Information and Communication Engineers, [online], [searched on October 11, 2021], Internet <URL: https://www.ipa.go.jp/security/fy15/reports/uvd/documents/3B5-3.pdf>
"Survey on Unknown Virus Detection Technology", Information-technology Promotion Agency, April 2004, [online], [searched on October 11, 2021], Internet <URL: https://www.ipa.go.jp/security/fy15/reports/uvd/documents/3B5-3.pdf>
Jaehyeong Lee, 6 others, "An unknown malware detection method based on resource access information", 11th Information Science and Technology Forum, 2012, [online], [searched on October 11, 2021], Internet <URL: http://id.nii.ac.jp/1001/00151832/>
Takahiro Matsuki, 3 others, "Proposal of Malware Detection and Activity Deterrence Method Using Security Disabling Attack", Transactions of Information Processing Society of Japan, (September 2009) Vol. 50, No. 9, 2127-2136, [online], [searched on October 11, 2021], Internet <URL: http://id.nii.ac.jp/1001/00066466/>

However, pattern matching has the drawback that it cannot correctly determine the reliability of variants of known malware, that is, malware created by modifying part of a known sample.

In behavior detection, the behaviors used as conditions when determining the reliability of a computer program must themselves be extracted from known malware in advance. Behavior detection therefore has the drawback that it cannot correctly determine the reliability of a computer program when the malware achieves its malicious effect through behavior that is not yet known.

In view of these problems, one object of the present invention is to provide a new technique for determining the reliability of a computer program.

According to one embodiment of the present invention, there is provided a computer program reliability determination system including: an acquisition unit that acquires a computer program; an extraction unit that extracts at least one identifier from the acquired computer program; an obfuscation rate generation unit that generates at least one obfuscation rate based on the extracted at least one identifier; and a reliability determination unit that determines the reliability of the computer program based on the generated at least one obfuscation rate.

In the computer program reliability determination system according to one embodiment of the present invention, the obfuscation rate generation unit may divide the identifier into substrings each containing a plurality of characters, calculate first mismatch rate information based on the set of divided substrings and preset character classes, select one rule from a plurality of rules stored in advance, calculate second mismatch rate information based on the selected rule, and generate the obfuscation rate using the first mismatch rate information and the second mismatch rate information.

In the computer program reliability determination system according to one embodiment of the present invention, the obfuscation rate generation unit may classify the characters of the divided substrings according to the character classes, and calculate the first mismatch rate information by comparing the character classes of adjacent characters in the classified substrings.
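The comparison of adjacent character classes can be illustrated with a minimal Python sketch. The four classes used here (digit, uppercase, lowercase, other), the function names `char_class` and `class_mismatch_rate`, and the choice of "fraction of adjacent pairs whose classes differ" as the rate are all assumptions made for illustration; the source states only that preset character classes are compared between adjacent characters.

```python
def char_class(ch: str) -> str:
    """Map a character to a coarse class (hypothetical classes, not from the source)."""
    if ch.isdigit():
        return "digit"
    if ch.isupper():
        return "upper"
    if ch.islower():
        return "lower"
    return "other"


def class_mismatch_rate(identifier: str) -> float:
    """Fraction of adjacent character pairs whose classes differ (assumed definition)."""
    pairs = list(zip(identifier, identifier[1:]))
    if not pairs:
        return 0.0  # assumption: identifiers shorter than two characters score 0
    mismatches = sum(1 for a, b in pairs if char_class(a) != char_class(b))
    return mismatches / len(pairs)
```

Under these assumed classes, the randomly generated identifier "f2LPi7RZhv" has six mismatching pairs out of nine, whereas an all-lowercase word such as "counter" has none.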

In the computer program reliability determination system according to one embodiment of the present invention, the obfuscation rate generation unit may set the rule according to the set of characters that make up the identifier and the frequency of occurrence of each character in that set, and the obfuscation rate may be higher when the identifier is likely to have been generated by a rule that satisfies a predetermined condition.

In the computer program reliability determination system according to one embodiment of the present invention, the at least one identifier may include a plurality of identifiers, and the reliability determination unit may calculate a representative value of the obfuscation rates corresponding to the respective identifiers and determine the reliability of the computer program based on the relationship between the calculated representative value and a preset first threshold.

In the computer program reliability determination system according to one embodiment of the present invention, the at least one identifier may include a plurality of identifiers, and the reliability determination unit may determine the reliability of the computer program based on the relationship between a preset second threshold and the proportion of the plurality of identifiers whose obfuscation rate exceeds a preset first threshold.
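The two multi-identifier criteria (a representative value compared with a first threshold, and the proportion of identifiers above the first threshold compared with a second threshold) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the mean is used as the default representative value (the source does not say which statistic is meant), and a value above the threshold is treated as indicating low reliability.

```python
from statistics import mean
from typing import Callable, Sequence


def is_suspicious_by_representative(
    rates: Sequence[float],
    first_threshold: float,
    representative: Callable[[Sequence[float]], float] = mean,
) -> bool:
    """True when the representative obfuscation rate exceeds the first threshold."""
    if not rates:
        raise ValueError("at least one obfuscation rate is required")
    return representative(rates) > first_threshold


def is_suspicious_by_proportion(
    rates: Sequence[float],
    first_threshold: float,
    second_threshold: float,
) -> bool:
    """True when the share of identifiers above the first threshold exceeds the second."""
    if not rates:
        raise ValueError("at least one obfuscation rate is required")
    flagged = sum(1 for r in rates if r > first_threshold)
    return flagged / len(rates) > second_threshold
```

The two criteria can disagree: a program in which half of the identifiers are heavily obfuscated and half are ordinary may fall below a threshold on the mean while still exceeding a threshold on the proportion.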

In the computer program reliability determination system according to one embodiment of the present invention, the reliability determination unit may determine the reliability of the computer program based on the difference between the obfuscation rate and a preset first threshold.

In the computer program reliability determination system according to one embodiment of the present invention, the reliability determination unit may determine the reliability of the computer program based on the number of identifiers.

According to another embodiment of the present invention, there is provided a computer program reliability determination method including: acquiring a computer program; extracting at least one identifier from the acquired computer program; generating at least one obfuscation rate based on the extracted at least one identifier; and determining the reliability of the computer program based on the generated obfuscation rate.

In the above computer program reliability determination method, the identifier may be divided into substrings each containing a plurality of characters, first mismatch rate information may be calculated based on the set of divided substrings and preset character classes, one rule may be selected from a plurality of rules stored in advance, second mismatch rate information may be calculated based on the selected rule, and the obfuscation rate may be generated using the first mismatch rate information and the second mismatch rate information.

In the above computer program reliability determination method, the characters of the divided substrings may be classified according to the character classes, and the first mismatch rate information may be calculated by comparing the character classes of adjacent characters in the classified substrings.

In the above computer program reliability determination method, the rule may be set according to the set of characters that make up the identifier and the frequency of occurrence of each character in that set.

In the above computer program reliability determination method, the at least one identifier may include a plurality of identifiers, a representative value of the obfuscation rates corresponding to the respective identifiers may be calculated, and the reliability of the computer program may be determined based on the relationship between the calculated representative value and a preset first threshold.

In the above computer program reliability determination method, the at least one identifier may include a plurality of identifiers, and the reliability of the computer program may be determined based on whether the proportion of the plurality of identifiers whose obfuscation rate exceeds a preset first threshold exceeds a preset second threshold.

In the above computer program reliability determination method, the reliability of the computer program may be determined based on the difference between the obfuscation rate and a preset first threshold.

In the above computer program reliability determination method, the reliability of the computer program may be determined based on the number of identifiers.

According to another embodiment of the present invention, there is provided a computer program reliability determination program that causes a computer to execute: acquiring a computer program; extracting at least one identifier from the acquired computer program; generating at least one obfuscation rate based on the extracted at least one identifier; and determining the reliability of the computer program based on the generated obfuscation rate.

According to the present invention, it is possible to provide a new technique for determining the reliability of a computer program without depending on specific patterns or behaviors contained in the computer program.

FIG. 1 is a block diagram showing the overall configuration of a computer program reliability determination system according to one embodiment of the present invention.
FIGS. 2 to 8 are flow diagrams each showing an example of the flow of processing executed by the computer program reliability determination system according to one embodiment of the present invention.
FIG. 9 is an example of the data structure of the character set of an identifier in the computer program reliability determination system according to one embodiment of the present invention.
FIGS. 10 and 11 are examples of the data structure of naming rules in the computer program reliability determination system according to one embodiment of the present invention.
FIGS. 12 to 14 are flow diagrams each showing an example of the flow of processing executed by the computer program reliability determination system according to one embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. The present invention can, however, be implemented in many different forms and should not be construed as being limited to the embodiments described below. The drawings may be schematic in order to make the description clearer, but they are only examples and do not limit the interpretation of the present invention. The labels "first" and "second" attached to elements are used for convenience to distinguish the elements and have no further meaning unless otherwise stated. In the drawings referred to in this embodiment, identical parts or parts having similar functions are given the same or similar reference numerals (a numeral xxx followed only by A, B, -1, -2, or the like), and repeated descriptions of them may be omitted. Part of a configuration may also be omitted from a drawing. Matters that a person of ordinary skill in the field of the present invention would recognize are not specifically described.

<First embodiment>
A computer program reliability determination system according to one embodiment of the present invention is described in detail with reference to the drawings.

(1-1. Configuration of the computer program reliability determination system)
FIG. 1 is a block diagram showing the overall configuration of the computer program reliability determination system 100. As shown in FIG. 1, the computer program reliability determination system 100 includes a control unit 110, a storage unit 120, and a user interface unit 130. In this embodiment, the computer program reliability determination system 100 is implemented on a single terminal (computer).

The control unit 110 includes a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or another arithmetic processing circuit, together with memory including ROM (Read Only Memory) and RAM (Random Access Memory). The control unit 110 controls the functions of each unit using the computer program reliability determination program stored in the memory.

The storage unit 120 uses a semiconductor memory such as an SSD (Solid State Drive), a magnetic recording medium (magnetic tape, magnetic disk, etc.), an optical recording medium, a magneto-optical recording medium, or another element capable of storing data. The storage unit 120 functions as a database that stores the various information used by the computer program reliability determination program. The computer program reliability determination program only needs to be executable by a computer, and may be provided stored on one of the computer-readable recording media described above.

The database for the computer program reliability determination program in the storage unit 120 contains information such as the computer program 1201, character classes 1203, naming rules 1205, and thresholds 1207. Each is described later.

The user interface unit 130 exchanges information with the user. The user interface unit 130 includes a display unit 131 and an operation unit 133. The display unit 131 is a display device such as a liquid crystal display or an organic EL (Electro Luminescence) display, and its display content is controlled by signals input from the control unit 110. The operation unit 133 includes a controller, buttons, or switches. When the user operates the operation unit 133, for example by moving it up, down, left, or right, pressing it, or rotating it, information based on that operation is input to the control unit 110. When the user interface unit 130 is a display device with a touch sensor (touch panel), the display unit 131 and the operation unit 133 may be arranged at the same location.

In FIG. 1, the control unit 110 includes, as functional units, an acquisition unit 1101, an identifier extraction unit 1103, an obfuscation rate generation unit 1105, and a reliability determination unit 1107.

The acquisition unit 1101 acquires a computer program input by the user or a computer program stored in the storage unit 120. In this embodiment, "computer program" refers to any unit from which identifiers can be extracted, including executable files, source code, and debug information. The computer program is written in a programming language such as C/C++, Python, JavaScript, Java, or VBA (Visual Basic for Applications).

The identifier extraction unit 1103 extracts identifiers from the acquired computer program.

When an attacker uses character strings that cannot be interpreted as natural language as identifiers in source code, one of the most common techniques is to select multiple characters at random from a specific character set, typically the alphanumeric characters. In this embodiment, an unreadable identifier obtained in this way is called an "obfuscated identifier". Compared with identifiers used in ordinary software development, obfuscated identifiers show a different tendency in the character classes corresponding to the characters they contain. An index value that exploits this difference to express how difficult it is to interpret an identifier as natural language is called the "obfuscation rate". In the description of obfuscated identifiers in this embodiment, the character string "f2LPi7RZhv" is assumed to have been extracted as an identifier of the target computer program.

The obfuscation rate generation unit 1105 generates an obfuscation rate based on the extracted identifier.

The reliability determination unit 1107 determines the reliability of the computer program based on the generated obfuscation rate.

(1-2. Computer program reliability determination control processing)
Next, the computer program reliability determination processing in the computer program reliability determination system 100 is described in detail with reference to the drawings. FIG. 2 is a flow diagram showing an example of the computer program reliability determination processing in the computer program reliability determination system 100.

(1-2-1. Acquisition of the computer program)
First, the acquisition unit 1101 of the control unit 110 acquires the computer program whose reliability is to be determined (S110). In this embodiment, the computer program is either input by the user or obtained from the database for the computer program reliability determination program stored in the storage unit 120.

(1-2-2. Extraction of identifiers)
Next, the identifier extraction unit 1103 of the control unit 110 extracts the identifier 1202 contained in the acquired computer program (S120). In this embodiment, when the target computer program is written in a scripting language, the identifier 1202 can be extracted by parsing the source code and classifying its tokens into operators, identifiers, and so on. In VBA, for example, a variable is declared by placing an identifier after the Dim keyword, so the identifier 1202 can be extracted by searching the source code for Dim statements. In this example, the character string "f2LPi7RZhv" is extracted as an identifier of the target computer program.
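As a rough illustration of the Dim-statement search described above, the following Python sketch pulls declared names out of VBA source text with regular expressions. It is a simplification under stated assumptions, not the extraction procedure of the embodiment: the function name `extract_dim_identifiers` is hypothetical, only single-line `Dim` statements are handled, and line continuations, `Private`/`Public` declarations, and multi-dimensional array bounds are not treated correctly. Type hints assume Python 3.9 or later.

```python
import re

# A Dim statement on one line, optionally followed by a VBA comment (').
_DIM_RE = re.compile(r"^[ \t]*Dim[ \t]+(.+?)\s*(?:'.*)?$", re.IGNORECASE | re.MULTILINE)
# Simplified VBA-style name: a letter followed by letters, digits, or underscores.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def extract_dim_identifiers(source: str) -> list[str]:
    """Return the names declared by single-line Dim statements, in order of appearance."""
    identifiers = []
    for match in _DIM_RE.finditer(source):
        # "Dim a, b As Long" declares several names separated by commas.
        for declaration in match.group(1).split(","):
            name = _NAME_RE.match(declaration.strip())
            if name:
                identifiers.append(name.group(0))
    return identifiers
```

A full implementation would more likely rely on a proper tokenizer or parser for the target language, as the paragraph above suggests, so that function names, procedure names, and other identifier kinds are also captured.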

(1-2-3. Generation of obfuscation rate)
Next, the obfuscation rate generation unit 1105 of the control unit 110 generates the obfuscation rate of the identifier (S130). The method of generating the obfuscation rate is described in detail below.

FIG. 3 shows the processing flow S130 for generating the obfuscation rate. As shown in FIG. 3, the processing performed by the obfuscation rate generation unit 1105 includes generation of a segment mismatch rate (also referred to as first mismatch rate information) (S131), generation of a reference segment mismatch rate (also referred to as second mismatch rate information) (S133), and generation of the obfuscation rate (S135). The reference segment mismatch rate may be generated before the segment mismatch rate, or may be generated in advance.

(1-2-3-1. Generation of segment mismatch rate)
FIGS. 4 to 7 show an example of the processing flow S131 for generating the segment mismatch rate. The segment mismatch rate is a value associated with a given identifier and is calculated from the character segments and the characters that make up the identifier. As shown in FIGS. 4 and 5, the identifier, which is a character string, is first divided into substrings (S1311). In this embodiment, the identifier 1202 is divided into substrings 1202a of two adjacent characters each, yielding a set 1202ag of the divided substrings. Specifically, as shown in FIG. 5, the character string "f2LPi7RZhv", which is the identifier 1202, is divided into the substrings 1202a "f2", "2L", "LP", "Pi", "i7", "7R", "RZ", "Zh", and "hv".

Next, as shown in FIGS. 4 and 6, the substrings are segmented according to the character segments (S1313). The data structure of the character segments 1203 is shown in FIG. 6. In this example, the character segments 1203 include a first character segment 1203-1 consisting of lowercase letters, a second character segment 1203-2 consisting of uppercase letters, and a third character segment 1203-3 consisting of digits. Each character of the substrings 1202a-1 to 1202a-9 in the substring set 1202ag is assigned to its corresponding character segment. Specifically, for the substring 1202a-1 "f2", "f" is assigned to the first character segment (lowercase letters) and "2" is assigned to the third character segment (digits). This produces the segmented substring 1202b-1. The substrings 1202a-2 to 1202a-9 are segmented in the same manner.

Next, as shown in FIGS. 4 and 7, the character segments of the adjacent characters in each of the segmented substrings 1202b-1 to 1202b-9 are compared to see whether they match (S1315). If a segmented substring consists of different character segments, such as "lowercase letter" and "digit", it is determined to be a "mismatch". If a segmented substring consists of the same character segment, such as "uppercase letter" and "uppercase letter", it is determined to be a "match". With this method, as shown in FIG. 7, the comparison results for the segmented substrings are the determination results 1202c-1 to 1202c-9.

Finally, as shown in FIGS. 4 and 7, the segment mismatch rate (first mismatch rate) D is calculated from the obtained determination results 1202c-1 to 1202c-9 (S1317). In this case, six of the nine substrings are mismatches, so the segment mismatch rate is D = 6/9 ≈ 0.667.
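Steps S1311 to S1317 can be sketched in a few lines of Python. The function names `char_class` and `segment_mismatch_rate` are illustrative assumptions, as are two choices the text does not specify: characters outside the three segments are placed in a separate "other" segment, and identifiers shorter than two characters are given D = 0.

```python
def char_class(ch: str) -> str:
    """Map a character to one of the three character segments of FIG. 6."""
    if "a" <= ch <= "z":
        return "lower"
    if "A" <= ch <= "Z":
        return "upper"
    if "0" <= ch <= "9":
        return "digit"
    return "other"  # assumption: anything else forms its own segment


def segment_mismatch_rate(identifier: str) -> float:
    """Fraction of adjacent character pairs whose character segments differ."""
    pairs = list(zip(identifier, identifier[1:]))  # S1311: adjacent 2-char substrings
    if not pairs:
        return 0.0  # assumption: too short to form a pair
    # S1313 and S1315: segment each character, then compare within each pair
    mismatches = sum(char_class(a) != char_class(b) for a, b in pairs)
    return mismatches / len(pairs)  # S1317


print(segment_mismatch_rate("f2LPi7RZhv"))  # 6 of the 9 pairs are mismatches
```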

More generally, the segment mismatch rate is obtained as follows.
Let $n$ be the number of characters constituting the character string $S$.
Let $S_i$ be the $i$-th character of $S$.

Let $C$ be the entire set of character segments.
Let $m$ be the number of character segments constituting $C$.
Let $C_i$ be the $i$-th character segment of $C$.

Let $f(S_i)$ be the character segment corresponding to the character $S_i$.
Divide $S$ into strings of two adjacent characters, and let $P$ be the set of the resulting substrings.

Let $P_i$ be the $i$-th character pair of $P$. Then

$$P_i = (S_i, S_{i+1}) \quad (1 \le i \le n-1).$$

Let $P_i'$ be the first element of the character pair $P_i$ and $P_i''$ the second element.
The following function $g$ is then used:

$$g(P_i) = \begin{cases} 1 & \text{if } f(P_i') \ne f(P_i'') \\ 0 & \text{if } f(P_i') = f(P_i'') \end{cases}$$

The segment mismatch rate $D$ is then generated as in Equation 1:

$$D = \frac{1}{n-1} \sum_{i=1}^{n-1} g(P_i) \qquad \text{(Equation 1)}$$

(1-2-3-2. Generation of reference segment mismatch rate information)
Generation of the reference segment mismatch rate information (second mismatch rate information), shown in FIG. 8, will now be described. First, a naming rule corresponding to a plurality of identifiers stored in advance in the storage unit 120 (also referred to as the stored identifier group) is set (S1331). Every identifier in the stored identifier group is an obfuscated identifier. To set the naming rule, the matching character set is selected from a preset data structure. FIG. 9 shows an example of the character set data structure 900, which includes a classification number 901 and a character set classification 903. FIG. 9 contains classification 1 (lowercase letters), classification 2 (uppercase letters), classification 3 (digits), classification 4 (lowercase and uppercase letters), classification 5 (uppercase letters and digits), classification 6 (digits and lowercase letters), and classification 7 (lowercase letters, uppercase letters, and digits). For example, "classification 7: the set of characters consisting of lowercase letters, uppercase letters, and digits" is set as the character set classification corresponding to the stored identifier group.
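As an illustration of how a stored identifier group might be mapped to one of the seven classifications of FIG. 9, the sketch below collects the kinds of characters that occur and looks up the classification number. The classification numbers come from the text; the dictionary layout, the function name `classify_character_set`, and the decision to ignore other characters are assumptions.

```python
import string

# Classification numbers follow FIG. 9; this lookup table is an assumed layout.
CLASSIFICATION = {
    frozenset({"lower"}): 1,
    frozenset({"upper"}): 2,
    frozenset({"digit"}): 3,
    frozenset({"lower", "upper"}): 4,
    frozenset({"upper", "digit"}): 5,
    frozenset({"digit", "lower"}): 6,
    frozenset({"lower", "upper", "digit"}): 7,
}


def classify_character_set(identifiers):
    """Return the FIG. 9 classification number covering every character used."""
    kinds = set()
    for ch in "".join(identifiers):
        if ch in string.ascii_lowercase:
            kinds.add("lower")
        elif ch in string.ascii_uppercase:
            kinds.add("upper")
        elif ch in string.digits:
            kinds.add("digit")
        # assumption: characters outside these three kinds are ignored
    return CLASSIFICATION.get(frozenset(kinds))  # None when nothing was recognized


print(classify_character_set(["f2LPi7RZhv", "Qm81xZ"]))
```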

Next, the appearance frequency of each character used in the character set selected above is obtained. FIG. 10 shows an example of the data structure of the naming rules. The character appearance frequency data structure 910 includes a number 911, a character set 913, and a specific example 915 of a generated identifier. Each naming rule in FIG. 10 generates an obfuscated identifier by selecting characters at random from the corresponding character set 913. For example, from the information that the character set classification corresponding to the stored identifier group is "classification 7: the set of characters consisting of lowercase letters, uppercase letters, and digits", naming rule 7 is selected as the naming rule corresponding to the stored identifier group. In this embodiment, the selected naming rule is called the "obfuscated naming rule".

Next, the reference segment mismatch rate (second mismatch rate) D' is generated based on the obfuscated naming rule selected above (S1333). The reference segment mismatch rate D' is the value to which the segment mismatch rates of the identifiers in the stored identifier group converge. For an identifier conforming to the obfuscated naming rule above, take any adjacent characters S_i and S_{i+1}; the reference segment mismatch rate D' is obtained from the character segment f(S_i) of S_i and the character segment f(S_{i+1}) of S_{i+1}. Here, there are 26 lowercase letters, 26 uppercase letters, and 10 digits.

The probability that the character segment of $S_i$ is lowercase letters is

$$\frac{26}{26 + 26 + 10} = \frac{26}{62}.$$

The probability that the character segment of $S_i$ is uppercase letters is

$$\frac{26}{62}.$$

The probability that the character segment of $S_i$ is digits is

$$\frac{10}{62}.$$

The probability that the character segment of $S_i$ is lowercase letters and the character segment of $S_{i+1}$ is other than lowercase letters is

$$\frac{26}{62} \times \frac{36}{62} = \frac{936}{3844}.$$

The probability that the character segment of $S_i$ is uppercase letters and the character segment of $S_{i+1}$ is other than uppercase letters is

$$\frac{26}{62} \times \frac{36}{62} = \frac{936}{3844}.$$

The probability that the character segment of $S_i$ is digits and the character segment of $S_{i+1}$ is other than digits is

$$\frac{10}{62} \times \frac{52}{62} = \frac{520}{3844}.$$

The probability that the character segments of $S_i$ and $S_{i+1}$ do not match is

$$\frac{936}{3844} + \frac{936}{3844} + \frac{520}{3844} = \frac{2392}{3844}.$$

From the above, the reference segment mismatch rate (second mismatch rate) $D'$ for identifiers conforming to the obfuscated naming rule is

$$D' = \frac{2392}{3844} \approx 0.622.$$
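The same calculation can be written for any set of character segments. The sketch below assumes, as the random selection in the naming rules suggests, that each character is drawn uniformly and independently from the character set; the function name `reference_mismatch_rate` is illustrative. It uses the complement form, 1 minus the probability that both characters fall in the same segment, which gives the same value as summing the three mismatch cases.

```python
from fractions import Fraction


def reference_mismatch_rate(class_sizes):
    """Probability that two independently, uniformly drawn characters
    fall into different character segments."""
    total = sum(class_sizes.values())
    # Probability that both characters land in the same segment.
    same = sum(Fraction(size, total) ** 2 for size in class_sizes.values())
    return 1 - same


# Segment sizes for naming rule 7: 26 lowercase, 26 uppercase, 10 digits.
d_ref = reference_mismatch_rate({"lower": 26, "upper": 26, "digit": 10})
print(d_ref, float(d_ref))
```

With exact fractions the result reduces to 598/961, which equals 2392/3844.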

(1-2-3-3. Generation of obfuscation rate)
Returning to FIG. 3, the obfuscation rate R is generated (S135). In this embodiment, the obfuscation rate R is defined by an arbitrary formula using the segment mismatch rate D and the reference segment mismatch rate D'. When calculating the obfuscation rate corresponding to each identifier, Equation 2 below, for example, is used.

[Equation 2]

In Equation 2, "abs" is a function that returns the absolute value of its input. The obfuscation rate R defined by this formula approaches 1 (a high obfuscation rate) when the readability of the target identifier is low, and approaches 0 (a low obfuscation rate) as the readability of the target identifier increases. Other formulas may also be used for the obfuscation rate R.
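Equation 2 itself is not reproduced in this text, so the sketch below is only a hypothetical stand-in that satisfies the stated properties: it uses the absolute difference between D and D', equals 1 when D coincides with D', and falls toward 0 as D moves away from D'. The normalization by max(D', 1 - D') is an assumption chosen so that the result stays within the range 0 to 1; the patent's actual formula may differ.

```python
def obfuscation_rate(d: float, d_ref: float) -> float:
    """Hypothetical stand-in for Equation 2 (not the patent's formula).

    Returns 1.0 when d equals d_ref and decreases linearly to 0.0 at the
    end of the interval [0, 1] that lies farthest from d_ref.
    """
    if not 0.0 < d_ref < 1.0:
        raise ValueError("d_ref must lie strictly between 0 and 1")
    return 1.0 - abs(d - d_ref) / max(d_ref, 1.0 - d_ref)


D_REF = 598 / 961  # reference rate for lowercase, uppercase, and digits
print(obfuscation_rate(6 / 9, D_REF))  # the identifier "f2LPi7RZhv"
print(obfuscation_rate(0.0, D_REF))    # e.g. an all-lowercase identifier
```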

Here, to verify the obfuscation rate, the following describes (1) the obfuscation rate when a readable identifier is extracted and (2) the obfuscation rate when an obfuscated identifier is extracted.

(1-2-3-4. Obfuscation rate when a readable identifier is extracted)
Readable identifiers are generated according to naming rules used in ordinary software development. FIG. 11 shows an example of the data structure 920 of the readable naming rules. As shown in FIG. 11, the data structure includes a number 921, a readable naming rule name 923, an identifier configuration 925, and a specific example 927 of a generated readable identifier. Readable naming rule 1 (Upper Camel Case) writes the first letter of each word in uppercase and the other letters in lowercase. Readable naming rule 2 (Lower Camel Case) also writes the first letter of each word in uppercase and the other letters in lowercase, except that the very first letter of the identifier is lowercase. Readable naming rule 3 (Upper Snake Case) separates words with underscores (_) and writes all letters in uppercase. Readable naming rule 4 (Lower Snake Case) separates words with underscores (_) and writes all letters in lowercase. Readable naming rule 5 (Upper Case) writes all letters in uppercase and joins words without any separator. Readable naming rule 6 (Lower Case) writes all letters in lowercase and joins words without any separator.

For readable identifiers conforming to readable naming rule 3 (Upper Snake Case) or readable naming rule 5 (Upper Case), every letter is uppercase, so the segment mismatch rate D tends to be relatively low. For readable identifiers conforming to readable naming rule 4 (Lower Snake Case) or readable naming rule 6 (Lower Case), every letter is lowercase, so the segment mismatch rate D likewise tends to be relatively low. For readable identifiers conforming to readable naming rule 1 (Upper Camel Case) or readable naming rule 2 (Lower Camel Case), at least every letter from the second letter of each word onward is lowercase, so the segment mismatch rate D again tends to be relatively low. It follows that the segment mismatch rate D tends to be relatively low for readable identifiers conforming to any of the readable naming rules used in ordinary software development. As the segment mismatch rate D decreases, its difference from the reference segment mismatch rate D' based on the obfuscated naming rule increases. Therefore, a readable identifier conforming to any of these readable naming rules yields a low obfuscation rate (a value close to 0) in Equation 2.

(1-2-3-5. Obfuscation rate when an obfuscated identifier is extracted)
Obfuscated identifiers are generated according to the obfuscated naming rule described above. In an obfuscated identifier, every character is equally likely to be selected at any position of the generated string. As a result, runs of lowercase or uppercase letters of any guaranteed length do not occur, and the segment mismatch rate of an obfuscated identifier tends to be relatively high. As the segment mismatch rate D increases, its difference from the reference segment mismatch rate D' based on the obfuscated naming rule decreases. Therefore, an obfuscated identifier yields a high obfuscation rate (a value close to 1) in Equation 2.

(1-2-4. Reliability determination)
Returning to FIG. 2, the reliability determination unit 1107 of the control unit 110 determines the reliability of the computer program using the calculated obfuscation rate (S140). FIG. 12 shows an example of the processing flow S140 for determining reliability. As shown in FIG. 12, the generated obfuscation rate is first compared with a preset first threshold (S141). The first threshold can be set as appropriate for the identifier. If the obfuscation rate exceeds the first threshold (S145; Yes), reliability information 1 indicating that the computer program is highly reliable is generated (S147). If the obfuscation rate does not exceed the first threshold (S145; No), reliability information 2 indicating that the computer program has low reliability is generated (S149). This completes the computer program reliability determination processing.

Therefore, with this embodiment, the reliability of a computer program can be determined using any identifier included in the computer program. The reliability can thus be determined accurately without depending on specific patterns or behaviors contained in the computer program.

The control unit 110 may stop, delete, or quarantine a computer program determined to have low reliability. Alternatively, information on the computer program determined to have low reliability may be displayed on the display unit 131.

<Second embodiment>
This embodiment describes a computer program reliability determination method different from that of the first embodiment. Specifically, it describes a method of determining reliability using a representative value of the obfuscation rates corresponding to a plurality of identifiers. The method of extracting the plurality of identifiers and the method of generating the obfuscation rate of each identifier are the same as in the first embodiment, so their description is omitted.

FIG. 13 shows an example of the processing flow S140A for determining reliability. As shown in FIG. 13, in this embodiment a representative value may be calculated from the obfuscation rates corresponding to the respective identifiers (S142). The representative value is, for example, the mean or the median. Next, the representative value of the obfuscation rates is compared with the threshold (S144). If the representative value exceeds the first threshold (S145A; Yes), reliability information 1 indicating that the computer program is highly reliable is generated (S147). If the representative value does not exceed the first threshold (S145A; No), reliability information 2 indicating that the computer program has low reliability is generated (S149).

With this embodiment, the reliability of a computer program can be determined using a plurality of identifiers included in the computer program. This allows the reliability of the computer program to be determined more accurately.

Although the mean or the median is used as the representative value in this embodiment, the present invention is not limited to this. The maximum or the minimum may be used as the representative value where appropriate.
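The flow of FIG. 13 can be sketched as follows. The function name `judge_by_representative`, the `kind` parameter, and the integer return values 1 and 2 (standing for reliability information 1 and 2) are assumptions made for illustration; the mapping of outcomes to reliability information follows the text as written.

```python
from statistics import mean, median

# Representative-value functions named in the text: mean, median, max, min.
REPRESENTATIVES = {"mean": mean, "median": median, "max": max, "min": min}


def judge_by_representative(rates, first_threshold, kind="mean"):
    """S142 to S149: reduce per-identifier rates to one value, then compare.

    Returns 1 for reliability information 1 and 2 for reliability information 2.
    """
    if not rates:
        raise ValueError("at least one obfuscation rate is required")
    representative = REPRESENTATIVES[kind](rates)      # S142
    return 1 if representative > first_threshold else 2  # S144, S145A


rates = [0.9, 0.2, 0.4]  # hypothetical per-identifier obfuscation rates
print(judge_by_representative(rates, 0.45, "mean"))
print(judge_by_representative(rates, 0.45, "median"))
```

The example values show that the choice of representative value can change the outcome: the mean of these rates exceeds 0.45 while the median does not.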

<Third embodiment>
This embodiment describes a computer program reliability determination method different from that of the first embodiment. Specifically, it describes a method of determining reliability using the obfuscation rates corresponding to a plurality of identifiers.

FIG. 14 shows an example of the processing flow S140B for determining reliability. As shown in FIG. 14, the obfuscation rate corresponding to each of the plurality of identifiers is compared with a preset first threshold (S141). Next, the ratio of obfuscation rates that exceed the first threshold among all the obfuscation rates is calculated (S143). The reliability of the computer program is then determined based on this ratio (S146). If the ratio exceeds a second threshold (S146; Yes), reliability information 1 indicating that the computer program is highly reliable is generated (S147). If the ratio does not exceed the second threshold (S146; No), reliability information 2 indicating that the computer program has low reliability is generated (S149).
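The two-threshold flow of FIG. 14 can be sketched as below. The function name `judge_by_ratio` and the integer return values are illustrative assumptions; as in the text, exceeding the second threshold maps to reliability information 1.

```python
def judge_by_ratio(rates, first_threshold, second_threshold):
    """S141 to S149 of FIG. 14: compare the share of obfuscation rates above
    the first threshold with the second threshold.

    Returns 1 for reliability information 1 and 2 for reliability information 2.
    """
    if not rates:
        raise ValueError("at least one obfuscation rate is required")
    exceeding = sum(rate > first_threshold for rate in rates)  # S141
    ratio = exceeding / len(rates)                             # S143
    return 1 if ratio > second_threshold else 2                # S146


# Hypothetical rates: two of four exceed 0.5, so the ratio is 0.5.
print(judge_by_ratio([0.9, 0.8, 0.1, 0.2], first_threshold=0.5, second_threshold=0.4))
```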

(Modifications)
Within the scope of the idea of the present invention, those skilled in the art can conceive of various changes and modifications, and it is understood that such changes and modifications also fall within the scope of the present invention. For example, embodiments to which those skilled in the art have appropriately added or removed components or made design changes, embodiments in which steps have been added or omitted or conditions have been changed, and combinations of the configurations of the embodiments are also included in the scope of the present invention as long as they retain the gist of the present invention.

Although the first embodiment of the present invention shows an example in which one obfuscation rate is generated from one identifier, the present invention is not limited to this. One obfuscation rate may be generated from a plurality of identifiers. When calculating one obfuscation rate corresponding to a set of identifiers, Equation 3, for example, may be used.

[Equation 3]

Here, "h" is an arbitrary function that returns a representative value. Typically, a function that returns the mean, the median, or the like is used as "h". D_1, D_2, ..., D_{N-1}, D_N are the segment mismatch rates corresponding to each of the N identifiers.

Although the first embodiment of the present invention shows an example in which the reliability of the computer program is determined to be high or low according to whether the obfuscation rate exceeds the threshold information, the present invention is not limited to this. The reliability may also be determined according to the difference between the obfuscation rate and the first threshold information. In this case, when the difference between the obfuscation rate and the first threshold information is large, the computer program may be determined to be more reliable. This makes it possible to determine the reliability of the computer program more accurately.

Although the first embodiment of the present invention shows an example in which the computer program reliability determination system 100 is implemented on a single terminal, the present invention is not limited to this. For example, the computer program reliability determination system may be configured from a plurality of terminals, clients, or servers connected over a network.

Although the second and third embodiments of the present invention show examples that use a plurality of identifiers, the reliability may also be determined based on the number of acquired identifiers. For example, when the number of identifiers acquired from the computer program is equal to or less than a predetermined threshold, the reliability of the program may be determined as a specific degree expressed with numerical values or characters.

DESCRIPTION OF SYMBOLS
100: computer program reliability determination system, 110: control unit, 120: storage unit, 130: user interface unit, 131: display unit, 133: operation unit, 900: data structure, 901: classification number, 903: set classification, 910: data structure, 911: number, 913: character set, 915: specific example, 920: data structure, 921: number, 923: rule name, 925: identifier configuration, 927: specific example, 1101: acquisition unit, 1103: identifier extraction unit, 1105: obfuscation rate generation unit, 1107: reliability determination unit, 1201: computer program, 1202: identifier, 1202a: substring, 1202ag: set, 1203: character segments, 1203-1: first character segment, 1203-2: second character segment, 1203-3: third character segment, 1205: naming rule, 1207: threshold

Claims

an acquisition unit for acquiring a computer program;
an extraction unit for extracting at least one identifier from the obtained computer program;
an obfuscation rate generator that generates at least one obfuscation rate based on the extracted at least one identifier;
a reliability determination unit that determines the reliability of the computer program based on the generated at least one obfuscation rate ;
The obfuscation rate generator,
dividing the identifier into substrings containing a plurality of characters, calculating first mismatch rate information based on a set of the divided substrings and a preset character division;
setting one rule out of a plurality of pre-stored rules, calculating second mismatch rate information based on the set rule;
generating the obfuscation rate using the first mismatch rate information and the second mismatch rate information;
A computer program reliability determination system.

The obfuscation rate generation unit segments the divided partial character string based on the character segmentation, compares the character segmentation of adjacent characters in the segmented partial character string, and generates the first mismatch rate information. to calculate
The computer program reliability determination system according to claim 1 .

The obfuscation rate generation unit sets the rule according to a set of characters constituting the identifier and the appearance frequency of each character in the set of characters,
The obfuscation rate is high when the identifier is likely to be generated by a rule that satisfies a predetermined condition.
The computer program reliability determination system according to claim 1 .

the at least one identifier comprises a plurality of identifiers;
The reliability determination unit calculates a representative value of an obfuscation rate corresponding to each of the plurality of identifiers, and the computer program based on the relationship between the calculated representative value and a preset first threshold determining the reliability of
4. The computer program reliability determination system according to any one of claims 1 to 3 .

the at least one identifier comprises a plurality of identifiers;
The reliability determination unit determines the ratio of identifiers having an obfuscation rate exceeding a preset first threshold among the plurality of identifiers and the relationship between a preset second threshold and the computer program. determine reliability,
4. The computer program reliability determination system according to any one of claims 1 to 3 .

The reliability determination unit determines the reliability of the computer program based on a difference between the obfuscation rate and a preset first threshold.
4. The computer program reliability determination system according to any one of claims 1 to 3 .

The reliability determination unit determines the reliability of the computer program based on the number of identifiers.
The computer program reliability determination system according to any one of claims 1 to 6.

A control unit of a computer
obtains a computer program entered by a user or stored in a storage unit;
extracts at least one identifier from the obtained computer program;
generates at least one obfuscation rate based on the extracted at least one identifier; and
determines the reliability of the computer program based on the generated obfuscation rate;
wherein the control unit divides the identifier into partial character strings each containing a plurality of characters, calculates first mismatch rate information based on the set of divided partial character strings and a preset character classification,
sets one rule out of a plurality of rules stored in advance in the storage unit, calculates second mismatch rate information based on the set rule, and
generates the obfuscation rate using the first mismatch rate information and the second mismatch rate information;
A computer program reliability determination method.
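
The overall flow of the method above (obtain, extract, generate, determine) can be sketched end to end as follows. Everything concrete here is an assumption made for illustration: the regular expression and keyword list used for extraction, the two toy stand-ins for the first and second mismatch rate calculations, the averaging used to combine them, and the threshold of 0.5.

```python
import re
from statistics import mean
from typing import Callable

IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Tiny hypothetical keyword list for a VBA-like script; not exhaustive.
KEYWORDS = {"Sub", "End", "Dim", "As", "String", "Function", "Set", "If", "Then"}


def extract_identifiers(source: str) -> list[str]:
    """Extract unique identifier-like tokens, skipping language keywords."""
    found: list[str] = []
    for token in IDENT_RE.findall(source):
        if token not in KEYWORDS and token not in found:
            found.append(token)
    return found


def toy_first_rate(identifier: str) -> float:
    """Stand-in: share of adjacent pairs switching between digit and non-digit."""
    pairs = list(zip(identifier, identifier[1:]))
    if not pairs:
        return 0.0
    return sum(a.isdigit() != b.isdigit() for a, b in pairs) / len(pairs)


def toy_second_rate(identifier: str) -> float:
    """Stand-in: share of letters that are rare in English text."""
    letters = [c for c in identifier.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in "jkqvwxz" for c in letters) / len(letters)


def obfuscation_rate(identifier: str,
                     first_rate: Callable[[str], float],
                     second_rate: Callable[[str], float]) -> float:
    """Combine both pieces of mismatch information (here: a simple average)."""
    return (first_rate(identifier) + second_rate(identifier)) / 2


def is_reliable(source: str, threshold: float = 0.5) -> bool:
    """Judge a program reliable when its mean obfuscation rate is low."""
    identifiers = extract_identifiers(source)
    if not identifiers:
        return True  # nothing to judge; hypothetical default
    rates = [obfuscation_rate(i, toy_first_rate, toy_second_rate)
             for i in identifiers]
    return mean(rates) <= threshold


readable = "Sub Main\n Dim counter As String\nEnd Sub"
obfuscated = "Sub x9q2z7\n Dim k4j8w1 As String\nEnd Sub"
print(is_reliable(readable), is_reliable(obfuscated))
```

Passing the two rate functions as parameters keeps the combination step independent of how each piece of mismatch information is actually computed.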

classifying the divided partial character strings according to the character classification, and comparing the character classifications of adjacent characters in the classified partial character strings to calculate the first mismatch rate information;
The computer program reliability determination method according to claim 8.

setting the rule according to the set of characters that make up the identifier and the appearance frequency of each character in that set of characters; and
generating a high obfuscation rate when the identifier is likely to have been generated by a rule that satisfies a predetermined condition;
The computer program reliability determination method according to claim 8.

the at least one identifier comprises a plurality of identifiers;
calculating a representative value of the obfuscation rates corresponding to the plurality of identifiers, and determining the reliability of the computer program based on the relationship between the calculated representative value and a preset first threshold;
The computer program reliability determination method according to any one of claims 8 to 10.

the at least one identifier comprises a plurality of identifiers;
Determining the reliability of the computer program based on whether a ratio of identifiers having an obfuscation rate exceeding a preset first threshold among the plurality of identifiers exceeds a preset second threshold;
The computer program reliability determination method according to any one of claims 8 to 10.

Determining the reliability of the computer program based on the difference between the obfuscation rate and a preset first threshold;
The computer program reliability determination method according to any one of claims 8 to 10.

determining the reliability of the computer program based on the number of identifiers;
The computer program reliability determination method according to any one of claims 8 to 13.

Causing a computer to execute:
obtaining a computer program;
extracting at least one identifier from the obtained computer program;
generating at least one obfuscation rate based on the extracted at least one identifier;
determining the reliability of the computer program based on the generated obfuscation rate;
dividing the identifier into partial character strings each containing a plurality of characters, and calculating first mismatch rate information based on the set of divided partial character strings and a preset character classification;
setting one rule out of a plurality of pre-stored rules, and calculating second mismatch rate information based on the set rule; and
generating the obfuscation rate using the first mismatch rate information and the second mismatch rate information;
A computer program reliability determination program.