CN114090650A - Sample data identification method and device, electronic equipment and storage medium

Sample data identification method and device, electronic equipment and storage medium

Info

Publication number
CN114090650A
Authority
CN
China
Prior art keywords
sample data
identified
data
sample
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111325163.3A
Other languages
Chinese (zh)
Inventor
陈扬
尚程
王方圆
田野
梁彧
傅强
王杰
杨满智
蔡琳
金红
陈晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eversec Beijing Technology Co Ltd
Original Assignee
Eversec Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eversec Beijing Technology Co Ltd
Priority to CN202111325163.3A
Publication of CN114090650A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a sample data identification method and device, electronic equipment and a storage medium. The sample data identification method may specifically include: carrying out primary screening on original sample data to obtain sample data to be identified; performing data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified; carrying out multi-dimensional identification on the data interaction result of the sample data to be identified to obtain multi-dimensional characteristic data to be identified of the sample data to be identified; and determining the identification result of the sample data to be identified according to the multi-dimensional characteristic data to be identified. The technical scheme of the embodiment of the invention can accurately identify the sample data, thereby improving the accuracy and reliability of the identification of the sample data.

Description

Sample data identification method and device, electronic equipment and storage medium
Technical Field
Embodiments of the invention relate to the field of internet technology, and in particular to a sample data identification method, a sample data identification device, electronic equipment and a storage medium.
Background
In recent years, with the rapid development of communication technology, the internet, cloud computing, big data and related technologies, the identification of sample data such as image data and communication data has become a research focus both domestically and internationally. In the prior art, sample data is usually identified by comparing the data to be identified with verification data in an identification database and using the comparison result. However, existing sample data identification methods have low accuracy and reliability.
Disclosure of Invention
The embodiment of the invention provides a sample data identification method, a sample data identification device, electronic equipment and a storage medium, which can accurately identify sample data, so that the accuracy and reliability of sample data identification are improved.
In a first aspect, an embodiment of the present invention provides a sample data identification method, including:
carrying out primary screening on original sample data to obtain sample data to be identified;
performing data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified;
carrying out multi-dimensional identification on the data interaction result of the sample data to be identified to obtain multi-dimensional characteristic data to be identified of the sample data to be identified;
and determining the identification result of the sample data to be identified according to the multi-dimensional characteristic data to be identified.
In a second aspect, an embodiment of the present invention further provides a sample data identification apparatus, including:
the to-be-identified sample data acquisition module is used for primarily screening original sample data to obtain the to-be-identified sample data;
the data interaction result acquisition module is used for carrying out data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified;
the multi-dimensional characteristic data to be identified acquisition module is used for carrying out multi-dimensional identification on the data interaction result of the sample data to be identified to obtain the multi-dimensional characteristic data to be identified of the sample data to be identified;
and the identification result determining module of the sample data to be identified is used for determining the identification result of the sample data to be identified according to the multi-dimensional characteristic data to be identified.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the sample data identification method provided by any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the sample data identification method provided in any embodiment of the present invention.
In the embodiments of the invention, original sample data is preliminarily screened to obtain sample data to be identified; data interaction is performed with the sample data to be identified according to its data content to obtain a data interaction result; multi-dimensional identification is performed on the data interaction result to obtain multi-dimensional feature data to be identified; and the identification result of the sample data to be identified is determined according to that feature data. This addresses the low accuracy and poor reliability of existing sample data identification methods, so that sample data can be identified accurately and the accuracy and reliability of sample data identification are improved.
Drawings
Fig. 1 is a flowchart of a sample data identification method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a sample data identification method according to a second embodiment of the present invention;
Fig. 3 is an exemplary flowchart of a sample data identification method according to the second embodiment of the present invention;
Fig. 4 is a schematic diagram of a sample data identification apparatus according to a third embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.
It is to be further noted that, for the convenience of description, only some but not all of the relevant portions of the invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The terms "first" and "second," and the like in the description and claims of embodiments of the invention and in the drawings, are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements but may include steps or elements not listed.
Embodiment One
Fig. 1 is a flowchart of a sample data identification method according to the first embodiment of the present invention. This embodiment is applicable to improving the accuracy of sample data identification. The method may be executed by a sample data identification device, which may be implemented in software and/or hardware and may generally be integrated directly in an electronic device that executes the method. As shown in Fig. 1, the sample data identification method may include the following steps:
S110, performing preliminary screening on the original sample data to obtain the sample data to be identified.
The original sample data may be unprocessed sample data that can be obtained directly, for example original communication sample data or original image sample data. It can be understood that the original sample data may be detected by an operator, recorded by a relevant department, or provided by a third party, among other sources. The sample data to be identified may be sample data that needs to be identified, for example communication sample data or image sample data that needs to be identified, which is not limited in the embodiment of the present invention.
In the embodiment of the invention, the original sample data is preliminarily screened to obtain the sample data to be identified, so that data identification can then be performed on the sample data to be identified. For example, the preliminary screening may select, from the original sample data, the sample data that satisfies a specific rule and determine it as the sample data to be identified; or it may select the sample data that cannot yet be identified and determine it as the sample data to be identified; or it may select a preset number of sample data and determine them as the sample data to be identified. The embodiment of the invention does not limit the manner in which the original sample data is screened.
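The three screening modes described above could be sketched as follows. This is only an illustrative sketch: the record shape (a dict with an `id` key), the function names and the notion of a `known_ids` set are assumptions introduced here and do not come from the patent text.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical record shape, e.g. {"id": "a", "url": "http://x.example"}.
Sample = dict


def screen_by_rule(samples: Iterable[Sample],
                   rule: Callable[[Sample], bool]) -> List[Sample]:
    """Mode 1: keep the samples that satisfy a caller-supplied rule."""
    return [s for s in samples if rule(s)]


def screen_unidentifiable(samples: Iterable[Sample],
                          known_ids: Set[str]) -> List[Sample]:
    """Mode 2: keep the samples that cannot yet be identified.

    "Cannot be identified" is modelled here (as an assumption) as
    "the sample id is not in a set of already known ids".
    """
    return [s for s in samples if s["id"] not in known_ids]


def screen_preset_number(samples: Iterable[Sample], n: int) -> List[Sample]:
    """Mode 3: keep a preset number of samples (here simply the first n)."""
    return list(samples)[:n]
```

Which mode is appropriate depends on the scenario; the text leaves the choice open.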
S120, performing data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified.
The data content may be any data in the sample data to be identified, for example a communication identifier in communication sample data to be identified or a color identifier in image sample data to be identified, which is not limited in the embodiment of the present invention. The data interaction result may be the result obtained by performing data interaction with the sample data to be identified according to its data content. For example, if the data content of the communication sample data to be identified is a communication identifier, the data interaction result may be a text or an image obtained by performing data interaction with the communication sample data to be identified through the communication identifier. If the data content of the image sample data to be identified is a color identifier, the data interaction result may be pixel points obtained by performing data interaction with the image sample data to be identified through the color identifier.
In the embodiment of the invention, after the original sample data is preliminarily screened to obtain the sample data to be identified, data interaction can further be performed with the sample data to be identified according to its data content to obtain the data interaction result, so that multi-dimensional identification can then be performed on the data interaction result.
S130, carrying out multi-dimensional identification on the data interaction result of the sample data to be identified to obtain multi-dimensional characteristic data to be identified of the sample data to be identified.
Multi-dimensional identification may mean identifying the features to be identified along different dimensions. A feature to be identified may be a feature, in the data interaction result of the sample data to be identified, that needs to be identified, for example a text feature or an image feature of communication sample data to be identified, or a pixel point feature of image sample data to be identified. It can be understood that the data interaction result of the sample data to be identified may contain a plurality of features to be identified. The multi-dimensional feature data to be identified may be the feature data obtained by identifying the features to be identified along different dimensions.
In the embodiment of the invention, after the data interaction result of the sample data to be identified is obtained by performing data interaction with the sample data to be identified according to the data content of the sample data to be identified, the data interaction result of the sample data to be identified can be further subjected to multi-dimensional identification so as to obtain the multi-dimensional characteristic data to be identified of the sample data to be identified. It can be understood that after performing multidimensional identification on the data interaction result of the sample data to be identified, each dimension identification obtains one identification result, and the multidimensional characteristic data to be identified may be an arithmetic average of each identification result, or a weighted average of each identification result, and the like, which is not limited in the embodiment of the present invention.
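As a minimal sketch of the aggregation mentioned above, the per-dimension identification results can be combined by an arithmetic or a weighted average. The assumption made here is that each dimension yields a numeric score; the dimension names and the function name are hypothetical.

```python
from typing import Mapping, Optional


def combine_dimension_scores(scores: Mapping[str, float],
                             weights: Optional[Mapping[str, float]] = None) -> float:
    """Combine per-dimension scores into a single value.

    Without weights this is the arithmetic average; with weights it is
    the weighted average over the dimensions present in `scores`.
    """
    if not scores:
        raise ValueError("at least one dimension score is required")
    if weights is None:
        return sum(scores.values()) / len(scores)
    total_weight = sum(weights[dim] for dim in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight
```

A weighted average lets a more trusted dimension contribute more to the final value; the text does not prescribe particular weights.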
S140, determining the identification result of the sample data to be identified according to the multi-dimensional characteristic data to be identified.
In the embodiment of the invention, the multi-dimensional identification is carried out on the data interaction result of the sample data to be identified to obtain the multi-dimensional characteristic data to be identified of the sample data to be identified, and the identification result of the sample data to be identified can be further determined according to the multi-dimensional characteristic data to be identified. For example, when the sample data to be identified is the communication sample data to be identified, the identification result may be whether the communication sample data to be identified is fraud sample data, or whether the communication sample data to be identified is virus sample data, and the like, which is not limited in the embodiment of the present invention. When the sample data to be recognized is the sample data of the image to be recognized, the recognition result may be whether the sample data of the image to be recognized is the highlight sample data, or whether the sample data of the image to be recognized is the sample data of the human face, and the like.
In the technical scheme of this embodiment, the original sample data is preliminarily screened to obtain the sample data to be identified; data interaction is performed with the sample data to be identified according to its data content to obtain a data interaction result; multi-dimensional identification is performed on the data interaction result to obtain the multi-dimensional feature data to be identified; and the identification result of the sample data to be identified is determined according to that feature data. This addresses the low accuracy and poor reliability of existing sample data identification methods, so that sample data can be identified accurately and the accuracy and reliability of sample data identification are improved.
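The four steps S110 to S140 can be read as a simple pipeline. The sketch below only shows how the steps chain together; the four callables are placeholders supplied by the caller, and none of the names come from the patent text.

```python
from typing import Any, Callable, Iterable, List, Tuple


def identify_samples(
    raw_samples: Iterable[Any],
    screen: Callable[[Iterable[Any]], Iterable[Any]],   # S110
    interact: Callable[[Any], Any],                     # S120
    extract_features: Callable[[Any], Any],             # S130
    decide: Callable[[Any], Any],                       # S140
) -> List[Tuple[Any, Any]]:
    """Run each screened sample through interaction, feature extraction
    and decision, returning (sample, identification result) pairs."""
    results = []
    for sample in screen(raw_samples):
        interaction_result = interact(sample)
        features = extract_features(interaction_result)
        results.append((sample, decide(features)))
    return results
```

Keeping each step as a separate callable mirrors the way the text leaves the concrete screening, interaction and identification techniques open.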
Embodiment Two
Fig. 2 is a flowchart of a sample data identification method provided in the second embodiment of the present invention, and this embodiment further details the above technical solutions, and provides a variety of specific optional implementation manners for performing preliminary screening on original sample data to obtain sample data to be identified, and performing data interaction with the sample data to be identified according to data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified. The solution in this embodiment may be combined with the individual alternatives in one or more of the embodiments described above. As shown in fig. 2, the method may include the steps of:
S210, performing data preprocessing on the original sample data, wherein the data preprocessing comprises deduplication processing and/or exception cleaning processing.
The data preprocessing may be preprocessing performed on the original sample data. The deduplication processing may remove duplicate sample data from the original sample data. The exception cleaning processing may clean abnormal data from the original sample data. It can be understood that the same sample data may appear in the original sample data several times in different forms, which gives rise to abnormal data in the original sample data.
In the embodiment of the present invention, the data preprocessing on the original sample data may be to perform deduplication processing on the original sample data, or to perform exception cleaning processing on the original sample data, or to perform deduplication processing and exception cleaning processing on the original sample data. For example, data preprocessing and the like may be performed on original sample data through big data analysis, which is not limited in the embodiment of the present invention as long as the data preprocessing of the original sample data can be implemented.
According to the technical scheme, the original sample data is subjected to data preprocessing, so that each sample data can be used as the unique, effective and analyzable sample data, the effectiveness of the sample data is improved, and the identification efficiency of the sample data is improved.
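A minimal sketch of deduplication combined with exception cleaning, assuming the samples are plain strings such as website links. The normalization rule (trim, lowercase, drop a trailing slash) is an illustrative assumption for collapsing "the same sample in different forms"; real data would need scenario-specific rules.

```python
from typing import Any, Iterable, List


def normalize(value: str) -> str:
    """Map different surface forms of one sample to a single key
    (assumed rule: trim whitespace, lowercase, drop a trailing slash)."""
    return value.strip().lower().rstrip("/")


def preprocess(raw_values: Iterable[Any]) -> List[str]:
    """Drop abnormal entries (non-strings, blanks) and duplicates,
    preserving the order of first occurrence."""
    seen = set()
    cleaned = []
    for value in raw_values:
        if not isinstance(value, str):
            continue  # exception cleaning: wrong type
        key = normalize(value)
        if not key:
            continue  # exception cleaning: empty after normalization
        if key in seen:
            continue  # deduplication
        seen.add(key)
        cleaned.append(key)
    return cleaned
```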
S220, acquiring the benchmark reference features of the benchmark reference data according to the sample feature database.
The sample feature database may be a database containing a plurality of sample features. For example, in a fraud-related application scenario, the sample feature database may be a fraud-related blacklist database containing a plurality of fraud-related sample features, or a fraud-related whitelist database containing a plurality of non-fraud-related sample features, which is not limited in the embodiment of the present invention. The benchmark reference data may be data that can serve as a benchmark for reference. The benchmark reference feature may be a feature in the benchmark reference data that can serve as a benchmark for reference.
In the embodiment of the present invention, after the data preprocessing is performed on the original sample data, the benchmark reference feature of the benchmark reference data may be further obtained according to the sample feature database. Specifically, the reference feature in the reference data may be determined according to the sample feature in the sample feature database. For example, the reference feature of the reference data may be the same feature as the sample feature in the sample feature database, or may be a feature similar to the sample feature in the sample feature database, and the like, which is not limited in this embodiment of the present invention.
S230, performing feature analysis on the original sample data and the benchmark reference feature according to a multi-dimensional sample identification algorithm.
The multidimensional sample identification algorithm may be an algorithm capable of identifying a sample from multiple dimensions, for example, the multidimensional sample identification algorithm may be a fusion of multiple algorithms such as a random forest algorithm, a similarity algorithm, a collision algorithm, an AI learning algorithm, and the like.
In the embodiment of the invention, after the benchmark reference characteristics of the benchmark reference data are obtained according to the sample characteristic database, the original sample data and the benchmark reference characteristics can be further subjected to characteristic analysis according to a multi-dimensional sample identification algorithm. For example, the similarity analysis may be performed on the original sample data and the reference feature according to a multi-dimensional sample recognition algorithm, or the collision analysis may be performed on the original sample data and the reference feature according to the multi-dimensional sample recognition algorithm, which is not limited in the embodiment of the present invention.
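The text does not specify how similarity analysis or collision analysis is computed. As one possible reading, the sketch below uses Jaccard similarity over feature sets for similarity analysis and treats collision analysis as an exact-match lookup; both choices, and the function names, are assumptions made for illustration.

```python
from typing import Set


def jaccard_similarity(sample_features: Set[str],
                       reference_features: Set[str]) -> float:
    """Similarity analysis (assumed metric): |A & B| / |A | B|."""
    if not sample_features and not reference_features:
        return 1.0  # two empty sets are treated as identical
    union = sample_features | reference_features
    return len(sample_features & reference_features) / len(union)


def collides(sample_features: Set[str],
             reference_features: Set[str]) -> bool:
    """Collision analysis (assumed meaning): at least one sample feature
    exactly matches a benchmark reference feature."""
    return bool(sample_features & reference_features)
```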
S240, dividing the original sample data into the sample data to be recognized and the recognized sample data according to the feature analysis result.
The feature analysis result may be a result obtained after performing the feature analysis. For example, the feature analysis result may be that the original sample data is the same as the benchmark reference feature, or the original sample data is similar to the benchmark reference feature, or the original sample data is different from the benchmark reference feature, or may be a specific numerical value of the similarity between the original sample data and the benchmark reference feature, and the like, which is not limited in the embodiment of the present invention. The identified sample data may be sample data that has been identified successfully.
In the embodiment of the invention, after the original sample data and the benchmark reference feature are subjected to feature analysis according to the multi-dimensional sample identification algorithm, the original sample data can be further divided into the sample data to be identified and the identified sample data according to the feature analysis result. For example, if the feature analysis result is that the original sample data and the benchmark reference feature are the same, the original sample data may be determined as the identified sample data. If the result of the feature analysis is that the original sample data is similar to the reference feature, the original sample data may be further divided into sample data to be recognized and recognized sample data according to a specific numerical value of the similarity, for example, when the similarity is higher than a certain threshold, the original sample data is determined as recognized sample data, and correspondingly, when the similarity is lower than a certain threshold, the original sample data is determined as sample data to be recognized. And if the characteristic analysis result shows that the original sample data and the benchmark reference characteristic are different, determining the original sample data as the sample data to be identified.
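The threshold-based division described above can be sketched as follows. The similarity function is supplied by the caller; the default threshold of 0.8 and the choice to treat a similarity exactly equal to the threshold as identified are assumptions, since the text only distinguishes "higher than" and "lower than" a threshold.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def partition_by_similarity(
    samples: Iterable[T],
    similarity: Callable[[T], float],
    threshold: float = 0.8,  # illustrative value, not taken from the source
) -> Tuple[List[T], List[T]]:
    """Return (to_be_identified, identified).

    A similarity of 1.0 stands for "same as the benchmark reference
    feature" and 0.0 for "different".
    """
    to_be_identified: List[T] = []
    identified: List[T] = []
    for sample in samples:
        if similarity(sample) >= threshold:
            identified.append(sample)
        else:
            to_be_identified.append(sample)
    return to_be_identified, identified
```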
According to the technical scheme, the original sample data is divided into the sample data to be identified, so that the sample data to be identified is subjected to further data identification, and the accuracy of the sample data identification is improved.
Optionally, the sample feature database may include a first sample feature database and a second sample feature database; the benchmark reference features may include a first benchmark reference feature and a second benchmark reference feature; the identified sample data may comprise first identified sample data and second identified sample data.
The first sample feature database may be a database containing a plurality of sample features. The second sample feature database may be another database containing a plurality of sample features. The first benchmark reference feature may be a reference feature in the first sample feature database. The second benchmark reference feature may be a reference feature in the second sample feature database. The first identified sample data may be identified sample data identified according to the first benchmark reference feature. The second identified sample data may be identified sample data identified according to the second benchmark reference feature. It will be appreciated that the sample features in the first sample feature database are different from the sample features in the second sample feature database.
For example, in a specific fraud-related application scenario, the first sample feature database may be a fraud-related blacklist database, the second sample feature database may be a fraud-related whitelist database, the first benchmark reference feature may be a reference feature in the fraud-related blacklist database, the second benchmark reference feature may be a reference feature in the fraud-related whitelist database, the first identified sample data may be identified fraud-related sample data, and the second identified sample data may be identified non-fraud-related sample data.
Specifically, first benchmark reference features of the benchmark reference data are obtained according to the first sample feature database, feature analysis is conducted on the original sample data and the first benchmark reference features according to a multi-dimensional sample identification algorithm, and the original sample data are divided into sample data to be identified and first identified sample data according to feature analysis results. Correspondingly, second benchmark reference characteristics of the benchmark reference data are obtained according to the second sample characteristic database, characteristic analysis is carried out on the original sample data and the second benchmark reference characteristics according to a multi-dimensional sample identification algorithm, and the original sample data are divided into sample data to be identified and second identified sample data according to a characteristic analysis result.
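For the fraud-related scenario, the division against two sample feature databases could look like the sketch below. Exact membership is used in place of the multi-dimensional feature analysis, and the blacklist is checked before the whitelist; both simplifications are assumptions made here.

```python
from typing import Iterable, List, Set, Tuple


def split_by_feature_databases(
    samples: Iterable[str],
    blacklist: Set[str],   # stands in for the first sample feature database
    whitelist: Set[str],   # stands in for the second sample feature database
) -> Tuple[List[str], List[str], List[str]]:
    """Return (to_be_identified, first_identified, second_identified)."""
    to_be_identified: List[str] = []
    first_identified: List[str] = []   # e.g. identified fraud-related samples
    second_identified: List[str] = []  # e.g. identified non-fraud-related samples
    for sample in samples:
        if sample in blacklist:
            first_identified.append(sample)
        elif sample in whitelist:
            second_identified.append(sample)
        else:
            to_be_identified.append(sample)
    return to_be_identified, first_identified, second_identified
```

Only the samples that match neither database go on to the data interaction step.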
S250, performing data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified.
Optionally, the original sample data may include original communication sample data, and the sample data to be identified may include communication sample data to be identified. Correspondingly, performing data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified may include: acquiring a communication identifier included in the communication sample data to be identified; and performing simulated access to the communication identifier to obtain a communication result of the communication sample data to be identified as the data interaction result of the communication sample data to be identified.
The original communication sample data may be unprocessed sample data that can be directly obtained for communication. For example, the original communication sample data may include original website sample data, original APP (Application) sample data, or original phone number sample data, or the like. The communication sample data to be identified may be sample data for communication that needs to be identified. For example, the communication sample data to be identified may include sample data of a website to be identified, sample data of an APP to be identified, or sample data of a phone number to be identified. The communication identifier may be used to identify communication sample data to be recognized. Illustratively, the communication identifier may include a website link, an APP link, or a telephone number, among others.
Specifically, when the original sample data includes original communication sample data and the sample data to be identified includes communication sample data to be identified, the communication identifier included in the communication sample data to be identified may be acquired, and simulation access may be performed on the communication identifier to acquire a communication result of the communication sample data to be identified, so that the communication result of the communication sample data to be identified is used as the data interaction result of the communication sample data to be identified. For example, when the original communication sample data includes original website sample data, a website link included in the website sample data to be identified may be acquired, and simulated access may be performed on the website link. Correspondingly, when the original communication sample data comprises the original APP sample data, the APP link comprising the APP sample data to be identified can be obtained, and the simulation access is carried out on the APP link. Correspondingly, when the original communication sample data comprises the original telephone number sample data, the telephone number included in the telephone number sample data to be identified can be acquired, and the telephone number is subjected to simulation access.
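As a rough sketch of how the simulated access might be dispatched by type of communication identifier, the following assumes that each sample is a dictionary with hypothetical `kind` and `identifier` keys, and that the caller supplies one handler per kind. The handlers stand in for whatever mechanism actually performs the access (the text does not describe one), so no real network or telephony call is made here.

```python
from typing import Callable


def simulate_access(sample: dict, handlers: dict) -> dict:
    """Run the simulated access that matches the sample's communication identifier.

    sample   -- {"kind": "website" | "app" | "phone", "identifier": str} (assumed layout)
    handlers -- maps each kind to a callable that takes the identifier and
                returns the communication result as a dict (hypothetical interface)
    """
    kind = sample["kind"]
    identifier = sample["identifier"]
    handler: Callable = handlers.get(kind)
    if handler is None:
        raise ValueError(f"no simulated-access handler for kind {kind!r}")
    # The communication result is used as the data interaction result.
    return {"kind": kind, "identifier": identifier, "result": handler(identifier)}
```

Injecting the handlers keeps the dispatch logic testable with stubs and leaves the choice of access mechanism for websites, APPs and telephone numbers open.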
S260, carrying out multi-dimensional identification on the data interaction result of the sample data to be identified to obtain multi-dimensional characteristic data to be identified of the sample data to be identified.
S270, determining the identification result of the sample data to be identified according to the multi-dimensional characteristic data to be identified.
Optionally, after determining the identification result of the sample data to be identified according to the multi-dimensional feature data to be identified, the method may further include: under the condition that the identification result is determined to be the target identification result, acquiring newly added sample features of the sample data to be identified; and storing the newly added sample features in a first sample feature database, wherein the first sample feature database is used for preliminarily screening updated original sample data.
The target identification result may be a target value of the identification result. For example, in a specific fraud-related application scenario, the target identification result may be that the sample data to be identified is fraud-related sample data, or that the sample data to be identified is non-fraud-related sample data, which is not limited by the embodiment of the present invention. A newly added sample feature may be a feature of the sample data to be identified that is determined to be newly added when the identification result of the sample data to be identified is the target identification result. The updated original sample data may be new original sample data acquired again after the original sample data has been identified.
Specifically, after the identification result of the sample data to be identified is determined according to the multi-dimensional characteristic data to be identified, further under the condition that the identification result is determined to be the target identification result, the newly added sample characteristics of the sample data to be identified can be obtained, and the newly added sample characteristics are stored in the first sample characteristic database, so that the updated original sample data is primarily screened according to the updated first sample characteristic database. For example, in a specific fraud-related application scenario, when the identification result of the sample data to be identified is determined to be fraud-related sample data according to the multidimensional feature data to be identified, newly-added sample features of the sample data to be identified may be obtained, and the newly-added sample features are stored in the fraud-related blacklist database, so as to expand the fraud-related sample data in the fraud-related blacklist database.
According to the technical scheme, the newly-added sample characteristics are stored in the first sample characteristic database, so that the sample characteristics in the first sample characteristic database can be expanded, and the accuracy of data identification is improved.
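The feedback step can be sketched as follows. The sketch assumes the first sample feature database is modelled as an in-memory set of feature strings and that the target identification result is the hypothetical label `"fraud"`; a real deployment would presumably use persistent storage.

```python
def update_feature_db(first_db: set, sample_features: set, result: str,
                      target_result: str = "fraud") -> set:
    """Store newly added sample features when the result is the target result.

    Returns the set of features that were actually added (empty if none).
    """
    if result != target_result:
        # Non-target results leave the database untouched.
        return set()
    new_features = sample_features - first_db  # only features not yet stored
    first_db |= new_features                   # expand the database in place
    return new_features
```

Because the database is updated in place, the next round of preliminary screening on updated original sample data sees the expanded feature set.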
For example, in a specific fraud-related application scenario, the suspected fraud-related sample data is taken as an example for explanation. Fig. 3 is an exemplary flowchart of a sample data identification method provided in the second embodiment of the present invention, and as shown in fig. 3, the method may specifically include the following steps:
a. Suspected fraud-related sample data (namely, the original sample data) is obtained; the suspected fraud-related sample data may be a website, an APP, or the like that can be accessed and used over the mobile Internet or the fixed Internet.
b. The suspected fraud-related sample data is preprocessed by the preprocessing engine, which screens out duplicate data and removes dirty data, so that each item of suspected fraud-related sample data is a unique, valid and analyzable target.
c. The suspected fraud-related sample data is subjected to fraud-related similarity analysis and collision analysis through the intelligent analysis engine, and the suspected fraud-related sample data is divided into a white list (namely a normal sample), a black list (namely a fraud-related sample) and a gray list (namely an unknown sample) so as to perform labeling screening on the suspected fraud-related sample data.
Specifically, basic sample features are output from the existing blacklist library and white list library respectively, multi-dimensional analysis (such as sample signature, text, image and MD5 (Message-Digest Algorithm 5) analysis) is performed on the suspected fraud-related sample data, and whether the suspected fraud-related sample data is a fraud-related sample is determined based on an intelligent analysis algorithm. Suspected fraud-related sample data matched by the white list algorithm is defined as a normal sample, labeled with a white list label, and stored in the white list library; suspected fraud-related sample data matched by the blacklist algorithm is defined as a fraud-related sample, labeled with a blacklist label, and stored in the blacklist library; and suspected fraud-related sample data matched by neither the white list algorithm nor the blacklist algorithm is defined as an unknown sample, labeled with a grey list label, and stored in the grey list library.
d. Dynamic breeding analysis and judgment is performed on the suspected fraud-related sample data in the grey list library through the dynamic breeding engine, and algorithm analysis is performed using features such as sample signatures, texts, images and MD5.
Specifically, the suspected fraud-related sample data in the grey list library is judged according to the weight values of the algorithm results. Suspected fraud-related sample data matched by the white list algorithm is defined as a normal sample, labeled with a white list label, and stored in the white list library; suspected fraud-related sample data matched by the blacklist algorithm is defined as a fraud-related sample, labeled with a blacklist label, and stored in the blacklist library.
f. The suspected fraud-related sample data in the blacklist library is analyzed through the dynamic breeding engine, and the fraud-related features of the suspected fraud-related sample data in the blacklist library are expanded (for example, same-family features or same-source features are expanded).
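Steps c and d above can be approximated with a short sketch. It is an illustration under stated assumptions rather than the engines of this embodiment: list matching is reduced to a lookup of the MD5 digest alone (the text also names signature, text and image dimensions), the blacklist is checked before the white list, and the grey list judgment is a weighted average of per-dimension scores compared with a hypothetical threshold of 0.5. The function names, weights and threshold are illustrative.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """MD5 digest of the sample content as a hex string."""
    return hashlib.md5(content).hexdigest()


def triage(content: bytes, whitelist: set, blacklist: set) -> str:
    """Label a sample 'black', 'white' or 'grey' by digest lookup (step c)."""
    digest = md5_hex(content)
    if digest in blacklist:   # assumed priority: blacklist checked first
        return "black"
    if digest in whitelist:
        return "white"
    return "grey"             # unknown sample, left for step d


def judge_grey(scores: dict, weights: dict, threshold: float = 0.5) -> str:
    """Judge a grey list sample from weighted per-dimension scores (step d).

    scores  -- maps a dimension name to a fraud likelihood in [0, 1]
    weights -- maps the same dimension names to non-negative weights
    """
    total_weight = sum(weights[dim] for dim in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(scores[dim] * weights[dim] for dim in scores) / total_weight
    return "black" if weighted >= threshold else "white"
```

A sample labeled "black" by either function would then be a candidate for the feature expansion in step f.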
According to the above technical scheme, suspected fraud-related sample data in an operator network can be accurately identified and intelligently analyzed and judged; whether a sample is a fraud-related sample is automatically detected according to the basic features of the suspected fraud-related sample data, and the result can be applied to telecommunication network fraud prevention and control services. The suspected fraud-related sample data can be analyzed automatically, with support for a feature analysis model and a dynamic breeding model, and the models can be further analyzed, judged and trained. Data governance and mining algorithms such as similarity analysis, collision analysis, AI (Artificial Intelligence) model training and random forests are adopted. Automatic judgment and learning are realized through a model training engine and the dynamic breeding engine, so that the ability to accurately identify suspected fraud-related sample data is continuously improved and Internet fraud is effectively prevented.
In the technical scheme of this embodiment, data preprocessing is performed on the original sample data; benchmark reference features of the benchmark reference data are obtained according to the sample feature database; feature analysis is performed on the original sample data and the benchmark reference features according to a multi-dimensional sample identification algorithm; and the original sample data is divided into sample data to be identified and identified sample data according to the feature analysis result. Data interaction is then performed with the sample data to be identified according to its data content to obtain a data interaction result of the sample data to be identified, multi-dimensional identification is performed on the data interaction result to obtain multi-dimensional feature data to be identified of the sample data to be identified, and the identification result of the sample data to be identified is determined according to the multi-dimensional feature data to be identified. This solves the problems of low data identification accuracy and poor reliability in existing sample data identification methods, so that sample data can be accurately identified and the accuracy and reliability of sample data identification are improved.
Example three
Fig. 4 is a schematic diagram of a sample data identification apparatus according to a third embodiment of the present invention. As shown in fig. 4, the apparatus includes: a to-be-identified sample data obtaining module 410, a data interaction result obtaining module 420, a multi-dimensional to-be-identified feature data obtaining module 430, and an identification result determining module 440, wherein:
a to-be-identified sample data obtaining module 410, configured to perform preliminary screening on original sample data to obtain to-be-identified sample data;
a data interaction result obtaining module 420, configured to perform data interaction with the sample data to be identified according to the data content of the sample data to be identified, so as to obtain a data interaction result of the sample data to be identified;
a multi-dimensional to-be-identified feature data obtaining module 430, configured to perform multi-dimensional identification on the data interaction result of the sample data to be identified, so as to obtain multi-dimensional feature data to be identified of the sample data to be identified;
an identification result determining module 440, configured to determine an identification result of the sample data to be identified according to the multi-dimensional feature data to be identified.
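The way the four modules chain together can be sketched as a small pipeline. This is only one possible arrangement: each module is modelled as a caller-supplied function, samples are assumed to be hashable values such as strings, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class SampleIdentificationDevice:
    screen: Callable[[Iterable[Any]], list]   # stands in for module 410
    interact: Callable[[Any], Any]            # stands in for module 420
    extract_features: Callable[[Any], dict]   # stands in for module 430
    decide: Callable[[dict], str]             # stands in for module 440

    def run(self, original_samples: Iterable[Any]) -> dict:
        """Return a mapping from each sample to be identified to its result."""
        results = {}
        for sample in self.screen(original_samples):
            interaction = self.interact(sample)            # data interaction result
            features = self.extract_features(interaction)  # multi-dimensional feature data
            results[sample] = self.decide(features)        # identification result
        return results
```

Samples filtered out by `screen` never reach the later modules, which mirrors the role of preliminary screening in limiting the data that requires interaction.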
According to the technical scheme, the original sample data is preliminarily screened to obtain sample data to be identified, data interaction is carried out on the sample data to be identified according to the data content of the sample data to be identified to obtain the data interaction result of the sample data to be identified, multi-dimensional identification is carried out on the data interaction result of the sample data to be identified to obtain the multi-dimensional characteristic data to be identified of the sample data to be identified, so that the identification result of the sample data to be identified is determined according to the multi-dimensional characteristic data to be identified, the problems of low data identification accuracy, poor reliability and the like of the existing sample data identification method are solved, the sample data can be accurately identified, and the accuracy and the reliability of sample data identification are improved.
Optionally, the to-be-identified sample data obtaining module 410 may be specifically configured to: acquire benchmark reference features of the benchmark reference data according to the sample feature database; perform feature analysis on the original sample data and the benchmark reference features according to a multi-dimensional sample identification algorithm; and divide the original sample data into sample data to be identified and identified sample data according to the feature analysis result.
Optionally, the sample feature database may include a first sample feature database and a second sample feature database; the benchmark reference features may include a first benchmark reference feature and a second benchmark reference feature; the identified sample data may include first identified sample data and second identified sample data.
Optionally, the original sample data may include original communication sample data; the sample data to be identified may include communication sample data to be identified; correspondingly, the data interaction result obtaining module 420 may be specifically configured to: acquire a communication identifier included in the communication sample data to be identified; and perform simulated access on the communication identifier to obtain a communication result of the communication sample data to be identified as the data interaction result of the communication sample data to be identified.
Optionally, the original communication sample data may include original website sample data; the communication sample data to be identified may include website sample data to be identified; the communication identifier may include a website link; or, the original communication sample data may include original APP sample data; the communication sample data to be identified may include APP sample data to be identified; the communication identifier may include an APP link; or, the original communication sample data may include original telephone number sample data; the communication sample data to be identified may include telephone number sample data to be identified; the communication identifier may include a telephone number.
Optionally, the recognition result determining module 440 may be specifically configured to: under the condition that the identification result is determined to be the target identification result, acquiring newly-added sample characteristics of sample data to be identified; storing the newly added sample characteristics in a first sample characteristic database; and the first sample characteristic database is used for preliminarily screening the updated original sample data.
Optionally, the to-be-identified sample data obtaining module 410 may be specifically configured to: carrying out data preprocessing on original sample data; the data preprocessing may include deduplication processing and/or exception cleaning processing, among others.
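A minimal sketch of the data preprocessing is given below. It assumes the original sample data is a list of strings, treats surrounding whitespace as noise, and treats empty entries as the only kind of abnormal data; the text does not define what counts as abnormal, so that rule and the function name `preprocess` are assumptions.

```python
def preprocess(raw_samples: list) -> list:
    """Deduplicate samples and drop abnormal (here: empty) entries, keeping order."""
    seen = set()
    cleaned = []
    for raw in raw_samples:
        sample = raw.strip()
        if not sample:       # exception cleaning: skip empty entries
            continue
        if sample in seen:   # deduplication: keep the first occurrence only
            continue
        seen.add(sample)
        cleaned.append(sample)
    return cleaned
```

Keeping the first occurrence preserves the input order, so later steps process samples in the order they were collected.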
The sample data identification device can execute the sample data identification method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, reference may be made to the sample data identification method provided in any embodiment of the present invention.
Since the sample data identification device described above is a device capable of executing the sample data identification method in the embodiment of the present invention, a person skilled in the art can, based on the sample data identification method described in the embodiment of the present invention, understand the specific implementation of the sample data identification device in this embodiment and its various variations. Therefore, how the sample data identification device implements the sample data identification method in the embodiment of the present invention is not described in detail herein. Any device used by those skilled in the art to implement the sample data identification method in the embodiment of the present invention falls within the scope of protection of the present invention.
Example four
Fig. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 5 is only an example and should not impose any limitation on the function and the scope of use of the embodiment of the present invention.
As shown in FIG. 5, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors 16, a memory 28, and a bus 18 that connects the various system components (including the memory 28 and the processors 16).
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination of which, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an Input/Output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems, to name a few.
The processor 16 executes the program stored in the memory 28 to execute various functional applications and data processing, so as to implement the sample data identification method provided by the embodiment of the present invention, including: carrying out preliminary screening on original sample data to obtain sample data to be identified; performing data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified; carrying out multi-dimensional identification on the data interaction result of the sample data to be identified to obtain multi-dimensional characteristic data to be identified of the sample data to be identified; and determining the identification result of the sample data to be identified according to the multi-dimensional characteristic data to be identified.
Example five
An embodiment of the present invention further provides a computer storage medium storing a computer program, which when executed by a computer processor is configured to perform the sample data identification method according to any one of the above embodiments of the present invention, including: carrying out primary screening on original sample data to obtain sample data to be identified; performing data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified; carrying out multi-dimensional identification on the data interaction result of the sample data to be identified to obtain multi-dimensional characteristic data to be identified of the sample data to be identified; and determining the identification result of the sample data to be identified according to the multi-dimensional characteristic data to be identified.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions without departing from the scope of the invention. Therefore, although the present invention has been described in more detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A sample data identification method, comprising:
carrying out primary screening on original sample data to obtain sample data to be identified;
performing data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified;
carrying out multi-dimensional identification on the data interaction result of the sample data to be identified to obtain multi-dimensional characteristic data to be identified of the sample data to be identified;
and determining the identification result of the sample data to be identified according to the multi-dimensional characteristic data to be identified.
2. The method according to claim 1, wherein the preliminary screening of the original sample data to obtain sample data to be identified comprises:
acquiring benchmark reference features of benchmark reference data according to a sample feature database;
performing feature analysis on the original sample data and the benchmark reference features according to a multi-dimensional sample identification algorithm;
and dividing the original sample data into the sample data to be identified and identified sample data according to the feature analysis result.
3. The method of claim 2, wherein the sample feature database comprises a first sample feature database and a second sample feature database;
the benchmark reference features comprise a first benchmark reference feature and a second benchmark reference feature;
the identified sample data comprises first identified sample data and second identified sample data.
4. The method of claim 1, wherein the original sample data comprises original communication sample data; the sample data to be identified comprises communication sample data to be identified;
the performing data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified comprises:
acquiring a communication identifier included in the communication sample data to be identified;
and performing simulated access on the communication identifier to obtain a communication result of the communication sample data to be identified as the data interaction result of the communication sample data to be identified.
5. The method of claim 4, wherein the original communication sample data comprises original website sample data; the communication sample data to be identified comprises website sample data to be identified; the communication identifier comprises a website link; or
the original communication sample data comprises original APP sample data; the communication sample data to be identified comprises APP sample data to be identified; the communication identifier comprises an APP link; or
the original communication sample data comprises original telephone number sample data; the communication sample data to be identified comprises telephone number sample data to be identified; the communication identifier comprises a telephone number.
6. The method according to claim 1, further comprising, after determining the identification result of the sample data to be identified according to the multi-dimensional feature data to be identified:
under the condition that the identification result is determined to be a target identification result, acquiring newly-added sample characteristics of the sample data to be identified;
storing the newly added sample features in a first sample feature database;
and the first sample characteristic database is used for preliminarily screening the updated original sample data.
7. The method according to claim 1, further comprising, before said preliminary screening of original sample data:
carrying out data preprocessing on the original sample data;
wherein the data preprocessing comprises a deduplication processing and/or an exception cleaning processing.
8. A sample data identification apparatus, comprising:
the to-be-identified sample data acquisition module is used for primarily screening original sample data to obtain the to-be-identified sample data;
the data interaction result acquisition module is used for carrying out data interaction with the sample data to be identified according to the data content of the sample data to be identified to obtain a data interaction result of the sample data to be identified;
the multi-dimensional characteristic data to be identified acquisition module is used for carrying out multi-dimensional identification on the data interaction result of the sample data to be identified to obtain the multi-dimensional characteristic data to be identified of the sample data to be identified;
and the identification result determining module is used for determining the identification result of the sample data to be identified according to the multi-dimensional characteristic data to be identified.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the sample data identification method of any one of claims 1-7.
10. A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the sample data identification method of any one of claims 1-7.
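As an illustration only, the flow described in claims 6 to 8 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the class name SampleIdentifier, the callables interact, extract, is_target and new_features, the substring-based screening rule, and the blank-entry cleaning rule are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


def preprocess(raw: Iterable[str]) -> List[str]:
    """Claim 7 (sketch): deduplicate and drop abnormal entries.

    "Abnormal" is assumed here to mean blank after trimming whitespace.
    First-seen order is preserved.
    """
    seen: Set[str] = set()
    cleaned: List[str] = []
    for item in raw:
        value = item.strip()
        if not value or value in seen:
            continue
        seen.add(value)
        cleaned.append(value)
    return cleaned


@dataclass
class SampleIdentifier:
    # All four callables are hypothetical stand-ins for the claimed modules.
    interact: Callable[[str], str]                 # data interaction result
    extract: Callable[[str], Dict[str, float]]     # multi-dimensional features
    is_target: Callable[[Dict[str, float]], bool]  # identification decision
    new_features: Callable[[str], Set[str]]        # features to store (claim 6)
    # "First sample feature database", modelled as a set of substrings.
    feature_db: Set[str] = field(default_factory=set)

    def screen(self, samples: Iterable[str]) -> List[str]:
        """Preliminary screening: keep samples containing any stored feature."""
        return [s for s in samples if any(f in s for f in self.feature_db)]

    def identify(self, raw: Iterable[str]) -> Dict[str, bool]:
        """Preprocess, screen, interact, extract features, then decide."""
        results: Dict[str, bool] = {}
        for sample in self.screen(preprocess(raw)):
            features = self.extract(self.interact(sample))
            results[sample] = self.is_target(features)
            if results[sample]:
                # Claim 6: feed new features back for later screening runs.
                self.feature_db |= self.new_features(sample)
        return results
```

Passing the module behaviour in as callables keeps the sketch independent of any particular interaction mechanism (for example dialing a number or fetching a page), which the claims leave open.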

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111325163.3A CN114090650A (en) 2021-11-10 2021-11-10 Sample data identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114090650A (en) 2022-02-25

Family

ID=80299877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111325163.3A Pending CN114090650A (en) 2021-11-10 2021-11-10 Sample data identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114090650A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115169111A (en) * 2022-07-04 2022-10-11 中北大学 Random forest based energetic material mechanical property prediction method and storage device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090210368A1 (en) * 2008-02-14 2009-08-20 Ebay Inc. System and method for real time pattern identification
CN107273531A (en) * 2017-06-28 2017-10-20 百度在线网络技术(北京)有限公司 Telephone number classifying identification method, device, equipment and storage medium
CN110647622A (en) * 2019-09-29 2020-01-03 北京金山安全软件有限公司 Interactive data validity identification method and device
CN110674414A (en) * 2019-09-20 2020-01-10 北京字节跳动网络技术有限公司 Target information identification method, device, equipment and storage medium
CN111881991A (en) * 2020-08-03 2020-11-03 联仁健康医疗大数据科技股份有限公司 Method and device for identifying fraud and electronic equipment
CN112565250A (en) * 2020-12-04 2021-03-26 中国移动通信集团内蒙古有限公司 Website identification method, device, equipment and storage medium
CN112702331A (en) * 2020-12-21 2021-04-23 赛尔网络有限公司 Malicious link identification method and device based on sensitive words, electronic equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KATSUTOSHI YADA; NATSUKI SANO: "Customer Behavior Modelling Using Radio Frequency Identification Data and the Hidden Markov Model", 2012 ANNUAL SRII GLOBAL CONFERENCE, 31 December 2012 (2012-12-31) *
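薛杉; 朱虹; 吴文欢: "单样本的低分辨率单目标人脸识别算法" [Low-resolution single-target face recognition algorithm based on a single sample], 仪器仪表学报 [Chinese Journal of Scientific Instrument], no. 03, 15 March 2019 (2019-03-15) *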

Similar Documents

Publication Publication Date Title
CN107239666B (en) Method and system for desensitizing medical image data
US11188789B2 (en) Detecting poisoning attacks on neural networks by activation clustering
CN111107048B (en) Phishing website detection method and device and storage medium
CN110147722A (en) Video processing method, video processing apparatus and terminal device
CN110674360B (en) Tracing method and system for data
US8705800B2 (en) Profiling activity through video surveillance
CN111931809A (en) Data processing method and device, storage medium and electronic equipment
CN113139025B (en) Threat information evaluation method, device, equipment and storage medium
CN111338692A (en) Vulnerability classification method and device based on vulnerability codes and electronic equipment
CN113032834A (en) Database table processing method, device, equipment and storage medium
CN114244611A (en) Abnormal attack detection method, device, equipment and storage medium
CN113722719A (en) Information generation method and artificial intelligence system for security interception big data analysis
CN114090650A (en) Sample data identification method and device, electronic equipment and storage medium
CN113936232A (en) Screen fragmentation identification method, device, equipment and storage medium
CN113420295A (en) Malicious software detection method and device
CN112685255A (en) Interface monitoring method and device, electronic equipment and storage medium
CN117113403A (en) Data desensitization method, device, electronic equipment and storage medium
CN115022201B (en) Data processing function test method, device, equipment and storage medium
CN111045849A (en) Method, device, server and storage medium for identifying reason of checking abnormality
CN114064510A (en) Function testing method and device, electronic equipment and storage medium
CN112417007A (en) Data analysis method and device, electronic equipment and storage medium
CN113642443A (en) Model testing method and device, electronic equipment and storage medium
CN112420146A (en) Information security management method and system
JP7302223B2 (en) Script detection device, method and program
CN115238805B (en) Training method of abnormal data recognition model and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination