CN113569929A - Internet service providing method and device based on small sample expansion and electronic equipment - Google Patents

Info

Publication number
CN113569929A
Authority
CN
China
Prior art keywords
sample data
small sample
internet service
data
small
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110799822.0A
Other languages
Chinese (zh)
Other versions
CN113569929B (en)
Inventor
李达
丁楠
苏绥绥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qilu Information Technology Co Ltd
Original Assignee
Beijing Qilu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qilu Information Technology Co Ltd filed Critical Beijing Qilu Information Technology Co Ltd
Priority to CN202110799822.0A priority Critical patent/CN113569929B/en
Publication of CN113569929A publication Critical patent/CN113569929A/en
Application granted granted Critical
Publication of CN113569929B publication Critical patent/CN113569929B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an internet service providing method, an internet service providing apparatus and an electronic device based on small sample expansion. The method comprises the following steps: generating a mixed sample with a label based on the shared features of first sample data and small sample data; training a binary classification model with a first proportion of the mixed samples; inputting the remaining mixed samples into the trained binary classification model to obtain predicted values; extracting target sample data from the first sample data according to the predicted values to expand the small sample data; training a preset model of a preset internet service with the expanded small sample data; and processing a specified task of the preset internet service according to the trained preset model. In this method, a binary classification model is trained on one part of the mixed samples, and target sample data is extracted according to the performance of the trained model on the other part in order to expand the small sample data. This yields modeling samples for the small-sample internet service, so that the personalized requirements of that service can be met.

Description

Internet service providing method and device based on small sample expansion and electronic equipment
Technical Field
The invention relates to the technical field of data processing, in particular to an internet service providing method and device based on small sample expansion, electronic equipment and a computer readable medium.
Background
With the development of the internet, various internet service platforms have appeared, such as online shopping platforms, online car booking platforms, sharing platforms, and map, music and other internet-based service platforms. These platforms typically provide services to user devices by way of an application (APP) or HyperText Markup Language 5 (H5) pages.
When providing services, the user equipment is typically analyzed by a machine learning model in order to provide personalized services. However, a high-performing machine learning model usually has to be built from training samples with rich features. In practice, obtaining a large amount of labeled sample data is time-consuming and labor-intensive, and in some scenarios only a small number of labeled samples can be obtained, so the resulting model performs poorly. A method is therefore needed for expanding a small sample that has only a few labels, so as to meet the personalized requirements of the small-sample internet service.
Disclosure of Invention
In view of the above, the present invention is directed to an internet service providing method, apparatus, electronic device and computer readable medium based on small sample expansion, so as to at least partially solve at least one of the above technical problems.
In order to solve the above technical problem, a first aspect of the present invention provides an internet service providing method based on small sample expansion, where the method includes:
generating a mixed sample with a label based on shared features of first sample data and small sample data, the label identifying whether data is derived from the first sample data or the small sample data;
training a binary classification model with a first proportion of the mixed samples;
inputting the remaining mixed samples into the trained binary classification model to obtain predicted values;
extracting target sample data from the first sample data according to the predicted values to expand the small sample data;
training a preset model of a preset internet service with the expanded small sample data;
and processing a specified task of the preset internet service according to the trained preset model.
According to a preferred embodiment of the present invention, the generating a mixed sample with a label based on the shared features of the first sample data and the small sample data comprises:
determining the shared features according to the task type of the first sample data and the small sample data;
and extracting the shared features from the first sample data and labeling them with a first label, and extracting the shared features from the small sample data and labeling them with a second label, to generate the mixed sample.
According to a preferred embodiment of the present invention, the extracting target sample data from the first sample data according to the predicted values to expand the small sample data includes:
binning the predicted values to obtain N bins;
determining a threshold bin according to the predicted values in each bin;
determining a threshold according to the threshold bin;
and inputting all the first sample data into the trained binary classification model to obtain prediction results, and extracting target sample data from the first sample data according to the prediction results and the threshold to expand the small sample data.
According to a preferred embodiment of the present invention, the absolute value of the difference between the cumulative percentage of the small sample data and the cumulative percentage of the first sample data is calculated for each bin, and the bin with the largest absolute value is taken as the threshold bin.
According to a preferred embodiment of the present invention, the task type is risk device identification, and the shared features include at least one of login time and resource quota authentication time.
According to a preferred embodiment of the present invention, the predicted values are binned by equal-frequency binning.
According to a preferred embodiment of the present invention, the method is used to expand, based on the first sample data, the small sample data generated by the internet service of an H5 page, and the preset model comprises a preset model based on the H5 page;
the specified task is a specified task of the internet service of the H5 page.
A second aspect of the present invention provides an internet service providing apparatus based on a small sample extension, the apparatus including:
a generating module, configured to generate a mixed sample with a label based on shared features of first sample data and small sample data, the label identifying whether data is derived from the first sample data or the small sample data;
a first training module, configured to train a binary classification model with a first proportion of the mixed samples;
an input module, configured to input the remaining mixed samples into the trained binary classification model to obtain predicted values;
an extraction module, configured to extract target sample data from the first sample data according to the predicted values to expand the small sample data;
a second training module, configured to train a preset model of a preset internet service with the expanded small sample data;
and a processing module, configured to process a specified task of the preset internet service according to the trained preset model.
According to a preferred embodiment of the present invention, the generating module includes:
a determining module, configured to determine the shared features according to the task type of the first sample data and the small sample data;
and a labeling module, configured to extract the shared features from the first sample data and label them with a first label, and to extract the shared features from the small sample data and label them with a second label, to generate the mixed sample.
According to a preferred embodiment of the present invention, the extraction module comprises:
a binning module, configured to bin the predicted values to obtain N bins;
a first determining module, configured to determine a threshold bin according to the predicted values in each bin;
a second determining module, configured to determine a threshold according to the threshold bin;
and a sub-extraction module, configured to input all the first sample data into the trained binary classification model to obtain prediction results, and to extract target sample data from the first sample data according to the prediction results and the threshold to expand the small sample data.
According to a preferred embodiment of the present invention, the first determining module is configured to calculate, for each bin, the absolute value of the difference between the cumulative percentage of the small sample data and the cumulative percentage of the first sample data, and to take the bin with the largest absolute value as the threshold bin.
According to a preferred embodiment of the present invention, the task type is risk device identification, and the shared features include at least one of login time and resource quota authentication time.
According to a preferred embodiment of the present invention, the binning module bins the predicted values by equal-frequency binning.
According to a preferred embodiment of the present invention, the method is used to expand, based on the first sample data, the small sample data generated by the internet service of an H5 page, and the preset model comprises a preset model based on the H5 page; the specified task is a specified task of the internet service of the H5 page.
To solve the above technical problem, a third aspect of the present invention provides an electronic device, comprising:
a processor; and
a memory storing computer executable instructions that, when executed, cause the processor to perform the method described above.
To solve the above technical problems, a fourth aspect of the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the above method.
In the present invention, a mixed sample with a label is generated based on the shared features of the feature-rich first sample data and the small sample data, which has only a small number of labels; the label identifies whether data is derived from the first sample data or the small sample data. A binary classification model is trained with a first proportion of the mixed samples; the remaining mixed samples are input into the trained binary classification model to obtain predicted values; and target sample data is extracted from the first sample data according to the predicted values to expand the small sample data. In this way, a binary classification model is trained on one part of the mixed samples, and the target sample data is extracted according to the performance of the trained model on the other part in order to expand the small sample data. This yields modeling samples for the small-sample internet service, so that the personalized requirements of that service can be met.
Drawings
In order to make the technical problems solved by the present invention, the technical means adopted and the technical effects obtained more clear, the following will describe in detail the embodiments of the present invention with reference to the accompanying drawings. It should be noted, however, that the drawings described below are only illustrations of exemplary embodiments of the invention, from which other embodiments can be derived by those skilled in the art without inventive step.
Fig. 1 is a schematic flowchart of an internet service providing method based on small sample expansion according to an embodiment of the present invention;
Fig. 2 is a schematic structural framework diagram of an internet service providing apparatus based on small sample expansion according to an embodiment of the present invention;
Fig. 3 is a block diagram of an exemplary embodiment of an electronic device in accordance with the present invention;
Fig. 4 is a schematic diagram of one embodiment of a computer-readable medium of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The invention may, however, be embodied in many specific forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.
The structures, properties, effects or other characteristics described in a certain embodiment may be combined in any suitable manner in one or more other embodiments, while still complying with the technical idea of the invention.
In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by one skilled in the art. However, it is not excluded that a person skilled in the art may implement the invention in a specific case without the above-described structures, performances, effects or other features.
The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, and thus a repetitive description thereof may be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these elements, components, or sections should not be limited by these terms. That is, these terms are used only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Furthermore, the term "and/or" is intended to include all combinations of any one or more of the listed items.
Referring to Fig. 1, Fig. 1 shows an internet service providing method based on small sample expansion according to the present invention. The internet service may be a service provided by any of various internet service platforms, for example an online shopping platform, an online car booking platform, a sharing platform, a search platform or a social platform; the present invention is not limited to a specific example, as long as the platform provides services based on the internet. As shown in Fig. 1, the method includes:
S1, generating a mixed sample with a label based on the shared features of the first sample data and the small sample data;
In the embodiment of the invention, the first sample data and the small sample data have at least one shared feature in common and are used for the same type of task. The data volume of the first sample data is larger than a first threshold and its amount of labeled data is larger than a first preset labeled amount, which ensures that the first sample data contains enough data and rich feature data and thereby improves the performance of a trained model. The data volume of the small sample data is smaller than a second threshold and its amount of labeled data is smaller than a second preset labeled amount, so its data volume and data features are very limited, which would impair the performance of a model trained on it alone. The first threshold is greater than or equal to the second threshold, and the first preset labeled amount is greater than or equal to the second preset labeled amount.
For example, the first sample data and the small sample data may be sample data accumulated by the same internet service in different internet scenarios, where the different internet scenarios may be different ways of providing the internet service, such as an APP or an H5 page. The task type is related to the internet service provided, and a shared feature is a variable that is present in both the first sample data and the small sample data.
Taking a resource exchange service as an example, since risks such as fraud and overdue may arise in the resource exchange process, the task type related to the resource exchange service may be risk device identification, and the shared features include at least one of login time and resource quota authentication time. Here, a resource refers to any available substance, information, money, time, etc. Information resources include computing resources and various types of data resources. Data resources include various private data in various domains. Resource quota authentication is authentication of whether a device has the right to acquire the resource; it may be performed by a specific resource management institution or by the owner of the resource.
In the present invention, the tag is used to identify whether data is derived from the first sample data or the small sample data. For example, the data tag derived from the first sample data may be set to 1, and the data tag derived from the small sample data may be set to 0.
Illustratively, generating the mixed sample with the label based on the shared features of the first sample data and the small sample data comprises:
S11, determining the shared features according to the task type of the first sample data and the small sample data;
For example, the correspondence between task types and shared features may be configured in advance, and the shared features for the current task type are then looked up in that correspondence. For instance, the shared feature corresponding to the task type of risk device identification may be configured as login time.
Further, the specific internet service also influences the shared features; that is, even for the same task type, the shared features may differ between internet services. For example, for risk device identification, the shared features in a social service may include, in addition to login time, capital and social contact records, whereas in the resource exchange service they may include, in addition to login time, resource quota authentication time. Therefore, to improve the accuracy of the shared features, the correspondence between the combination of task type and internet service and the shared features may be configured in advance.
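The pre-configured correspondence described above can be pictured as a simple lookup table. The following Python sketch is illustrative only: the names SHARED_FEATURES, DEFAULT_SHARED_FEATURES and lookup_shared_features, as well as the key strings, are hypothetical and do not appear in the patent; the feature lists merely mirror the examples given in the text.

```python
# Hypothetical configuration: (task type, internet service) -> shared features.
SHARED_FEATURES = {
    ("risk_device_identification", "social"): [
        "login_time",
        "capital_and_social_contact_records",
    ],
    ("risk_device_identification", "resource_exchange"): [
        "login_time",
        "resource_quota_authentication_time",
    ],
}

# Hypothetical fallback used when only the task type is known.
DEFAULT_SHARED_FEATURES = {
    "risk_device_identification": ["login_time"],
}


def lookup_shared_features(task_type, service=None):
    """Return the configured shared features for a task type and, optionally, a service."""
    key = (task_type, service)
    if key in SHARED_FEATURES:
        return SHARED_FEATURES[key]
    # Fall back to the task-type-only configuration (raises KeyError if unknown).
    return DEFAULT_SHARED_FEATURES[task_type]
```

Keying the table on the pair (task type, internet service) reflects the refinement described above, while the fallback table covers the simpler case in which only the task type is configured.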
S12, extracting the shared features from the first sample data and labeling them with the first label, and extracting the shared features from the small sample data and labeling them with the second label, to generate the mixed sample.
The first sample data (such as device data from the APP) may contain a plurality of variables, and the small sample data (such as device data from the H5 page) may also contain a plurality of variables; the variables present in both are the shared features. This step extracts the shared features from the first sample data and from the small sample data and labels each accordingly to generate the mixed sample. Variables may also be called features and may differ by task type. For risk device identification, for example, the variables may be device-related features such as device model and device login information.
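As a minimal sketch of step S12, the following Python code keeps only the shared features of each record and attaches a source label, using the example convention given earlier (1 for the first sample data, 0 for the small sample data). The function name build_mixed_samples, the record layout (one dictionary per device) and all toy values are assumptions made for illustration.

```python
def build_mixed_samples(first_rows, small_rows, shared_features):
    """Keep only the shared features and attach a source label to each row."""
    mixed = []
    for label, rows in ((1, first_rows), (0, small_rows)):
        for row in rows:
            sample = {name: row[name] for name in shared_features}
            sample["label"] = label  # 1 = first sample data, 0 = small sample data
            mixed.append(sample)
    return mixed


# Toy records with hypothetical values; only "login_time" is shared.
app_rows = [
    {"login_time": 9.5, "device_model": "A1"},
    {"login_time": 22.0, "device_model": "B2"},
]
h5_rows = [{"login_time": 23.5, "browser": "X"}]

mixed = build_mixed_samples(app_rows, h5_rows, ["login_time"])
```

Variables that exist in only one of the two sources (here device_model and browser) are dropped, so every mixed sample has the same columns.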
S2, training a binary classification model with a first proportion of the mixed samples;
Illustratively, the first proportion is greater than 50%, so that most of the mixed samples are used to train the binary classification model, which improves the accuracy of the model. Preferably, the first proportion is 80%. The binary classification model may be a decision tree model, a random forest model, a support vector machine model, etc.; the invention is not particularly limited in this respect.
S3, inputting the remaining mixed samples into the trained binary classification model to obtain predicted values;
In this step, the trained binary classification model is validated by predicting on the remaining mixed samples, so as to measure the performance of the trained model on the mixed samples; target sample data is then extracted according to that performance to expand the small sample data. The predicted value preferably takes the form of a score; that is, the remaining mixed samples are scored by the trained binary classification model.
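Steps S2 and S3 can be sketched together as follows. The text names decision trees, random forests and support vector machines as candidate models; to stay dependency-free, this sketch substitutes a hand-written one-feature logistic regression, which is a stand-in chosen for illustration and not the model prescribed by the patent. All function names, the feature name "x" and the toy data are hypothetical.

```python
import math
import random


def split_mixed_samples(samples, first_proportion=0.8, seed=0):
    """Shuffle the mixed samples and split them into (training part, remaining part)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * first_proportion)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(train, feature, epochs=300, lr=0.5):
    """Fit p(label = 1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for s in train:
            error = sigmoid(w * s[feature] + b) - s["label"]
            grad_w += error * s[feature]
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_scores(model, samples, feature):
    """Score samples with the trained model; each score lies strictly between 0 and 1."""
    w, b = model
    return [sigmoid(w * s[feature] + b) for s in samples]


# Toy mixed samples: label 1 = first sample data, label 0 = small sample data.
mixed = [{"x": 0.2, "label": 1} for _ in range(50)]
mixed += [{"x": 0.8, "label": 0} for _ in range(50)]

train, remaining = split_mixed_samples(mixed, first_proportion=0.8)
model = train_classifier(train, "x")
scores = predict_scores(model, remaining, "x")
```

The held-out 20% plays the role of the remaining mixed samples: their scores, not the training scores, are what the later binning step operates on.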
S4, extracting target sample data from the first sample data according to the predicted values to expand the small sample data;
The target sample data is data selected from the first sample data using a threshold determined from the performance of the trained binary classification model on the mixed samples; it is consistent with the distribution of the small sample data and can therefore be used to expand the small sample data. In one example, this performance may be determined by analyzing the predicted values of the binary classification model for the mixed samples. Illustratively, extracting target sample data from the first sample data according to the predicted values to expand the small sample data comprises:
S41, binning the predicted values to obtain N bins;
Data binning (also referred to as discrete binning or segmentation) is a method of grouping a number of continuous values into a smaller number of "bins" in order to reduce the impact of minor observation errors. Binning methods include unsupervised methods such as equidistant binning and equal-frequency binning, and supervised methods such as chi-square binning and minimum-entropy binning. The predicted values are preferably binned by equal-frequency binning. After the predicted values are binned, each bin corresponds to a numerical interval: the left end point of the bin corresponds to the left end point of the interval, and the right end point of the bin corresponds to the right end point of the interval.
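Equal-frequency binning of the predicted values can be sketched as below. This is a minimal illustration under the assumption that the scores are plain floats; the function name equal_frequency_bins and the toy scores are hypothetical.

```python
def equal_frequency_bins(scores, n_bins):
    """Split scores into n_bins bins holding (nearly) equal numbers of values.

    Returns a list of (left_end_point, right_end_point, members) tuples
    ordered by score. Empty bins (when n_bins > len(scores)) are skipped.
    """
    ordered = sorted(scores)
    n = len(ordered)
    bins = []
    for i in range(n_bins):
        start = i * n // n_bins
        stop = (i + 1) * n // n_bins
        members = ordered[start:stop]
        if members:
            bins.append((members[0], members[-1], members))
    return bins


scores = [0.9, 0.1, 0.4, 0.8, 0.3, 0.6, 0.2, 0.7]
bins = equal_frequency_bins(scores, 4)
```

Because the cut points are chosen by rank rather than by value, every bin holds about the same number of samples even when the scores are heavily skewed; note that this simple version may place tied scores in adjacent bins.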
S42, determining a threshold bin according to the predicted values in each bin;
In the present invention, the threshold bin is the bin at which the predicted values best separate the first sample data from the small sample data. The separation measure used to select it takes values in [0, 1], and the larger the value, the better the two sample sources are distinguished.
Illustratively, the absolute value of the difference between the cumulative percentage of the small sample data and the cumulative percentage of the first sample data is calculated for each bin, and the bin with the largest absolute value is taken as the threshold bin. The cumulative percentage of the small sample data for a bin is the ratio of the cumulative number of small sample data up to and including that bin to the total number of small sample data, and the cumulative percentage of the first sample data for a bin is the ratio of the cumulative number of first sample data up to and including that bin to the total number of first sample data.
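The selection of the threshold bin can be sketched as follows. The sketch assumes each bin has already been summarized as (left end point, right end point, number of small samples, number of first samples), ordered by score; this tuple layout, the function name find_threshold_bin and the toy counts are assumptions for illustration. The maximum gap computed here is the quantity commonly called the KS (Kolmogorov-Smirnov) statistic, evaluated at the bin boundaries; the patent text itself does not use that name.

```python
def find_threshold_bin(bins):
    """Return (index of the threshold bin, largest absolute cumulative gap).

    bins: list of (left, right, n_small, n_first) tuples ordered by score.
    """
    total_small = sum(b[2] for b in bins)
    total_first = sum(b[3] for b in bins)
    cum_small = cum_first = 0
    best_index, best_gap = 0, -1.0
    for i, (_, _, n_small, n_first) in enumerate(bins):
        cum_small += n_small
        cum_first += n_first
        # Absolute difference between the two cumulative percentages.
        gap = abs(cum_small / total_small - cum_first / total_first)
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index, best_gap


# Toy equal-frequency bins (10 samples each), hypothetical counts.
toy_bins = [
    (0.0, 0.2, 9, 1),
    (0.2, 0.4, 8, 2),
    (0.4, 0.6, 4, 6),
    (0.6, 0.8, 3, 7),
    (0.8, 1.0, 1, 9),
]
index, gap = find_threshold_bin(toy_bins)
```

In this toy example the cumulative percentages after the second bin are 17/25 for the small sample data and 3/25 for the first sample data, so the second bin (index 1) has the largest gap and becomes the threshold bin.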
S43, determining a threshold according to the threshold bin;
In the embodiment of the invention, the threshold may be determined according to the actual service. For example, either the left end point or the right end point of the threshold bin may be used as the threshold; the present invention is not particularly limited in this respect.
S44, inputting all the first sample data into the trained binary classification model to obtain prediction results, and extracting target sample data from the first sample data according to the prediction results and the threshold to expand the small sample data.
For example, if the threshold is the right end point of the threshold bin, the first sample data whose prediction results are greater than the threshold may be used as the target sample data to expand the small sample data. If the threshold is the left end point of the threshold bin, the first sample data whose prediction results are smaller than the threshold may be used as the target sample data to expand the small sample data.
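Step S44 can be sketched as a simple filter followed by a concatenation. In the sketch below, score_fn stands in for the trained binary classification model; the function names, the keep parameter and the toy rows (which carry a precomputed "score" field) are hypothetical and serve only to illustrate the two threshold cases described above.

```python
def extract_target_samples(first_rows, score_fn, threshold, keep="greater"):
    """Select first-sample rows whose score lies on the chosen side of the threshold."""
    if keep not in ("greater", "smaller"):
        raise ValueError("keep must be 'greater' or 'smaller'")
    selected = []
    for row in first_rows:
        score = score_fn(row)
        if keep == "greater" and score > threshold:
            selected.append(row)
        elif keep == "smaller" and score < threshold:
            selected.append(row)
    return selected


def expand_small_samples(small_rows, target_rows):
    """Return the expanded small sample: original small rows plus extracted rows."""
    return list(small_rows) + list(target_rows)


# Toy data; the "score" field stands in for the model's prediction result.
first_rows = [
    {"id": 1, "score": 0.20},
    {"id": 2, "score": 0.55},
    {"id": 3, "score": 0.90},
]
small_rows = [{"id": 100, "score": 0.70}]

targets = extract_target_samples(first_rows, lambda r: r["score"], 0.5, keep="greater")
expanded = expand_small_samples(small_rows, targets)
```

Which side of the threshold to keep depends on which end point of the threshold bin was chosen as the threshold, so the direction is left as a parameter rather than fixed.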
S5, training a preset model of a preset internet service with the expanded small sample data;
S6, processing a specified task of the preset internet service according to the trained preset model.
Illustratively, when the method is used to expand, based on the first sample data, small sample data generated by the internet service of an H5 page, the preset model comprises a preset model based on the H5 page;
the specified task is a specified task of the preset internet service of the H5 page.
Fig. 2 is an internet service providing apparatus based on a small sample extension of the present invention, as shown in fig. 2, the apparatus comprising:
a generating module 21 configured to generate a mixed sample having a label for identifying whether data is derived from the first sample data or the small sample data based on a shared characteristic of the first sample data and the small sample data;
a first training module 22, configured to train a binary model using the mixed samples of the first proportion;
the input module 23 is configured to input the remaining mixed samples into the trained two-class classification model to obtain a predicted value;
the extracting module 24 is configured to extract target sample data from the first sample data according to the predicted value to expand the small sample data;
the second training module 25 is configured to train a preset model of a preset internet service by using the expanded small sample data;
and the processing module 26 is configured to process the specified task of the preset internet service according to the trained preset model.
In one embodiment, the generating module 21 includes:
the determining module is used for determining sharing characteristics according to the task types of the first sample data and the small sample data;
and the marking module is used for extracting the shared feature from the first sample data and marking a first label, extracting the shared feature from the small sample data and marking a second label, and generating a mixed sample.
The extraction module 24 includes:
a binning module, configured to bin the predicted values to obtain N bins;
a first determining module, configured to determine a threshold bin according to the predicted values in each bin;
a second determining module, configured to determine a threshold according to the threshold bin;
and a sub-extraction module, configured to input all of the first sample data into the trained binary classification model to obtain prediction results, and to extract target sample data from the first sample data according to the prediction results and the threshold, so as to expand the small sample data.
Optionally, the binning module bins the predicted values by equal-frequency binning.
The first determining module is configured to calculate, for each bin, the absolute value of the difference between the cumulative proportion of small sample data and the cumulative proportion of first sample data, and to take the bin with the largest absolute value as the threshold bin.
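The binning and threshold steps of the extraction module can be sketched as follows. Two points are assumptions rather than statements from this description: the threshold is taken as the largest predicted value in the threshold bin, and first sample data scoring above the threshold is treated as the target sample data. The function names are hypothetical, and the sketch expects at least one sample from each source and no more bins than samples.

```python
def equal_frequency_bins(scored, n_bins):
    """Sort (predicted_value, label) pairs by predicted value and cut them
    into n_bins bins of nearly equal size."""
    ordered = sorted(scored, key=lambda pair: pair[0])
    total = len(ordered)
    return [ordered[i * total // n_bins:(i + 1) * total // n_bins]
            for i in range(n_bins)]


def threshold_bin_index(bins, small_label=1):
    """Index of the bin where the cumulative proportions of small sample data
    and first sample data differ most in absolute value."""
    total_small = sum(1 for b in bins for _, y in b if y == small_label)
    total_first = sum(1 for b in bins for _, y in b if y != small_label)
    cum_small = cum_first = 0
    best_index, best_gap = 0, -1.0
    for i, b in enumerate(bins):
        in_small = sum(1 for _, y in b if y == small_label)
        cum_small += in_small
        cum_first += len(b) - in_small
        gap = abs(cum_small / total_small - cum_first / total_first)
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index


def threshold_from_bin(bins, index):
    # Assumption: use the largest predicted value in the threshold bin.
    return bins[index][-1][0]


def extract_targets(first_scores, threshold):
    """first_scores: (record, predicted_value) for all first sample data.
    Assumption: a higher predicted value means closer to the small sample data."""
    return [record for record, score in first_scores if score > threshold]
```

For example, with the pairs `(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.5, 1), (0.6, 0), (0.7, 1), (0.8, 1)` and four bins, the cumulative gap peaks in the second bin, so the threshold under the stated assumption is 0.4.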
The task type is risk device identification, and the shared features include at least one of login time and resource quota authentication time.
Optionally, the method is used to expand, based on the first sample data, small sample data generated by an internet service of an H5 page; the preset model includes a preset model based on the H5 page, and the specified task is a specified task of the internet service of the H5 page.
Those skilled in the art will appreciate that the modules in the above apparatus embodiments may be distributed in the apparatus as described, or may be modified accordingly and distributed in one or more apparatuses different from the above embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as an implementation in physical form for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
FIG. 3 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. The electronic device shown in FIG. 3 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in fig. 3, the electronic apparatus 300 of the exemplary embodiment is represented in the form of a general-purpose data processing apparatus. The components of electronic device 300 may include, but are not limited to: at least one processing unit 310, at least one memory unit 320, a bus 330 connecting different electronic device components (including the memory unit 320 and the processing unit 310), a display unit 340, and the like.
The storage unit 320 stores a computer readable program, which may be a code of a source program or a read-only program. The program may be executed by the processing unit 310 such that the processing unit 310 performs the steps of various embodiments of the present invention. For example, the processing unit 310 may perform the steps as shown in fig. 1.
The storage unit 320 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 3201 and/or a cache storage unit 3202, and may further include a read-only memory unit (ROM) 3203. The storage unit 320 may also include a program/utility 3204 having a set (at least one) of program modules 3205, such program modules 3205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 330 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 300 may also communicate with one or more external devices 100 (e.g., keyboards, displays, networking devices, Bluetooth devices, etc.), so that a user can interact with the electronic device 300 via the external devices 100, and/or so that the electronic device 300 can communicate with one or more other data processing devices (e.g., routers, modems, etc.). Such communication may occur via input/output (I/O) interfaces 350, and may also occur via a network adapter 360 to one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet. Network adapter 360 may communicate with other modules of electronic device 300 via bus 330. It should be appreciated that although not shown in FIG. 3, other hardware and/or software modules may be used in electronic device 300, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
FIG. 4 is a schematic diagram of one embodiment of a computer-readable medium of the present invention. As shown in FIG. 4, the computer program may be stored on one or more computer-readable media. The computer-readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. When executed by one or more data processing devices, the computer program stored on the computer-readable medium implements the above-described method of the invention, namely: generating mixed samples based on shared features of the first sample data and the small sample data, each mixed sample carrying a label that identifies whether the data is derived from the first sample data or the small sample data; training a binary classification model using a first proportion of the mixed samples; inputting the remaining mixed samples into the trained binary classification model to obtain predicted values; extracting target sample data from the first sample data according to the predicted values, so as to expand the small sample data; training a preset model of a preset internet service using the expanded small sample data; and processing a specified task of the preset internet service according to the trained preset model.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions that cause a data processing device (which can be a personal computer, a server, a network device, etc.) to execute the above-described method according to the present invention.
A computer-readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In summary, the present invention can be implemented as a method, an apparatus, an electronic device, or a computer-readable medium executing a computer program. Some or all of the functions of the present invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP).
The foregoing embodiments describe the objects, technical solutions, and advantages of the present invention in further detail. It should be understood that the present invention is not inherently tied to any particular computer, virtual machine, or electronic device, and that various general-purpose machines may be used to implement it. The invention is not limited to the specific embodiments described above; all modifications, changes, and equivalents that come within the spirit and scope of the invention are intended to be covered.

Claims (10)

1. An internet service providing method based on small sample expansion, the method comprising:
generating mixed samples based on shared features of first sample data and small sample data, each mixed sample carrying a label that identifies whether the data is derived from the first sample data or the small sample data;
training a binary classification model using a first proportion of the mixed samples;
inputting the remaining mixed samples into the trained binary classification model to obtain predicted values;
extracting target sample data from the first sample data according to the predicted values, so as to expand the small sample data;
training a preset model of a preset internet service using the expanded small sample data;
and processing a specified task of the preset internet service according to the trained preset model.
2. The method of claim 1, wherein generating the mixed samples carrying the label based on the shared features of the first sample data and the small sample data comprises:
determining the shared features according to the task type of the first sample data and the small sample data;
and extracting the shared features from the first sample data and marking them with a first label, and extracting the shared features from the small sample data and marking them with a second label, to generate the mixed samples.
3. The method according to claim 1 or 2, wherein extracting target sample data from the first sample data according to the predicted values to expand the small sample data comprises:
binning the predicted values to obtain N bins;
determining a threshold bin according to the predicted values in each bin;
determining a threshold according to the threshold bin;
and inputting all of the first sample data into the trained binary classification model to obtain prediction results, and extracting target sample data from the first sample data according to the prediction results and the threshold, so as to expand the small sample data.
4. The method according to claim 3, wherein, for each bin, the absolute value of the difference between the cumulative proportion of small sample data and the cumulative proportion of first sample data is calculated, and the bin with the largest absolute value is taken as the threshold bin.
5. The method of any of claims 2-4, wherein the task type is risk device identification, and the shared features comprise at least one of login time and resource quota authentication time.
6. The method according to claim 3 or 4, wherein the predicted values are binned by equal-frequency binning.
7. The method of any one of claims 1-6, wherein the method is used to expand, based on the first sample data, small sample data generated by an internet service of an H5 page, and the preset model comprises a preset model based on the H5 page;
the specified task is a specified task of the internet service of the H5 page.
8. An internet service providing apparatus based on small sample expansion, the apparatus comprising:
a generating module, configured to generate mixed samples based on shared features of first sample data and small sample data, each mixed sample carrying a label that identifies whether the data is derived from the first sample data or the small sample data;
a first training module, configured to train a binary classification model using a first proportion of the mixed samples;
an input module, configured to input the remaining mixed samples into the trained binary classification model to obtain predicted values;
an extraction module, configured to extract target sample data from the first sample data according to the predicted values, so as to expand the small sample data;
a second training module, configured to train a preset model of a preset internet service using the expanded small sample data;
and a processing module, configured to process a specified task of the preset internet service according to the trained preset model.
9. An electronic device, comprising:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1-7.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-7.
CN202110799822.0A 2021-07-15 2021-07-15 Internet service providing method and device based on small sample expansion and electronic equipment Active CN113569929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110799822.0A CN113569929B (en) 2021-07-15 2021-07-15 Internet service providing method and device based on small sample expansion and electronic equipment

Publications (2)

Publication Number Publication Date
CN113569929A true CN113569929A (en) 2021-10-29
CN113569929B CN113569929B (en) 2024-03-01

Family

ID=78165011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110799822.0A Active CN113569929B (en) 2021-07-15 2021-07-15 Internet service providing method and device based on small sample expansion and electronic equipment

Country Status (1)

Country Link
CN (1) CN113569929B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169518A (en) * 2017-05-18 2017-09-15 北京京东金融科技控股有限公司 Data classification method, device, electronic installation and computer-readable medium
WO2018136369A1 (en) * 2017-01-20 2018-07-26 Microsoft Technology Licensing, Llc Pre-statistics of data for node of decision tree
CN108388924A (en) * 2018-03-08 2018-08-10 平安科技(深圳)有限公司 A kind of data classification method, device, equipment and computer readable storage medium
CN110519128A (en) * 2019-09-20 2019-11-29 西安交通大学 A kind of operating system recognition methods based on random forest
CN111178380A (en) * 2019-11-15 2020-05-19 腾讯科技(深圳)有限公司 Data classification method and device and electronic equipment
CN111444094A (en) * 2020-03-25 2020-07-24 中国邮政储蓄银行股份有限公司 Test data generation method and system
CN111488892A (en) * 2019-01-25 2020-08-04 顺丰科技有限公司 Sample data generation method and device
CN111783893A (en) * 2017-09-08 2020-10-16 第四范式(北京)技术有限公司 Method and system for generating combined features of machine learning samples
CN112116168A (en) * 2020-09-29 2020-12-22 中国银行股份有限公司 User behavior prediction method and device and electronic equipment
CN112464544A (en) * 2020-11-17 2021-03-09 北京工业大学 Method for constructing model for predicting dioxin emission concentration in urban solid waste incineration process

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
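DAJI TANG et al.: "Deep Neural Network based Disease Discrimination Learning from Small Medical Image Training Set and User Feedback", 《2018 IEEE CONFS ON INTERNET OF THINGS, GREEN COMPUTING AND COMMUNICATIONS》, pages 580 - 585 *
Zhang Xu et al.: "Research and Application of Targeted Marketing Based on Seed User Group Expansion Technology in the Telecom Industry", 《Operator Big Data Column》, no. 1, pages 166 - 173 *
Wang Yuheng: "Optimization and Application of the Random Forest Algorithm in Recommender Systems", 《China Master's Theses Full-text Database, Information Science and Technology Series》, no. 07, pages 138 - 1245 *
Guan Zhengxiong: "Research on Data Augmentation Methods Based on Deep Generative Models", 《China Master's Theses Full-text Database, Information Science and Technology Series》, no. 04, pages 138 - 334 *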

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118068A (en) * 2022-01-26 2022-03-01 北京淇瑀信息科技有限公司 Method and device for amplifying training text data and electronic equipment
CN114118068B (en) * 2022-01-26 2022-04-29 北京淇瑀信息科技有限公司 Method and device for amplifying training text data and electronic equipment

Also Published As

Publication number Publication date
CN113569929B (en) 2024-03-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant