CN110087230B - Data processing method, data processing device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN110087230B
Authority
CN
China
Prior art keywords
communication identifier
predicted
type
data
communication
Prior art date
Legal status
Active
Application number
CN201910342899.8A
Other languages
Chinese (zh)
Other versions
CN110087230A (en)
Inventor
孙承露
Current Assignee
TONGDUN TECHNOLOGY Co.,Ltd.
Original Assignee
Tongdun Holdings Co Ltd
Priority date
Filing date
Publication date
Application filed by Tongdun Holdings Co Ltd
Priority to CN201910342899.8A
Publication of CN110087230A
Application granted
Publication of CN110087230B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W8/00 Network data management
    • H04W8/18 Processing of user or subscriber data, e.g. subscribed services, user preferences or user profiles; Transfer of user or subscriber data
    • H04W8/20 Transfer of user or subscriber data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The embodiment of the invention provides a data processing method, a data processing device, a storage medium and electronic equipment, wherein the method comprises the following steps: receiving a communication identifier to be predicted, and acquiring characteristic data of multiple dimensions of the communication identifier to be predicted; inputting the characteristic data of the multiple dimensions into a deep learning model to obtain the calculation data of the communication identifier to be predicted; and predicting the type of the communication identifier to be predicted according to the calculation data. The communication identifier type prediction is realized through the deep learning model, and the efficiency and the accuracy of the communication identifier prediction are improved, so that the waste of time and cost during communication with the first type communication identifier is avoided.

Description

Data processing method, data processing device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, an apparatus, a storage medium, and an electronic device.
Background
With the popularity of mobile communications, a large number of telephone numbers have been shut down, are in arrears, are invalid, or are blank (unassigned), and such numbers are difficult to identify. For example, when a number in arrears is dialed, the caller only then finds that it is out of service because of arrears; and if a short message is sent to such a number, no prompt is given at all.
The related art provides a method for inferring blank numbers which, based mainly on operator data, judges that a telephone number with recent call activity is not a blank number. This method has several problems:
1. The data source is single. For example, a number is judged to be non-blank simply because a call was made, and other factors are not considered, which may cause misjudgment.
2. The real-time performance is poor. For example, if a mobile phone number with call activity within roughly the last three months is treated as non-blank, a number that was suspended within those three months cannot be identified; if the window is shortened to the last month, a number suspended within the last month still cannot be identified.
3. The match rate is low. For example, regardless of whether a mobile phone number is in normal use, few numbers have no call activity at all within one month, so the rule matches few numbers and, as a result, recognizes few numbers that cannot be used normally.
Therefore, a new data processing method, device, storage medium and electronic device are needed to efficiently and accurately predict the type of the communication identifier.
The above information disclosed in this background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of the above, the present invention provides a data processing method, an apparatus, a storage medium, and an electronic device, so as to efficiently and accurately predict the type of a communication identifier.
Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.
According to a first aspect of the present invention, there is provided a data processing method, wherein the method comprises: receiving a communication identifier to be predicted, and acquiring characteristic data of multiple dimensions of the communication identifier to be predicted; inputting the characteristic data of the multiple dimensions into a deep learning model to obtain the calculation data of the communication identifier to be predicted; and predicting the type of the communication identifier to be predicted according to the calculation data.
According to some embodiments, the method further comprises: screening out first type communication identifiers and second type communication identifiers from the communication identifiers to be predicted; and inputting the remaining communication identifiers to be predicted into the deep learning model to obtain the calculation data of the remaining communication identifiers to be predicted.
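The screen-then-model flow can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `predict_types`, the labels `"first"`/`"second"`, and the `score` callable standing in for the deep learning model are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Set


def predict_types(
    ids: Iterable[str],
    first_type: Set[str],
    second_type: Set[str],
    score: Callable[[str], float],
    threshold: float,
) -> Dict[str, str]:
    """Label each identifier "first" or "second".

    Identifiers already screened into `first_type` or `second_type` are
    labelled directly; only the remaining ones are passed to `score`,
    a hypothetical stand-in for the deep learning model.
    """
    labels: Dict[str, str] = {}
    for ident in ids:
        if ident in first_type:
            labels[ident] = "first"
        elif ident in second_type:
            labels[ident] = "second"
        else:
            # Remaining identifier: fall back to the model score.
            labels[ident] = "first" if score(ident) > threshold else "second"
    return labels
```

Because screened identifiers never reach `score`, the model would be called only on the remainder, which is where a saving in computation would come from.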
According to some embodiments, the step of screening out the first type communication identifier and the second type communication identifier from the communication identifiers to be predicted comprises: screening out a first primary selection type communication identifier from the communication identifiers to be predicted; screening out a second primary selection type communication identifier from the communication identifiers to be predicted; and determining a first type communication identifier and a second type communication identifier based on the first primary selection type communication identifier and the second primary selection type communication identifier.
According to some embodiments, screening out the first primary selection type communication identifiers from the communication identifiers to be predicted comprises: determining, for each of the multiple dimensions, a first type period of the feature data based on statistical data of the feature data of the multiple dimensions of sample data within multiple first preset time periods; and screening out the first primary selection type communication identifiers from the communication identifiers to be predicted according to the first type periods of the feature data of the multiple dimensions.
According to some embodiments, screening out the second primary selection type communication identifiers from the communication identifiers to be predicted comprises: determining, for each of the multiple dimensions, a second type period of the feature data based on statistical data of the feature data of the multiple dimensions of the sample data within multiple second preset time periods; and screening out the second primary selection type communication identifiers from the communication identifiers to be predicted according to the second type periods of the feature data of the multiple dimensions.
According to some embodiments, the method further comprises: acquiring a deep learning model;
the obtaining of the deep learning model comprises: constructing a multilayer deep learning network; and training the multilayer deep learning network based on the sample data to obtain a deep learning model.
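The text only states that a multilayer deep learning network is built and trained on sample data; the network of FIG. 4 is not reproduced here. As a hypothetical stand-in, the sketch below trains a one-hidden-layer network with plain stochastic gradient descent on log loss, using label 1 for first type and 0 for second type sample identifiers. The class name `TinyMLP`, the tanh hidden layer, and the learning-rate and epoch values are assumptions chosen for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


class TinyMLP:
    """One tanh hidden layer and a sigmoid output trained by SGD on log loss."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def predict(self, x: Sequence[float]) -> float:
        """Probability that `x` describes a first type identifier."""
        return self._forward(x)[1]

    def fit(self, data: Sequence[Tuple[Sequence[float], int]],
            epochs: int = 200, lr: float = 0.5) -> None:
        for _ in range(epochs):
            for x, y in data:
                h, p = self._forward(x)
                d_out = p - y  # gradient of log loss w.r.t. the output logit
                for j, hj in enumerate(h):
                    # Hidden-unit gradient uses w2[j] before it is updated.
                    d_h = d_out * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_h * xi
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out
```

A real deployment would more likely use an established framework; the pure-Python version only keeps the example self-contained.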
According to some embodiments, the first type communication identifier comprises at least one of a cancelled communication identifier, a communication identifier in arrears, and a shut-down communication identifier; the second type communication identifier comprises at least one of a communication identifier in use, a communication identifier that has not been cancelled, and a communication identifier that is not a blank number.
According to a second aspect of the present invention, there is provided a data processing apparatus, comprising: a receiving module, configured to receive a communication identifier to be predicted and acquire feature data of multiple dimensions of the communication identifier to be predicted; an acquisition module, configured to input the feature data of the multiple dimensions into a deep learning model to obtain the calculation data of the communication identifier to be predicted; and a prediction module, configured to predict the type of the communication identifier to be predicted according to the calculation data.
According to some embodiments, the apparatus further comprises: the screening module is used for screening out a first type communication identifier and a second type communication identifier from the communication identifiers to be predicted; the obtaining module is further configured to input the remaining communication identifiers to be predicted to the deep learning model, and obtain the calculation data of the remaining communication identifiers to be predicted.
According to some embodiments, the screening module comprises: the first screening unit is used for screening out a first primary selection type communication identifier from the communication identifiers to be predicted; the second screening unit is used for screening out a second primary selection type communication identifier from the communication identifiers to be predicted; and the determining unit is used for determining the first type communication identifier and the second type communication identifier based on the first primary selection type communication identifier and the second primary selection type communication identifier.
According to some embodiments, the first screening unit is configured to determine, for each of the multiple dimensions, a first type period of the feature data based on statistical data of the feature data of the multiple dimensions of sample data within multiple first preset time periods, and to screen out first primary selection type communication identifiers from the communication identifiers to be predicted according to the first type periods of the feature data of the multiple dimensions.
According to some embodiments, the second screening unit is configured to determine, for each of the multiple dimensions, a second type period of the feature data based on statistical data of the feature data of the multiple dimensions of the sample data within multiple second preset time periods, and to screen out second primary selection type communication identifiers from the communication identifiers to be predicted according to the second type periods of the feature data of the multiple dimensions.
According to some embodiments, the apparatus further comprises: the model acquisition module is used for acquiring a deep learning model;
the model acquisition module comprises: the building unit is used for building a multilayer deep learning network; and the training unit is used for training the multilayer deep learning network based on the sample data to obtain a deep learning model.
According to some embodiments, the first type communication identifier comprises at least one of a cancelled communication identifier, a communication identifier in arrears, and a shut-down communication identifier; the second type communication identifier comprises at least one of a communication identifier in use, a communication identifier that has not been cancelled, and a communication identifier that is not a blank number.
According to a third aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, performs the method steps as set forth in the first aspect.
According to a fourth aspect of the present invention, there is provided an electronic apparatus, comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method steps as described in the first aspect.
In the embodiment of the invention, based on receiving the communication identifier to be predicted, the characteristic data of multiple dimensions of the communication identifier to be predicted is obtained; inputting the characteristic data of the multiple dimensions into a deep learning model to obtain the calculation data of the communication identifier to be predicted; and predicting the type of the communication identifier to be predicted according to the calculation data. The communication identifier type prediction is realized through the deep learning model, and the efficiency and the accuracy of the communication identifier prediction are improved, so that the waste of time and cost during communication with the first type communication identifier is avoided.
In the embodiment of the invention, a type period is determined for each type, and the feature data of the multiple dimensions are compared, under relatively strict conditions, with the thresholds set for the respective type periods, so as to screen out as many first and second primary selection type communication identifiers as possible from the communication identifiers to be predicted while ensuring their accuracy. Because this screening fuses multiple dimensions, both the recall rate and the accuracy are improved.
In the embodiment of the invention, first type and second type communication identifiers are screened out from the communication identifiers to be predicted in advance, and this advance screening is combined with the deep learning model to jointly determine the type of each communication identifier to be predicted. This improves the accuracy of communication identifier type prediction, reduces the computational cost of the deep learning model, further improves prediction efficiency, and helps keep the deep learning model stable.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 is a flow diagram illustrating a data processing method according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating a data processing method according to another exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of obtaining a deep learning model in accordance with an exemplary embodiment;
FIG. 4 is a schematic diagram of a deep learning network according to an embodiment of the present invention;
FIG. 5 is a block diagram illustrating a data processing apparatus according to an exemplary embodiment;
FIG. 6 is a schematic structural diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The data processing method provided by the embodiment of the invention is described in detail below with reference to specific embodiments. It should be noted that the data processing method provided in the embodiment of the present invention may be executed by any device with computing processing capability, such as a server and/or a terminal device, and the present invention is not limited thereto. In the embodiment of the present invention, a server is taken as an example for description.
FIG. 1 is a flow diagram illustrating a data processing method according to an example embodiment.
As shown in fig. 1, the method may include, but is not limited to, the following steps:
In S110, a communication identifier to be predicted is received, and feature data of multiple dimensions of the communication identifier to be predicted is obtained.
According to the embodiment of the invention, the server may receive multiple communication identifiers to be predicted. When they are received, feature data of multiple dimensions of each communication identifier to be predicted is collected from authorized operator data, short message data, data collected through an SDK, and authorized terminal devices.
In the embodiment of the invention, the communication identifier to be predicted can be a number or an account number of a terminal device or a client capable of communicating, such as a mobile phone number, a WeChat account number, a microblog account number, a Paibao account number and the like.
In this embodiment of the present invention, the feature data of multiple dimensions may include call records, message records, and logs of the user's operations on the terminal device. For example, for a mobile phone number, the feature data of multiple dimensions may be the call records and short message records of that number and the user's operation logs on the corresponding mobile phone.
In S120, the feature data of the multiple dimensions are input to a deep learning model to obtain the calculation data of the communication identifier to be predicted.
In the embodiment of the invention, a deep learning model is preset, and the calculation data of the communication identifier to be predicted can be obtained by inputting the characteristic data of the communication identifier to be predicted in multiple dimensions into the deep learning model, wherein the calculation data can be a specific numerical value and is used for representing the probability that the communication identifier to be predicted is the first type communication identifier.
In S130, the type of the communication identifier to be predicted is predicted according to the calculation data.
In the embodiment of the present invention, the type of the communication identifier to be predicted may be a first type communication identifier or a second type communication identifier. The first type communication identifier comprises at least one of a cancelled communication identifier, a communication identifier in arrears, and a shut-down communication identifier, for example, a cancelled WeChat account or a mobile phone number in arrears. The second type communication identifier comprises at least one of a communication identifier in use, a communication identifier that has not been cancelled, and a communication identifier that is not a blank number, for example, a mobile phone number or WeChat account that is in use.
According to the embodiment of the invention, after the calculation data of the communication identifier to be predicted is obtained, the specific numerical value in the calculation data is compared with the threshold, if the specific numerical value is greater than the threshold, the communication identifier is predicted to be the first type communication identifier, and if the specific numerical value is not greater than the threshold, the communication identifier is predicted to be the second type communication identifier.
For example, the calculation data output by the deep learning model are divided into ten segments by probability; the boundary of the segment in which first type communication identifiers account for a large proportion is set as the threshold; identifiers whose calculation data exceed the threshold are predicted to be first type communication identifiers, and the rest are predicted to be second type communication identifiers.
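The segment-based threshold choice could be sketched as follows, under stated assumptions: the ten segments are taken as equal-width probability bins, the threshold is the lower edge of the lowest bin in which labelled first type identifiers make up at least `min_share` of the bin, and the names `pick_threshold` and `predict_type` and the default `min_share=0.5` are hypothetical.

```python
from typing import Optional, Sequence


def pick_threshold(scores: Sequence[float], is_first: Sequence[bool],
                   n_bins: int = 10, min_share: float = 0.5) -> Optional[float]:
    """Lower edge of the lowest probability bin dominated by first type samples."""
    totals = [0] * n_bins
    firsts = [0] * n_bins
    for s, f in zip(scores, is_first):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the top bin
        totals[b] += 1
        firsts[b] += int(f)
    for b in range(n_bins):
        if totals[b] and firsts[b] / totals[b] >= min_share:
            return b / n_bins
    return None  # no bin reached min_share


def predict_type(score: float, threshold: float) -> str:
    """Scores above the threshold are predicted first type, the rest second type."""
    return "first" if score > threshold else "second"
```

Scanning from the lowest bin upward is a simplification; a noisy low bin could trigger an early threshold, so in practice the bin shares would need enough labelled samples behind them.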
In the embodiment of the invention, based on receiving the communication identifier to be predicted, the characteristic data of multiple dimensions of the communication identifier to be predicted is obtained; inputting the characteristic data of the multiple dimensions into a deep learning model to obtain the calculation data of the communication identifier to be predicted; and predicting the type of the communication identifier to be predicted according to the calculation data. The communication identifier type prediction is realized through the deep learning model, and the efficiency and the accuracy of the communication identifier prediction are improved, so that the waste of time and cost during communication with the first type communication identifier is avoided.
The data processing method proposed in the embodiment of the present invention is further described below with reference to specific embodiments.
Fig. 2 is a flowchart illustrating a data processing method according to another exemplary embodiment.
As shown in fig. 2, the method may include, but is not limited to, the following steps:
In S210, a communication identifier to be predicted is received, and feature data of multiple dimensions of the communication identifier to be predicted is obtained.
In S220, a first type communication identifier and a second type communication identifier are screened from the communication identifiers to be predicted.
According to the embodiment of the invention, when the first type communication identifier and the second type communication identifier are screened from the communication identifiers to be predicted, the first primary type communication identifier can be screened from the communication identifiers to be predicted, the second primary type communication identifier can be screened from the communication identifiers to be predicted, and then the first type communication identifier and the second type communication identifier are determined based on the first primary type communication identifier and the second primary type communication identifier.
In this embodiment of the present invention, a set of first initial type communication identifiers screened from the communication identifiers to be predicted may be represented as W1, a set of second initial type communication identifiers screened from the communication identifiers to be predicted may be represented as W2, and the first type communication identifiers and the second type communication identifiers may be determined by the following formulas:
Y1 = Set(W1 − W2) (1)
Y2 = Set(W2 − W1) (2)
wherein Y1 represents a set of first type communication identifiers, and Y2 represents a set of second type communication identifiers. The intersection of the two sets can be removed by the above equations (1) and (2).
It should be noted that the method for determining the first-type communication identifier and the second-type communication identifier according to the above formulas (1) and (2) is only a specific example provided by the present disclosure, and the first-type communication identifier and the second-type communication identifier may also be determined by performing calculation according to other formulas on W1 and W2.
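A sketch of the intersection removal, written to match the definitions of W1 (first primary selection type) and W2 (second primary selection type), so that first type identifiers are drawn only from W1 and second type identifiers only from W2. Identifiers found in both sets end up in neither result and, presumably, would be left for the deep learning model together with the other unscreened identifiers. The name `resolve_types` is hypothetical.

```python
from typing import Set, Tuple


def resolve_types(w1: Set[str], w2: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (first type set, second type set) with the overlap of W1 and W2 removed."""
    first_type = w1 - w2   # screened as first primary selection type only
    second_type = w2 - w1  # screened as second primary selection type only
    return first_type, second_type
```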
It should be noted that, before the first type period or the second type period is determined for the feature data of the multiple dimensions, the sample data may first be preprocessed. For example, null values, missing values, single-valued (constant) features, and abnormal values in the feature data of a given dimension are cleaned or filled.
In the embodiment of the present invention, when first primary selection type communication identifiers are screened out from the communication identifiers to be predicted, a first type period for the feature data of each dimension may be determined based on statistical data of the multi-dimensional feature data of the sample data within multiple first preset time periods, and the first primary selection type communication identifiers are then screened out from the communication identifiers to be predicted according to the first type periods of the multi-dimensional feature data.
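A minimal cleaning pass of the kind mentioned above might look like this. The specific choices are assumptions not given in the text: missing values (`None`) are filled with the dimension's median, a dimension with a single distinct value (or none at all) is dropped, and abnormal values are clipped to `z_max` standard deviations around the mean. The name `preprocess` is hypothetical.

```python
import statistics
from typing import Dict, List, Optional


def preprocess(columns: Dict[str, List[Optional[float]]],
               z_max: float = 3.0) -> Dict[str, List[float]]:
    """Fill, drop, and clip per-dimension feature columns of the sample data."""
    cleaned: Dict[str, List[float]] = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if len(set(present)) <= 1:
            continue  # empty or single-valued dimension: no discriminative information
        median = statistics.median(present)
        filled = [median if v is None else float(v) for v in values]
        mean = statistics.mean(filled)
        sd = statistics.pstdev(filled)
        lo, hi = mean - z_max * sd, mean + z_max * sd
        cleaned[name] = [min(max(v, lo), hi) for v in filled]  # clip outliers
    return cleaned
```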
In the embodiment of the present invention, when the first type periods are determined for the feature data of multiple dimensions, the statistical data of the sample data within the multiple first preset time periods may be, for example, the number of call records, the number of message records, and the frequency of the user's operation logs on the terminal device within each of those periods. After the first type period of each dimension is determined, a first type threshold may further be set for each dimension, and whether a communication identifier to be predicted is a first primary selection type communication identifier is determined by comparing its statistics on each dimension with the respective thresholds.
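Per-dimension statistics over several look-back windows (for example, the number of call records within one, two, and three months before the current time) can be computed by counting timestamped records. This sketch assumes each dimension is given as a list of record timestamps and approximates months as 30-day windows; `window_counts` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Sequence


def window_counts(events: Dict[str, Sequence[datetime]], now: datetime,
                  windows_days: Sequence[int]) -> Dict[str, List[int]]:
    """Count each dimension's records inside each look-back window ending at `now`."""
    counts: Dict[str, List[int]] = {}
    for dim, times in events.items():
        counts[dim] = [
            sum(1 for t in times if now - timedelta(days=d) <= t <= now)
            for d in windows_days
        ]
    return counts
```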
For example, the plurality of first preset time periods may be one month, two months, and three months before the current time. The call record dimension is denoted as a, the message record dimension as b, and the dimension of the user's operation logs on the terminal device as c. The statistics of dimension a within one, two, and three months before the current time are denoted a1, a2, and a3, respectively; likewise, the statistics of dimension b are denoted b1, b2, and b3, and those of dimension c are denoted c1, c2, and c3. The multiple dimensions of the sample data are combined with the multiple first preset time periods, and the proportion of first type communication identifiers is obtained for each combination, for example a1∩b1∩c1, a1∩b2∩c1, a1∩b2∩c2, and so on. Assuming that this proportion is highest for the combination a1∩b2∩c2, the first type period of dimension a is one month before the current time, the first type periods of dimensions b and c are each two months before the current time, and a respective first type threshold is set for each dimension. Whether a communication identifier to be predicted is a first primary selection type communication identifier is then determined by comparing its statistics for dimension a within one month, dimension b within two months, and dimension c within two months before the current time with the first type thresholds of the respective dimensions.
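The text does not specify how the proportion of first type identifiers is measured for a combination such as a1∩b2∩c2. One plausible reading, used in this hypothetical sketch, is to take the labelled samples that show no activity in every dimension over that dimension's chosen window and compute the share of first type identifiers among them. `best_periods`, `Stats`, and `min_support` are assumed names; window index 0 stands for the shortest period.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Stats = Dict[str, List[int]]  # dimension -> counts per window (index 0 = shortest)


def best_periods(samples: Sequence[Tuple[Stats, bool]], dims: Sequence[str],
                 n_windows: int, min_support: int = 1) -> Tuple[Dict[str, int], float]:
    """Pick one window per dimension maximising the share of first type samples."""
    best: Dict[str, int] = {}
    best_rate = -1.0
    for combo in product(range(n_windows), repeat=len(dims)):
        # Labelled samples with no activity in every dimension over its window.
        matched = [is_first for stats, is_first in samples
                   if all(stats[d][w] == 0 for d, w in zip(dims, combo))]
        if len(matched) < min_support:
            continue  # too few samples to trust the rate
        rate = sum(matched) / len(matched)
        if rate > best_rate:
            best, best_rate = dict(zip(dims, combo)), rate
    return best, best_rate
```

Raising `min_support` guards against a combination that wins only because it matches one or two samples.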
In the embodiment of the present invention, when the second preliminary selection type communication identifier is screened out from the communication identifier to be predicted, a second type cycle for the multi-dimensional feature data may be determined based on statistical data of the multi-dimensional feature data of the sample data in a plurality of second preset time periods, and the second preliminary selection type communication identifier is screened out from the communication identifier to be predicted according to the second type cycle of the multi-dimensional feature data.
In the embodiment of the present invention, the statistical data of the multi-dimensional feature data of the sample data over the second preset time periods may be, for example, the number of call records, the number of message records, and the frequency of the user's operation logs on the terminal device within each second preset time period. After the second type period of each dimension is determined, a second type threshold may further be set for each dimension, and whether an identifier to be predicted is a second primary selection type communication identifier is determined by comparing its statistics in each dimension with the respective threshold.
For example, the plurality of second preset time periods may be the last one year, the last two years and the last three years before the current time. The call-record dimension is again denoted a, the message-record dimension b, and the dimension of the user's operation logs on the terminal device c, with the index of each statistic (a1 to a3, b1 to b3, c1 to c3) now counting years. Assuming the combination (a3, b3, c2) yields the highest probability of the second type communication identifier, the second type period of dimension a is the last three years, that of dimension b is the last three years, and that of dimension c is the last two years; a respective second type threshold is then set for each dimension. Whether the communication identifier to be predicted is a second primary selection type communication identifier is determined by comparing its statistics for dimension a over the last three years, for dimension b over the last three years, and for dimension c over the last two years with the second type thresholds of the respective dimensions.
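Once the type periods and per-dimension thresholds are fixed, screening the identifiers to be predicted reduces to a per-dimension comparison. The sketch below assumes, since the text does not state the direction of the comparison, that second type (in-use) identifiers must reach at least the threshold in every dimension while first type identifiers must stay at or below it; the function `screen` and the data layout are hypothetical.

```python
def screen(identifiers, periods, thresholds, active):
    """Return the ids whose statistics satisfy every per-dimension threshold.

    identifiers: {id: {dimension: {period: count}}}
    periods:     {dimension: chosen type period}, e.g. {"a": 3, "b": 3, "c": 2} (years)
    thresholds:  {dimension: threshold}
    active=True  -> count >= threshold in every dimension (assumed second type rule)
    active=False -> count <= threshold in every dimension (assumed first type rule)
    """
    selected = []
    for ident, stats in identifiers.items():
        # Look up each dimension's statistic over that dimension's own type period.
        counts = {dim: stats[dim][period] for dim, period in periods.items()}
        if active:
            passed = all(counts[d] >= thresholds[d] for d in periods)
        else:
            passed = all(counts[d] <= thresholds[d] for d in periods)
        if passed:
            selected.append(ident)
    return selected
```

Requiring every dimension to pass is what makes this screen strict: an identifier that looks active in one dimension but not in another is not pre-classified and is left for later stages.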
It should be noted that the sample data in the embodiment of the present invention may be obtained by actually dialing or otherwise communicating with the identifiers, which ensures that the samples are genuine and valid; the feature data of multiple dimensions of each sample is then crawled from operator data, short message data, data collected by an SDK, and authorized terminal devices.
In the above embodiment, a type period is determined for each type, and the feature data of multiple dimensions are compared against the thresholds for that type period. This strict, multi-condition comparison screens out as many first and second primary selection type communication identifiers from the communication identifiers to be predicted as possible while keeping those identifiers accurate, and because multiple dimensions are fused in the screening, both recall and accuracy are improved.
In S230, the remaining communication identifiers to be predicted are input to the deep learning model, and the calculation data of the remaining communication identifiers to be predicted are obtained.
In the embodiment of the present invention, the communication identifiers that remain after the first type and second type communication identifiers have been screened out of the communication identifiers to be predicted are input into the deep learning model to determine their calculation data.
In S240, the type of the communication identifier to be predicted is predicted according to the calculation data.
According to the embodiment of the present invention, after the types of the remaining communication identifiers to be predicted are predicted according to their calculation data, these predicted types may be merged with the first type and second type communication identifiers determined in S220 to obtain the types of all received communication identifiers to be predicted.
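The overall flow of S220 to S240 (pre-screening, model prediction on the remainder, and a final merge) might look like the following sketch. How conflicts between the two pre-screens are resolved is not specified in the text; the sketch assumes that an identifier caught by both screens is treated as undetermined and sent to the model. The names `predict_types` and `model_predict` and the 0.5 cut-off are hypothetical.

```python
def predict_types(candidates, first_prelim, second_prelim, model_predict, cutoff=0.5):
    """Merge rule-based screening results with model predictions.

    candidates:    ids of all communication identifiers to be predicted
    first_prelim:  set of ids from the first primary selection screen
    second_prelim: set of ids from the second primary selection screen
    model_predict: callable taking a list of ids and returning {id: P(first type)}
    """
    # Assumption: ids flagged by both screens are ambiguous and left to the model.
    result = {i: "first" for i in first_prelim - second_prelim}
    result.update({i: "second" for i in second_prelim - first_prelim})

    # Only the remaining ids are sent to the (comparatively expensive) model.
    remaining = [i for i in candidates if i not in result]
    if remaining:
        for ident, prob in model_predict(remaining).items():
            result[ident] = "first" if prob >= cutoff else "second"
    return result
```

Only `remaining` reaches the model, which is where the reduction in model computation described in the text comes from.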
In the embodiment of the present invention, the first type and second type communication identifiers are first screened out from the communication identifiers to be predicted, and only the remaining identifiers are classified by the deep learning model. Combining this up-front screening with the deep learning model to jointly determine the type of the communication identifiers to be predicted improves the accuracy of type prediction and reduces the computational cost of the deep learning model, which in turn improves prediction efficiency and helps keep the deep learning model stable.
The method of obtaining the deep learning model is described in detail below. FIG. 3 is a flow diagram illustrating a method of obtaining a deep learning model according to an example embodiment.
As shown in fig. 3, the method may include, but is not limited to, the following flow:
In S310, a multi-layer deep learning network is constructed.
In the embodiment of the present invention, this requires defining and understanding the first type communication identifier and the feature transformations, understanding the principles of deep learning algorithms and how to tune the parameters of a neural network, and deciding how many layers the network has, how many nodes each layer and the input and output layers have, how to prevent overfitting, which loss function to choose, and which nonlinear activation functions to use. Finally, different rules are formulated for the actual application scenario and data, and these rules are adjusted by evaluating their quality on the modeling samples.
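The architectural choices listed above (number of layers, nodes per layer, input and output sizes, activation functions) can be made concrete with a small fully connected network. The sketch below is an assumed configuration, not the one used in this embodiment: ReLU hidden layers, a sigmoid output giving the probability of the first type communication identifier, and uniform random initialization. The names `build_mlp` and `forward` are hypothetical, and training is omitted.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def build_mlp(layer_sizes, seed=0):
    """Create (weights, biases) per layer, e.g. layer_sizes=[n_features, 16, 8, 1]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # keeps initial pre-activations small
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Each layer computes f(W x + b); ReLU in hidden layers, sigmoid at the output."""
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = [sigmoid(v) for v in z] if i == last else [max(0.0, v) for v in z]
    return x
```

Changing `layer_sizes` is the knob for the "how many layers, how many nodes" decisions; regularization against overfitting and the loss function would enter only at training time.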
Building the model involves selecting a suitable neural network model, understanding the meaning of each of its parameters, and choosing parameters that suit the business and data at hand; constructing the multi-layer neural network; selecting an appropriate loss function, nonlinear activation function and number of training iterations; adjusting the number of hidden layers and the number of nodes per hidden layer; and evaluating the model with different measures, such as the confusion matrix, recall, accuracy and the KS statistic.
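The evaluation measures named above can be computed directly from labels and scores. This sketch treats label 1 as the first type communication identifier and computes KS as the maximum gap between the cumulative true positive rate and false positive rate over all score thresholds, which is the usual definition in risk modelling; the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn), with label 1 = first type communication identifier."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_matrix(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def ks_statistic(y_true, scores):
    """Max |TPR - FPR| over all score thresholds (tied scores are evaluated together)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("KS needs both positive and negative samples")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    for idx, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Only measure the gap once all samples sharing this score are counted.
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            best = max(best, abs(tp / pos - fp / neg))
    return best
```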
Fig. 4 is a schematic diagram of a deep learning network according to an embodiment of the present invention. As shown in Fig. 4, $A_1, \dots, A_n$ are the components of the model's input vector, $w_1, \dots, w_n$ are the synaptic weights of the neuron, $b$ is the bias, $f$ is the transfer function (usually nonlinear), and $t$ is the neuron output. Mathematically, $t = f(WA' + b)$, where $W = (w_1, \dots, w_n)$ is the weight vector, $A = (A_1, \dots, A_n)$ is the input vector, and $A'$ is the transpose of $A$. A single neuron divides the $n$-dimensional vector space into two half-spaces by the hyperplane $WA' + b = 0$, called the decision boundary; given an input vector, the neuron decides on which side of this hyperplane the vector lies. Therefore, given a set of features, it can be determined whether the communication identifier is a first type communication identifier.
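The single-neuron formula $t = f(WA' + b)$ and the decision boundary $WA' + b = 0$ can be checked numerically with a few lines. The logistic sigmoid used as $f$ here is an assumption (the text only says $f$ is usually nonlinear), and the function names are illustrative.

```python
import math


def weighted_sum(w, a, b):
    """Compute W A' + b for weight vector W, input vector A and bias b."""
    if len(w) != len(a):
        raise ValueError("W and A must have the same length")
    return sum(wi * ai for wi, ai in zip(w, a)) + b


def sigmoid(x):
    """Numerically stable logistic function, used here as the transfer function f."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def neuron_output(w, a, b, f=sigmoid):
    """t = f(W A' + b)."""
    return f(weighted_sum(w, a, b))


def side_of_boundary(w, a, b):
    """+1 or -1 for the side of the hyperplane W A' + b = 0 that A lies on, 0 if on it."""
    z = weighted_sum(w, a, b)
    return (z > 0) - (z < 0)
```

With a sigmoid, $t > 0.5$ exactly when the input lies on the positive side of the hyperplane, so thresholding the output at 0.5 and checking the side of the boundary give the same decision.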
In S320, the multi-layered deep learning network is trained based on the sample data to obtain a deep learning model.
In the embodiment of the present invention, the sample data are normalized before the multi-layer deep learning network is trained on them; the accuracy of the model is then evaluated on the normalized sample data and the model parameters are iteratively optimized until the model reaches its best state.
For example, the normalized sample data may be divided into a training set, a validation set and a test set for cross-validation of the model, and the model finally outputs the probability that an identifier is a first type communication identifier.
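A minimal sketch of the normalization and split described above follows. The text does not say which normalization is used, so min-max scaling to [0, 1] is an assumption, as are the 70/15/15 split ratios and the function names `min_max_normalize` and `split_dataset`.

```python
import random


def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def split_dataset(rows, labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle, then split into (training, validation, test) lists of (rows, labels)."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)  # fixed seed makes the split reproducible
    n_train = int(len(idx) * ratios[0])
    n_val = int(len(idx) * ratios[1])
    parts = (idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:])
    return [([rows[i] for i in p], [labels[i] for i in p]) for p in parts]
```

In practice the scaling bounds are usually computed on the training set only and reused for the validation and test sets, so that no information from held-out data leaks into training.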
It should be noted that the multi-dimensional feature data of the samples used to train the multi-layer deep learning network are more fine-grained than the multi-dimensional feature data used to determine the first and second primary selection type communication identifiers, and may be partly derived from the latter, for example the number of calls in the last week, the total call duration, and whether an app was installed or uninstalled in the last week.
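The example features named above (calls in the last week, total call duration, app installs or uninstalls in the last week) could be derived from raw records roughly as follows. The record layouts, the seven-day window boundaries and the feature names are assumptions for illustration.

```python
from datetime import datetime, timedelta


def weekly_features(call_records, app_events, now):
    """Derive last-7-day features for one communication identifier.

    call_records: list of (timestamp: datetime, duration_seconds: int)
    app_events:   list of (timestamp: datetime, action: str), action in {"install", "uninstall"}
    """
    since = now - timedelta(days=7)
    # Keep only events inside the assumed window [now - 7 days, now].
    recent_durations = [d for t, d in call_records if since <= t <= now]
    recent_actions = {action for t, action in app_events if since <= t <= now}
    return {
        "calls_last_week": len(recent_durations),
        "call_seconds_last_week": sum(recent_durations),
        "installed_app_last_week": "install" in recent_actions,
        "uninstalled_app_last_week": "uninstall" in recent_actions,
    }
```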
It should be clearly understood that the present disclosure describes how to make and use particular examples, but the principles of the present disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. In the following description of the apparatus, the same parts as those of the foregoing method will not be described again.
FIG. 5 is a block diagram illustrating a data processing apparatus in accordance with an exemplary embodiment.
As shown in fig. 5, the apparatus 500 may include:
a receiving module 510, configured to receive a communication identifier to be predicted, and obtain feature data of multiple dimensions of the communication identifier to be predicted;
an obtaining module 520, configured to input the feature data of the multiple dimensions into a deep learning model to obtain calculation data of the communication identifier to be predicted;
and a predicting module 530, configured to predict the type of the communication identifier to be predicted according to the calculation data.
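The module structure of apparatus 500 can be mirrored in code as a small class. This is a hypothetical sketch: the feature source, the model callable and the 0.5 cut-off are placeholders, and the text does not prescribe any particular interface between the modules.

```python
from typing import Callable, List


class DataProcessingApparatus:
    """Mirrors apparatus 500: receiving (510), obtaining (520) and predicting (530) modules."""

    def __init__(self, feature_source: Callable[[str], List[float]],
                 model: Callable[[List[float]], float], cutoff: float = 0.5):
        self.feature_source = feature_source  # hypothetical lookup of multi-dimensional features
        self.model = model  # hypothetical trained model returning P(first type)
        self.cutoff = cutoff

    def receive(self, identifier: str) -> List[float]:
        """Receiving module 510: accept the identifier and fetch its feature data."""
        return self.feature_source(identifier)

    def obtain(self, features: List[float]) -> float:
        """Obtaining module 520: feed the features to the model to get calculation data."""
        return self.model(features)

    def predict(self, score: float) -> str:
        """Predicting module 530: map the calculation data to a type."""
        return "first" if score >= self.cutoff else "second"

    def run(self, identifier: str) -> str:
        return self.predict(self.obtain(self.receive(identifier)))
```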
In the embodiment of the present invention, when a communication identifier to be predicted is received, the feature data of multiple dimensions of that identifier are obtained; the feature data are input into a deep learning model to obtain the calculation data of the identifier; and the type of the identifier is predicted according to the calculation data. Predicting the communication identifier type with a deep learning model improves the efficiency and accuracy of the prediction, thereby avoiding wasted time and cost when communicating with first type communication identifiers.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform: receiving a communication identifier to be predicted, and acquiring characteristic data of multiple dimensions of the communication identifier to be predicted; inputting the characteristic data of the multiple dimensions into a deep learning model to obtain the calculation data of the communication identifier to be predicted; and predicting the type of the communication identifier to be predicted according to the calculation data.
Fig. 6 is a schematic structural diagram of an electronic device according to an exemplary embodiment. It should be noted that the electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the use range of the embodiment of the present application.
As shown in Fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The above-described functions defined in the terminal of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, the processor may be described as including a receiving module, an obtaining module, and a predicting module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves.
Exemplary embodiments of the present invention are specifically illustrated and described above. It is to be understood that the invention is not limited to the precise construction, arrangements, or instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (6)

1. A method of data processing, the method comprising:
receiving a communication identifier to be predicted, and acquiring characteristic data of multiple dimensions of the communication identifier to be predicted;
inputting the characteristic data of the multiple dimensions into a deep learning model to obtain the calculation data of the communication identifier to be predicted;
predicting the type of the communication identifier to be predicted according to the calculation data;
the types include: a first type communication identifier and a second type communication identifier;
wherein the first type communication identifier comprises at least one of: a communication identifier that has been cancelled, a communication identifier that is in arrears, and a communication identifier that is shut down; the second type communication identifier comprises at least one of: a communication identifier in use, a communication identifier that has not been cancelled, and a communication identifier that is not a vacant number;
the method further comprises: screening out a first type communication identifier and a second type communication identifier from the communication identifiers to be predicted; and inputting the remaining communication identifiers to be predicted into the deep learning model to acquire the calculation data of the remaining communication identifiers to be predicted;
wherein screening out the first type communication identifier and the second type communication identifier from the communication identifiers to be predicted comprises:
screening out a first primary selection type communication identifier from the communication identifiers to be predicted;
screening out a second primary selection type communication identifier from the communication identifiers to be predicted;
determining a first type communication identifier and a second type communication identifier based on the first primary selection type communication identifier and the second primary selection type communication identifier;
wherein screening out the first primary selection type communication identifier from the communication identifiers to be predicted comprises:
determining, for each of the multiple dimensions, a first type period of the feature data based on statistical data of the feature data of the multiple dimensions of sample data in multiple first preset time periods;
and screening out a first primary selection type communication identifier from the communication identifiers to be predicted according to the first type period of the characteristic data of the multiple dimensions.
2. The method of claim 1, wherein screening out the second primary selection type communication identifier from the communication identifiers to be predicted comprises:
determining, for each of the multiple dimensions, a second type period of the feature data based on statistical data of the feature data of the multiple dimensions of the sample data in multiple second preset time periods;
and screening out a second primary selection type communication identifier from the communication identifiers to be predicted according to a second type period of the characteristic data of the multiple dimensions.
3. The method of claim 1, wherein the method further comprises: acquiring a deep learning model;
the obtaining of the deep learning model comprises:
constructing a multilayer deep learning network;
and training the multilayer deep learning network based on the sample data to obtain a deep learning model.
4. A data processing apparatus, characterized in that the apparatus comprises:
a receiving module, configured to receive a communication identifier to be predicted and acquire feature data of multiple dimensions of the communication identifier to be predicted;
the acquisition module is used for inputting the characteristic data of the multiple dimensions into a deep learning model so as to acquire the calculation data of the communication identifier to be predicted;
the prediction module is used for predicting the type of the communication identifier to be predicted according to the calculation data;
the types include: a first type communication identifier and a second type communication identifier;
wherein the first type communication identifier comprises at least one of: a communication identifier that has been cancelled, a communication identifier that is in arrears, and a communication identifier that is shut down; the second type communication identifier comprises at least one of: a communication identifier in use, a communication identifier that has not been cancelled, and a communication identifier that is not a vacant number;
the device further comprises a screening module, configured to screen out a first type communication identifier and a second type communication identifier from the communication identifiers to be predicted; the obtaining module is further configured to input the remaining communication identifiers to be predicted into the deep learning model and obtain the calculation data of the remaining communication identifiers to be predicted;
the screening module comprises a first screening unit, a second screening unit and a determining unit, wherein the first screening unit is configured to screen out a first primary selection type communication identifier from the communication identifiers to be predicted; the second screening unit is configured to screen out a second primary selection type communication identifier from the communication identifiers to be predicted; and the determining unit is configured to determine a first type communication identifier and a second type communication identifier based on the first primary selection type communication identifier and the second primary selection type communication identifier;
the first screening unit is configured to determine, for each of the multiple dimensions, a first type period of the feature data based on statistical data of the feature data of the multiple dimensions of the sample data in multiple first preset time periods, and to screen out a first primary selection type communication identifier from the communication identifiers to be predicted according to the first type period of the feature data of the multiple dimensions.
5. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 3.
6. An electronic device, comprising: one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method steps of any of claims 1-3.
CN201910342899.8A 2019-04-26 2019-04-26 Data processing method, data processing device, storage medium and electronic equipment Active CN110087230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910342899.8A CN110087230B (en) 2019-04-26 2019-04-26 Data processing method, data processing device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110087230A CN110087230A (en) 2019-08-02
CN110087230B true CN110087230B (en) 2020-09-15

Family

ID=67416831

Country Status (1)

Country Link
CN (1) CN110087230B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210910

Address after: Room 209, building 18, No. 998, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 310012

Patentee after: TONGDUN TECHNOLOGY Co.,Ltd.

Address before: Room 704, building 18, No. 998, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee before: TONGDUN HOLDINGS Co.,Ltd.