CN115760149A - Customer group identification method and device based on multivariate data tagging, and electronic equipment

Publication number
CN115760149A
Authority
CN
China
Prior art keywords
target
data
preset
module
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211263248.8A
Other languages
Chinese (zh)
Inventor
陈锋
刘涛
熊俊
程群
王安琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qiyu Information and Technology Co Ltd
Original Assignee
Shanghai Qiyu Information and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qiyu Information and Technology Co Ltd
Priority to CN202211263248.8A
Publication of CN115760149A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a customer group identification method and device based on multivariate data tagging, and electronic equipment. The method comprises the following steps: performing multivariate data collection on internal data and external data respectively to obtain initial information; screening out target associated text from the initial information through a text analysis model; extracting effective information from the target associated text by combining automatic text sentiment analysis with a preset rule, and tagging the effective information to obtain tag data; locating the targets corresponding to the tag data, and dividing the targets into a plurality of first target customer groups; and analyzing the preset indexes and the preset index change trends of the first target customer groups to determine second target customer groups. The invention makes effective use of internal and external information related to the business scene and enables full-link monitoring of abnormal customer groups in the business scene, improving the timeliness and accuracy of customer group identification and thereby protecting platform data and property security promptly and comprehensively.

Description

Customer group identification method and device based on multivariate data tagging, and electronic equipment
Technical Field
The invention relates to the technical field of text processing, and in particular to a method, a device and a system for identifying a customer group based on multivariate data tagging, and a computer-readable medium.
Background
With the development of the Internet, various Internet service platforms have emerged that provide products or services on a use-first, pay-later basis (for example, a rental platform can provide the rented product first and collect payment on return, and a shopping platform can provide the product first and collect payment once the user is satisfied). These platforms bring great convenience to people's lives, but they have also given rise to many illegal behaviors that exploit platform loopholes, such as network attacks, cheating and fraud, which threaten the data security and property security of the platforms.
Currently, abnormal customer groups are identified mainly through manual review before the product is provided, or through payment settlement after the product is provided. Manual review before the product is provided is time-consuming and labor-intensive, and its accuracy cannot be guaranteed because the available data are limited. With payment settlement after the product is provided, the opportunity for control has already passed by the time an abnormal customer group is discovered, so the loss cannot be recovered in time. A method is therefore needed that saves time and labor, is timely, and can identify abnormal customer groups throughout the whole product cycle.
Disclosure of Invention
In view of the above, the present invention is directed to a method, an apparatus, an electronic device and a computer-readable medium for identifying a customer group based on multivariate data tagging, so as to at least partially solve at least one of the above technical problems.
In order to solve the above technical problem, a first aspect of the present invention provides a method for identifying a customer group based on multivariate data tagging, the method comprising:
performing multivariate data collection on internal data and external data respectively to obtain initial information related to a business scene;
screening out target associated text from the initial information through a text analysis model;
extracting effective information from the target associated text by combining automatic text sentiment analysis with a preset rule, and tagging the effective information to obtain tag data;
locating the targets corresponding to the tag data, and dividing the targets into a plurality of first target customer groups;
and analyzing the preset indexes and the preset index change trends of the first target customer groups to determine second target customer groups.
According to a preferred embodiment of the present invention, the analyzing the preset indexes and the preset index change trends of the first target customer groups to determine the second target customer groups includes:
quantitatively analyzing the preset indexes of each first target customer group to determine the discrimination of the corresponding tag data;
tracking and analyzing the change trend of the preset indexes of each first target customer group to determine the current safety performance trend and safety performance degree of the corresponding first target customer group;
and screening out second target customer groups from the first target customer groups by combining the preset indexes, the discrimination, the safety performance degree and the safety performance trend.
According to a preferred embodiment of the invention, the method further comprises:
acquiring target data of each target in each second target customer group;
determining a second index of abnormal behavior of each second target customer group according to the target data;
and determining a final target customer group among the second target customer groups according to the second index.
According to a preferred embodiment of the present invention, the method further comprises:
monitoring the final target customer group through a preset strategy or a preset model, and acquiring business log data of each target in the final target customer group in the business scene;
determining the proportion of targets with abnormal behavior according to the business log data;
and adjusting and optimizing the preset indexes and the preset index change trends according to the proportion.
According to a preferred embodiment of the present invention, the multivariate data acquisition of external data comprises:
selecting a target external data channel according to the service scene and the information heat;
collecting text data related to preselected keywords from the target external data channel;
analyzing the correlation degree of the text data and a service scene, and screening out target keywords from the preselected keywords according to the correlation degree;
and acquiring external initial information related to the service scene from an external channel according to the target keyword.
According to a preferred embodiment of the present invention, after the target associated text in the initial information is screened out by the text analysis model, the method further includes:
analyzing the target associated text;
configuring corresponding quantization indexes according to the analysis result;
and monitoring and displaying the change condition of the quantitative index.
According to a preferred embodiment of the present invention, the analyzing the target associated text includes:
classifying the target associated texts;
determining a first category corresponding to the target associated text;
searching the business case matched with the first category from the internal data;
and analyzing the business case.
In order to solve the above technical problem, a second aspect of the present invention provides a customer group identification apparatus based on multivariate data tagging, the apparatus comprising:
the acquisition module is used for respectively acquiring the internal data and the external data to obtain initial information related to a service scene;
the screening module is used for screening out target associated texts in the initial information through a text analysis model;
the labeling processing module is used for extracting effective information in the target associated text by combining automatic text sentiment analysis and a preset rule and performing labeling processing on the effective information to obtain label data;
the dividing module is used for locating the targets corresponding to the tag data and dividing the targets into a plurality of first target customer groups;
and the analysis and determination module is used for analyzing the preset indexes and the preset index change trends of the first target customer groups to determine second target customer groups.
According to a preferred embodiment of the present invention, the analysis and determination module includes:
the quantitative analysis module is used for quantitatively analyzing the preset indexes of each first target customer group to determine the discrimination of the corresponding tag data;
the tracking analysis module is used for tracking and analyzing the change trend of the preset indexes of each first target customer group to determine the current safety performance trend and safety performance degree of the corresponding first target customer group;
and the comprehensive screening module is used for screening out second target customer groups from the first target customer groups by combining the preset indexes, the discrimination, the safety performance degree and the safety performance trend.
According to a preferred embodiment of the invention, the device further comprises:
the acquisition module is used for acquiring target data of each target in each second target customer group;
the evaluation module is used for determining a second index of abnormal behavior of each second target customer group according to the target data;
and the selecting module is used for determining a final target customer group among the second target customer groups according to the second index.
According to a preferred embodiment of the invention, the device further comprises:
the monitoring acquisition module is used for monitoring the final target customer group through a preset strategy or a preset model and acquiring business log data of each target in the final target customer group in the business scene;
the sub-determining module is used for determining the number proportion of targets with abnormal behaviors according to the service log data;
and the adjusting module is used for adjusting and optimizing the preset indexes and the change trend of the preset indexes according to the quantity ratio.
According to a preferred embodiment of the present invention, the acquisition module comprises:
the first selection module is used for selecting a target external data channel according to the service scene and the information heat degree;
the sub-acquisition module is used for acquiring text data related to preselected keywords from the target external data channel;
the analysis screening module is used for analyzing the correlation degree of the text data and the service scene and screening target keywords from the preselected keywords according to the correlation degree;
and the sub-acquisition module is used for acquiring external initial information related to the service scene from an external channel according to the target keyword.
According to a preferred embodiment of the invention, the device further comprises:
the first analysis module is used for analyzing the target associated text;
the configuration module is used for configuring corresponding quantization indexes according to the analysis result;
and the monitoring display module is used for monitoring and displaying the change condition of the quantitative index.
According to a preferred embodiment of the invention, the first analysis module comprises:
a sub-classification module for classifying the target associated texts;
the sub-classification module is used for determining a first category corresponding to the target associated text;
the sub-searching module is used for searching the business cases matched with the first category from the internal data;
and the sub-analysis module is used for analyzing the business case.
To solve the above technical problem, a third aspect of the present invention provides an electronic device, comprising:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of the above.
To solve the above technical problems, a fourth aspect of the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs that, when executed by a processor, implement the above method.
The method performs multivariate data collection on internal data and external data respectively to obtain internal and external initial public opinion information related to a business scene; screens out target associated text from the initial information through a text analysis model; extracts effective information from the target associated text by combining automatic text sentiment analysis with a preset rule, and tags the effective information to obtain diversified tag data containing internal and external information; locates the targets corresponding to the tag data and divides them into a plurality of first target customer groups; and determines second target customer groups by analyzing the preset indexes and the preset index change trends of the first target customer groups. The diversified tag data from external information, obtained through real-time multivariate data collection, screening and tagging, makes it possible to sense abnormal public opinion before the product is provided and to identify abnormal customer groups comprehensively and promptly. The diversified tag data from internal information, obtained in the same way, makes it possible to identify abnormal customer groups comprehensively and promptly after the product is provided. The invention thus makes effective use of internal and external information related to the business scene, enables full-link monitoring of abnormal customer groups in the business scene, improves the timeliness and accuracy of customer group identification, and protects platform data and property security promptly and comprehensively.
Drawings
In order to make the technical problems solved by the present invention, the technical means adopted and the technical effects obtained clearer, embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted, however, that the drawings described below only illustrate exemplary embodiments of the invention, from which those skilled in the art can derive other embodiments without inventive effort.
FIG. 1 is a schematic flow chart of a method for identifying a customer group based on multivariate data tagging according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of tag data obtained by an embodiment of the present invention;
FIG. 3 is a diagram illustrating an embodiment of locating a customer group;
FIG. 4 is a schematic structural framework diagram of a customer group identification device based on multivariate data tagging according to an embodiment of the present invention;
FIG. 5 is a block diagram of an exemplary embodiment of an electronic device according to the present invention;
FIG. 6 is a diagrammatic representation of one embodiment of a computer-readable medium of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The exemplary embodiments may, however, be embodied in many specific forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept to those skilled in the art.
The structures, properties, effects or other characteristics described in a certain embodiment may be combined in any suitable manner in one or more other embodiments, while still complying with the technical idea of the invention.
Referring to fig. 1, which shows a method for identifying a customer group based on multivariate data tagging according to the present invention, the method includes:
S1, performing multivariate data collection on internal data and external data respectively to obtain initial information related to a business scene;
The internal data refers to data in the platform's internal database. It mainly consists of text records covering all aspects of the platform's business and is clearly related to the business scene. It may include records of browsing, applying for, or using a platform product or service, and records of product production, shipping, use, settlement, and the like. As shown in fig. 2, different record data can be configured with different labeling information, and the corresponding text record data can be retrieved through the labeling information as internal initial information; for example, the text record data of product settlement is retrieved as internal initial information according to the settlement label. In this embodiment, the internal initial information is obtained by collecting the internal data.
The external data refers to public data other than the internal data, and may be data published on public platforms such as a public social platform, a public opinion platform matched with the business scene, or a specified public website. External initial information is obtained through multivariate collection of the external data, and the internal initial information and the external initial information together form the initial information.
For example, the external data may be collected from each public platform through a distributed data acquisition technique, in which case performing multivariate data collection on the external data includes:
S11, selecting a target external data channel according to the business scene and the information popularity;
In this embodiment, in order to pre-screen information that is strongly associated with the business scene, both business scene relevance and information popularity need to be considered, so that abnormal topics are highly concentrated in the information data source and topics related to the platform's business are highly popular. Meanwhile, out of consideration for user privacy, as shown in fig. 2, one or more platforms that are related to the business scene and have high information popularity can be selected from public platforms as the target external data channel, according to preset business keywords and the topic popularity of each platform. The public platform may be a public social platform, a public opinion platform matched with the business scene, a designated public website, and the like.
S12, collecting text data related to preselected keywords from the target external data channel;
For example, a keyword set may be preconfigured that includes a plurality of preselected keywords that may be missed but are highly relevant to the business scene, such as black-market activity, fraud, and network attacks. A certain volume of keyword texts is then collected from the target external data channel according to the preselected keywords, as the text data related to the preselected keywords.
S13, analyzing the relevance between the text data and the business scene, and screening out target keywords from the preselected keywords according to the relevance;
Illustratively, the similarity between each keyword text and the business scene can be analyzed and the relevance determined from that similarity; in addition, the relevance between the keyword text and the business scene can be calculated with a Word2Vec model. Furthermore, the validity of the relevance data can be judged with the help of an empirical model. The preselected keywords corresponding to keyword texts whose relevance is greater than a threshold are then screened out, according to the valid relevance data, as the target keywords, and falsely hit texts with low relevance are removed from the text data.
S14, acquiring external initial information related to the business scene from external channels according to the target keywords.
The external channels may include a public social platform, a public opinion platform matched with the business scene, a designated public website, the target external data channel, and other diverse public information platforms.
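The keyword screening of steps S12 and S13 can be illustrated with a minimal Python sketch. It is not the method of the embodiment: the text mentions Word2Vec, whereas this sketch substitutes a plain bag-of-words cosine similarity so that it runs without a trained model. The function names, the scene description string and the threshold value are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def screen_keywords(
    texts_by_keyword: Dict[str, List[str]],
    scene_description: str,
    threshold: float,
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Keep preselected keywords that have at least one text whose relevance
    to the business scene exceeds the threshold, and drop low-relevance texts."""
    scene_vec = Counter(scene_description.lower().split())
    kept_keywords, kept_texts = [], {}
    for keyword, texts in texts_by_keyword.items():
        relevant = [t for t in texts if cosine(Counter(t.lower().split()), scene_vec) > threshold]
        if relevant:
            kept_keywords.append(keyword)
            kept_texts[keyword] = relevant
    return kept_keywords, kept_texts


# Hypothetical sample data for illustration only.
collected = {
    "fraud": ["rental deposit fraud on platform", "weather is nice"],
    "attack": ["football match tonight"],
}
target_keywords, cleaned = screen_keywords(collected, "rental platform deposit fraud", 0.3)
```

With a real embedding model, only the `cosine` inputs would change; the thresholding and the removal of falsely hit texts would stay the same.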
For example, as shown in fig. 2, distributed data acquisition technology may be used to collect data across a plurality of external channels (e.g., the major external public social platforms, abnormal public opinion platforms, etc.), so that multivariate data is collected from each external channel in real time or near real time. In the distributed data acquisition technology, data acquisition programs are installed on a plurality of computers and share one queue; duplicate data is removed, and no acquisition program collects content that has already been collected, so that the programs collect jointly.
At this point, external initial information related to the business scene has been collected from multiple external channels.
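The shared-queue, no-repeat collection described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: threads in one process stand in for acquisition programs on several computers, and `run_collectors` and the `fetch` callback are hypothetical names, not part of the embodiment.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_collectors(urls: Iterable[str], fetch: Callable[[str], str], n_workers: int = 3) -> Dict[str, str]:
    """Several workers pull from one shared queue; a shared 'seen' set
    ensures that no worker collects content that was already collected."""
    tasks: "queue.Queue[str]" = queue.Queue()
    for url in urls:
        tasks.put(url)

    seen, seen_lock = set(), threading.Lock()
    results, results_lock = {}, threading.Lock()

    def worker() -> None:
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, this collector stops
            with seen_lock:
                if url in seen:
                    continue  # duplicate entry, skip it
                seen.add(url)
            text = fetch(url)  # hypothetical collection call
            with results_lock:
                results[url] = text

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In a real multi-machine deployment the queue and the seen set would live in a shared store rather than in process memory; that choice is outside what the text specifies.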
S2, screening out target associated text from the initial information through a text analysis model;
For example, as shown in fig. 2, a text analysis model may be built using Natural Language Processing (NLP) and trained on key information for customer group identification, so that it can accurately recognize strongly correlated text and thereby screen the strongly correlated target associated text out of the initial information.
Furthermore, the target associated text can be quantitatively analyzed and visually displayed in this step, so that the quantitative analysis results are shown clearly to staff and the preset indexes can be configured directly on the basis of the quantitative analysis. Therefore, after step S2, the method may further comprise:
S21, analyzing the target associated text;
Illustratively, this step may include:
S201, classifying the target associated texts;
For example, keywords of different levels are preset, each target associated text is segmented, the level of the keywords hit by the segmented text is determined, and target associated texts that hit keywords of the same level are grouped into one category. The target associated texts of each category are then evaluated against the keyword level they hit to determine the accuracy of the classification, and the evaluation data is automatically stored in an underlying database table to facilitate subsequent management and quantitative analysis.
S202, determining a first category corresponding to the target associated text;
After the classification is obtained, the number of target associated texts in each category over the recent period (the current preset time period) is collected and analyzed, and the categories whose number exceeds a preset number are taken as the first category. A first category therefore contains a large number of target associated texts and covers a large amount of information.
S203, searching the internal data for business cases matched with the first category;
For example, each business case in the internal data has labeling information, and as shown in fig. 2, the target associated texts in the first category may be matched against the labeling information to obtain the business cases matched with the first category.
S204, analyzing the business cases.
For example, the various indexes of the business cases can be analyzed, and a user can configure quantitative indexes according to the analysis result.
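Steps S201 and S202 can be sketched as follows. The sketch rests on assumptions the text does not settle: segmentation is reduced to whitespace splitting, a text that hits keywords of several levels is assigned to the highest level, and the keyword levels and sample texts are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def classify_by_level(texts: List[str], keyword_levels: Dict[str, int]) -> Dict[int, List[str]]:
    """Group texts by the level of the keywords they hit (S201).
    Assumption: when several levels are hit, the highest one wins."""
    categories: Dict[int, List[str]] = defaultdict(list)
    for text in texts:
        tokens = set(text.lower().split())  # stand-in for real segmentation
        hit_levels = [level for kw, level in keyword_levels.items() if kw in tokens]
        if hit_levels:
            categories[max(hit_levels)].append(text)
    return dict(categories)


def first_categories(categories: Dict[int, List[str]], min_count: int) -> List[int]:
    """Return the categories whose text count exceeds the preset number (S202)."""
    return [level for level, items in sorted(categories.items()) if len(items) > min_count]


# Hypothetical keyword levels and texts.
levels = {"fraud": 3, "overdue": 2, "refund": 1}
texts = ["fraud ring refund trick", "refund delayed", "refund never arrived", "overdue notice"]
cats = classify_by_level(texts, levels)
large = first_categories(cats, min_count=1)
```

Here only the level-1 category holds more than one text, so it is the only first category; texts that hit no keyword are left unclassified.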
S22, configuring corresponding quantitative indexes according to the analysis result;
Illustratively, quantitative indexes of the historical internal cases are configured, such as the product application pass rate, the product usage rate, and the product settlement rate.
S23, monitoring and displaying changes in the quantitative indexes.
For example, as shown in fig. 2, a one-stop self-service BI tool may be used to visually display the changes in the quantitative indexes.
Visual display of the changes in the quantitative indexes provides a simple interface for configuring preset indexes: a user can configure there the quantitative indexes to be analyzed and output, and the changes in those indexes are displayed, which makes configuring the preset indexes convenient.
S3, extracting effective information from the target associated text by combining automatic text sentiment analysis with a preset rule, and tagging the effective information to obtain tag data;
Automatic text sentiment analysis combines techniques such as natural language processing, text mining and computational linguistics to extract sentiment information from the target associated text. For example, some keywords can be labeled first and then expanded to new words through different models, generating a sentiment dictionary that is used to classify sentiment polarity and obtain the sentiment information of the target associated text; this sentiment information can reflect the reliability level of the target associated text. The preset rule is also used to determine the reliability level of the associated text and may be configured from experience; for example, reliable keywords of different levels can be preset, and the reliability level of the associated text is determined from the number and levels of the reliable keywords it hits. Finally, the reliability level obtained from the automatic text sentiment analysis is combined with the reliability level obtained from the preset rule to give the final reliability level of the target associated text, and effective information is extracted according to the final reliability level; for example, target associated text whose final reliability level is greater than a preset reliability level is taken as effective information, and the effective information is tagged according to its reliability level to obtain the tag data.
In this way, tag data capable of accurately locating customer groups is obtained.
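The combination of the two reliability levels in step S3 can be sketched as follows. The text does not say how the levels are combined, so a weighted average is assumed here; the sentiment dictionary, the reliable-keyword table and both toy scoring functions are hypothetical placeholders for whatever models and rules an implementation actually uses.

```python
from typing import Callable, Dict, List


def tag_effective_texts(
    texts: List[str],
    sentiment_level: Callable[[str], float],
    rule_level: Callable[[str], float],
    min_level: float,
    w_sentiment: float = 0.5,
) -> List[Dict[str, object]]:
    """Combine the two reliability levels (assumed: weighted average), keep the
    texts above the preset level, and tag each kept text with its final level."""
    tagged = []
    for text in texts:
        final = w_sentiment * sentiment_level(text) + (1.0 - w_sentiment) * rule_level(text)
        if final > min_level:
            tagged.append({"text": text, "reliability": final})
    return tagged


# Hypothetical sentiment dictionary and reliable-keyword table.
SENTIMENT_LEXICON = {"scam": 2, "stolen": 2, "maybe": -1}
RELIABLE_KEYWORDS = {"order": 1, "contract": 2}


def toy_sentiment_level(text: str) -> float:
    """Toy stand-in for dictionary-based sentiment scoring."""
    return max(0, sum(SENTIMENT_LEXICON.get(w, 0) for w in text.lower().split()))


def toy_rule_level(text: str) -> float:
    """Toy stand-in for the preset rule: sum of the levels of hit keywords."""
    return sum(RELIABLE_KEYWORDS.get(w, 0) for w in text.lower().split())


tags = tag_effective_texts(
    ["scam contract order", "maybe scam"], toy_sentiment_level, toy_rule_level, min_level=1.0
)
```

Passing the two scoring functions as parameters keeps the combination and filtering logic independent of how each reliability level is actually computed.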
S4, locating the targets corresponding to the tag data, and dividing the targets into a plurality of first target customer groups;
A target may be a terminal device used to apply for, use or settle a platform product, or a user account used to apply for, use or settle a platform product. As shown in fig. 3, in this step the target corresponding to each piece of tag data may be located through a text classification model. Having been trained on a large amount of text, the text classification model can classify text by tag and thus obtain the targets corresponding to each tag. For example, the text data of devices that apply for, use or settle platform products may be input into the text classification model to obtain the targets corresponding to each tag, and the targets corresponding to the same tag are placed in the same first target customer group, so that each tag corresponds to one first target customer group.
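The grouping in step S4 amounts to a group-by on the tags returned by the classifier. The sketch below assumes a `classify` callable that stands in for the trained text classification model and may return several tags per target; the toy keyword-matching classifier and the sample device texts are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


def divide_into_groups(
    target_texts: Dict[str, str],
    classify: Callable[[str], Set[str]],
) -> Dict[str, Set[str]]:
    """Place every target under each tag its text is classified into,
    so that each tag corresponds to one first target customer group."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for target_id, text in target_texts.items():
        for tag in classify(text):
            groups[tag].add(target_id)
    return dict(groups)


def toy_classify(text: str) -> Set[str]:
    """Toy stand-in for the text classification model: tag = matched keyword."""
    return {tag for tag in ("emulator", "proxy") if tag in text.lower()}


# Hypothetical device text data keyed by target identifier.
devices = {
    "dev-1": "login via proxy from emulator",
    "dev-2": "proxy detected",
    "dev-3": "normal session",
}
first_groups = divide_into_groups(devices, toy_classify)
```

Targets whose text matches no tag, such as `dev-3` here, fall into no first target customer group.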
S5, analyzing the preset indexes and the preset index change trends of the first target customer groups to determine second target customer groups.
In this embodiment, the preset index is used to determine whether the tag discrimination of a first target customer group is pronounced, and the preset index change trend is used to determine whether the current abnormal behavior of the first target customer group is pronounced. If the targets in the first target customer group are terminals, the abnormal behavior may be illegally collecting or obtaining data; if the targets are user accounts, the abnormal behavior may be frequently logging in from different locations or switching IP addresses. The preset index may be a business data index related to abnormal behavior of the first target customer group, such as the product usage rate, i.e., the ratio of the targets in the first target customer group that applied for and used the product to the targets that applied for the product, or the product settlement rate, i.e., the ratio of the targets in the first target customer group that used and settled the product to the targets that used the product. In a specific application, one preset index or a plurality of preset indexes can be configured. When there are a plurality of preset indexes, a corresponding weight can be configured for each, and all the preset indexes are weighted and summed to obtain the final preset index.
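The two example indexes and their weighted sum can be written out as a short Python sketch. The `Target` record, the weight names and the sample values are hypothetical; only the two ratio definitions and the weighted summation come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Target:
    applied: bool
    used: bool
    settled: bool


def usage_rate(group: List[Target]) -> float:
    """Targets that applied for and used the product / targets that applied."""
    applied = [t for t in group if t.applied]
    return sum(t.used for t in applied) / len(applied) if applied else 0.0


def settlement_rate(group: List[Target]) -> float:
    """Targets that used and settled the product / targets that used it."""
    used = [t for t in group if t.used]
    return sum(t.settled for t in used) / len(used) if used else 0.0


def combined_index(group: List[Target], weights: Dict[str, float]) -> float:
    """Weighted sum of the preset indexes, giving the final preset index."""
    return weights["usage"] * usage_rate(group) + weights["settlement"] * settlement_rate(group)


# Hypothetical first target customer group of four targets.
group = [
    Target(applied=True, used=True, settled=True),
    Target(applied=True, used=True, settled=False),
    Target(applied=True, used=False, settled=False),
    Target(applied=True, used=False, settled=False),
]
final_index = combined_index(group, {"usage": 0.4, "settlement": 0.6})
```

For this sample group both rates are 0.5 (two of four applicants used the product, one of two users settled), so the weighted sum is also 0.5.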
Illustratively, this step may include:
s51, quantitatively analyzing preset indexes of all first target passenger groups to determine the discrimination of corresponding label data;
for example, index thresholds corresponding to different partition degrees may be preconfigured, the preset index of each first target guest group is quantitatively analyzed through a statistical tool, the preset index of each first target guest group is compared with the index threshold corresponding to each partition degree, and the partition degree where the preset index of each first target guest group is located is determined, that is, the tag partition degree of each first target guest group.
S52, tracking and analyzing the change trend of the preset indexes of each first target guest group to determine the current safety performance trend and safety performance degree of the corresponding first target guest group;
such as: the change trends of preset indexes in each first target customer group at a plurality of time points within a current preset time period can be analyzed, so that the change trend of the preset indexes is determined, and the current safety performance trend and the safety performance degree are determined according to the change trends, such as: the preset index is continuously increased to correspond to the first safety expression degree, the preset index is increased to correspond to the second safety expression degree after being reduced, the preset index is reduced to correspond to the third safety expression degree after being increased, and the preset index is continuously reduced to correspond to the fourth safety expression degree.
S53, screening out second target customer groups from the first target customer groups by combining the preset index, the discrimination, the safety performance degree, and the safety performance trend.
When a service is provided to customer groups, among the targets whose data satisfy the conditions, those whose data are more stable better meet the system requirements. Therefore, to screen second target customer groups out of the first target customer groups, an index threshold, a discrimination threshold, a safety performance threshold, and a safety performance trend requirement may be configured. The customer groups whose preset index is greater than the index threshold, whose discrimination is greater than the discrimination threshold, whose safety performance degree is greater than the safety performance threshold, and whose safety performance trend falls within the required trend threshold interval are selected from the first target customer groups as the second target customer groups.
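The four-condition screening in S53 amounts to a conjunctive filter. In the sketch below, all four quantities are represented as plain numbers so that they can be compared with thresholds; that numeric encoding, the field names, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GroupMetrics:
    name: str
    preset_index: float
    discrimination: float
    safety_degree: float
    safety_trend: float  # assumed numeric encoding, e.g. slope of the index


@dataclass
class ScreeningConfig:
    index_threshold: float
    discrimination_threshold: float
    safety_threshold: float
    trend_interval: Tuple[float, float]  # inclusive lower and upper bound


def screen_second_groups(groups: List[GroupMetrics],
                         cfg: ScreeningConfig) -> List[str]:
    """Keep only the groups that satisfy all four configured conditions."""
    low, high = cfg.trend_interval
    return [
        g.name for g in groups
        if g.preset_index > cfg.index_threshold
        and g.discrimination > cfg.discrimination_threshold
        and g.safety_degree > cfg.safety_threshold
        and low <= g.safety_trend <= high
    ]


cfg = ScreeningConfig(0.5, 0.6, 0.7, (0.0, 0.2))
groups = [
    GroupMetrics("A", 0.8, 0.9, 0.75, 0.1),  # passes every condition
    GroupMetrics("B", 0.8, 0.9, 0.75, 0.5),  # trend outside the interval
    GroupMetrics("C", 0.4, 0.9, 0.90, 0.1),  # preset index too low
]
selected = screen_second_groups(groups, cfg)
```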
After the second target customer groups are screened out, a final target customer group may be further screened out according to the target data of the second target customer groups, which improves the accuracy with which customer groups are located. The method may then further comprise:
S61, acquiring target data of each target in each second target customer group;
If the targets in the second target customer group are terminal devices, the target data includes the device model and the device data that the user has chosen to disclose, which may include the device location, device user information, device communication information, and the like. If the targets in the second target customer group are user accounts, the target data includes the account IP address, the account login time, and the like.
S62, determining a second index of abnormal behavior of each second target customer group according to the target data;
The second index may be used to predict abnormal behavior of the targets. For example, the data of each target may be input into an abnormal behavior model for analysis, which outputs a second index for each target; the average of the second indexes of the targets is then taken as the second index of the second target customer group, which gives the probability of abnormal behavior for each second target customer group.
S63, determining a final target customer group among the second target customer groups according to the second index.
For example, a second target customer group whose second index is greater than the preset index is taken as a final target customer group, so that customer groups with abnormal behavior are accurately located.
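Steps S62 and S63 can be sketched as averaging per-target scores and then comparing each group average with a preset value. The abnormal behavior model is not specified in this embodiment, so `toy_score` below is a stand-in invented for the example, as are the field `distinct_ips` and the value 0.5.

```python
from statistics import mean
from typing import Callable, Dict, List

Target = Dict[str, float]


def toy_score(target: Target) -> float:
    # Stand-in for the abnormal behavior model: more distinct IPs, higher score.
    return min(1.0, target["distinct_ips"] / 10)


def group_second_index(targets: List[Target],
                       score_fn: Callable[[Target], float]) -> float:
    """Average of the per-target second indexes (S62)."""
    return mean([score_fn(t) for t in targets])


def final_groups(groups: Dict[str, List[Target]],
                 score_fn: Callable[[Target], float],
                 preset_value: float) -> List[str]:
    """Groups whose second index exceeds the preset value (S63)."""
    return [name for name, targets in groups.items()
            if group_second_index(targets, score_fn) > preset_value]


second_groups = {
    "A": [{"distinct_ips": 2}, {"distinct_ips": 4}],
    "B": [{"distinct_ips": 8}, {"distinct_ips": 10}],
}
final = final_groups(second_groups, toy_score, preset_value=0.5)
```

Passing the scoring function as a parameter keeps the averaging and selection logic independent of whichever model is actually used.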
In addition, the invention can also adjust and optimize the preset index and the threshold interval of the change trend of the preset index according to the service log of the final target customer group, thereby further improving the positioning precision. The method further comprises:
S71, monitoring the final target customer group through a preset policy or a preset model, and acquiring service log data of each target in the final target customer group in the service scene;
The preset policy may monitor the behavior of the targets in the final target customer group. For example, the preset policy may be configured with a plurality of abnormal behavior judgment rules, and when the real-time behavior of a target hits one of the rules, an alarm is raised so that the platform can control the abnormal behavior. The preset model may predict the abnormal behavior of the final target customer group: target data of the final target customer group is input into the preset model, which outputs a probability value of abnormal behavior, and when the probability value is greater than a preset probability, an alarm is raised so that the platform can control the abnormal behavior.
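The two monitoring paths described above, rule hits and a model probability compared with a preset probability, can be sketched together. The rule names, the event fields, the rule limits, and the preset probability of 0.8 are all hypothetical; the model itself is represented only by the probability it would output.

```python
from typing import Callable, Dict, List

Event = Dict[str, int]
Rule = Callable[[Event], bool]

# Hypothetical abnormal behavior judgment rules.
RULES: Dict[str, Rule] = {
    "many_login_cities": lambda e: e.get("login_cities_24h", 0) >= 3,
    "rapid_ip_switch": lambda e: e.get("ip_changes_1h", 0) >= 5,
}


def hit_rules(event: Event, rules: Dict[str, Rule] = RULES) -> List[str]:
    """Names of the rules that the real-time behavior hits."""
    return [name for name, rule in rules.items() if rule(event)]


def should_alarm(event: Event, abnormal_probability: float,
                 preset_probability: float = 0.8) -> bool:
    """Alarm if any rule is hit or the model probability exceeds the preset one."""
    return bool(hit_rules(event)) or abnormal_probability > preset_probability
```

A deployment that uses only the policy or only the model would call just the corresponding half of `should_alarm`.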
When no alarm is raised, the service log data of each target in the final target customer group in the service scene may be collected automatically by techniques such as event tracking (instrumentation points embedded in the application).
S72, determining the number proportion of the targets with abnormal behaviors according to the service log data;
In this embodiment, if the targets in the first target customer group are terminals, the abnormal behavior may be illegal data acquisition, and if the targets are user accounts, the abnormal behavior may be frequent logins from different locations or frequent IP switching. In this step, the data acquisition mode or the IP login mode recorded in the service log data is analyzed to determine the number of targets with abnormal behaviors, from which the number proportion of targets with abnormal behaviors is obtained.
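The number proportion in S72 is the count of targets with at least one abnormal log record divided by the total number of targets. In the sketch below, the log record layout and the predicate `frequent_ip_switch` (five or more IP changes in a record) are assumptions chosen for the example.

```python
from typing import Callable, Dict, List

LogRecord = Dict[str, int]


def frequent_ip_switch(record: LogRecord) -> bool:
    # Hypothetical criterion for an abnormal log record.
    return record.get("ip_changes", 0) >= 5


def abnormal_proportion(logs_by_target: Dict[str, List[LogRecord]],
                        is_abnormal: Callable[[LogRecord], bool]) -> float:
    """Share of targets that have at least one abnormal log record."""
    if not logs_by_target:
        return 0.0
    abnormal = sum(
        1 for records in logs_by_target.values()
        if any(is_abnormal(r) for r in records)
    )
    return abnormal / len(logs_by_target)


logs = {
    "account_1": [{"ip_changes": 0}, {"ip_changes": 7}],
    "account_2": [{"ip_changes": 1}],
    "account_3": [],
}
proportion = abnormal_proportion(logs, frequent_ip_switch)
```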
S73, adjusting and optimizing the preset index and the threshold interval of the preset index change trend according to the number proportion.
In this embodiment, a proportion threshold may be preset and compared with the number proportion. When the number proportion is smaller than the proportion threshold, the abnormal behavior of the final target customer group is not prominent enough, and the preset index used to screen the second target customer groups needs to be adjusted, which in turn adjusts the second target customer groups and then the final target customer group.
Illustratively, when the number proportion is smaller than the proportion threshold, another preset index may replace the original preset index, or at least one further preset index may be added. When there are a plurality of preset indexes, the weights of the different preset indexes may be adjusted; adjusting the weights changes the emphasis placed on the threshold interval of each preset index's change trend, which adjusts the range requirement on the change trend and thereby completes the adjustment of the second target customer groups.
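One of the adjustments mentioned above, adding a further preset index when the number proportion falls below the proportion threshold, might look like the following. The embodiment does not say how weights are redistributed, so the proportional rescaling used here, the name `another_index`, and the new weight of 0.2 are purely illustrative assumptions.

```python
from typing import Dict


def add_index_if_needed(weights: Dict[str, float],
                        number_proportion: float,
                        proportion_threshold: float,
                        new_index: str,
                        new_weight: float = 0.2) -> Dict[str, float]:
    """Return adjusted weights; unchanged if the proportion meets the threshold.

    Existing weights are scaled down proportionally so that the total
    stays the same after the new index is added (an assumed scheme).
    """
    if number_proportion >= proportion_threshold:
        return dict(weights)
    adjusted = {name: w * (1 - new_weight) for name, w in weights.items()}
    adjusted[new_index] = new_weight
    return adjusted


old = {"utilization": 0.6, "settlement": 0.4}
new = add_index_if_needed(old, number_proportion=0.1,
                          proportion_threshold=0.3,
                          new_index="another_index")
```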
Fig. 4 shows a passenger group identification device based on multivariate data tagging according to the invention. As shown in Fig. 4, the device comprises:
the acquisition module 41 is configured to perform multivariate data acquisition on the internal data and the external data, respectively, to obtain initial information related to a service scene;
the screening module 42 is configured to screen out a target associated text in the initial information through a text analysis model;
the labeling processing module 43 is configured to extract effective information in the target associated text by combining automatic text sentiment analysis and a preset rule, and perform labeling processing on the effective information to obtain label data;
the dividing module 44 is configured to locate a target corresponding to each tag data, and divide each target into a plurality of first target guest groups;
and an analysis and determination module 45, configured to analyze the preset index and the preset index change trend of each first target customer group to determine a second target customer group.
According to a preferred embodiment of the invention, said analysis determination module 45 comprises:
the quantitative analysis module is used for quantitatively analyzing preset indexes of each first target passenger group to determine the discrimination of the corresponding label data;
the tracking analysis module is used for tracking and analyzing the change trend of the preset indexes of each first target guest group to determine the current safety performance trend and the safety performance degree of the corresponding first target guest group;
and the comprehensive screening module is used for screening out second target passenger groups from each first target passenger group by synthesizing the preset indexes, the discrimination, the safety performance degree and the safety performance trend.
Further, the apparatus further comprises:
the acquisition module is used for acquiring target data of each target in each second target guest group;
the evaluation module is used for determining a second index of abnormal behavior of each second target customer group according to the target data;
and the selecting module is used for determining a final target guest group in each second target guest group according to the second index.
Further, the apparatus further comprises:
the monitoring acquisition module is used for monitoring the final target customer group through a preset strategy or a preset model and acquiring service log data of each target in the final target customer group in the service scene;
the sub-determining module is used for determining the number proportion of targets with abnormal behaviors according to the service log data;
and the adjusting module is used for adjusting and optimizing the preset indexes and the change trend of the preset indexes according to the quantity ratio.
Optionally, the acquiring module 41 includes:
the first selection module is used for selecting a target external data channel according to the service scene and the information heat degree;
the sub-acquisition module is used for acquiring text data related to preselected keywords from the target external data channel;
the analysis screening module is used for analyzing the correlation degree of the text data and the service scene and screening target key words from the preselected key words according to the correlation degree;
and the sub-acquisition module is used for acquiring external initial information related to the service scene from an external channel according to the target keyword.
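The keyword screening performed by the analysis screening module can be illustrated with a simple relevance measure. The measure used here, the fraction of collected texts that mention at least one scene term, is an assumption; the embodiment does not define how the correlation degree is computed, and the sample keywords and texts are invented.

```python
from typing import Dict, List


def relevance(texts: List[str], scene_terms: List[str]) -> float:
    """Fraction of texts that mention at least one service scene term."""
    if not texts:
        return 0.0
    hits = sum(1 for t in texts if any(term in t for term in scene_terms))
    return hits / len(texts)


def screen_keywords(texts_by_keyword: Dict[str, List[str]],
                    scene_terms: List[str],
                    min_relevance: float = 0.5) -> List[str]:
    """Keep the preselected keywords whose texts are relevant enough."""
    return [kw for kw, texts in texts_by_keyword.items()
            if relevance(texts, scene_terms) >= min_relevance]


collected = {
    "loan": ["online loan fraud case", "loan repayment tips"],
    "weather": ["sunny tomorrow", "rain expected"],
}
target_keywords = screen_keywords(collected, scene_terms=["loan", "fraud"])
```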
Further, the apparatus further comprises:
the first analysis module is used for analyzing the target associated text;
the configuration module is used for configuring corresponding quantitative indexes according to the analysis result;
and the monitoring display module is used for monitoring and displaying the change condition of the quantitative index.
The first analysis module comprises:
the sub-classification module is used for classifying the target associated text and determining a first category corresponding to the target associated text;
the sub-searching module is used for searching the business cases matched with the first category from the internal data;
and the sub-analysis module is used for analyzing the business case.
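The classification and case lookup performed by the first analysis module can be sketched as below. A keyword lookup stands in for the classifier, which the embodiment does not specify; the category names, keywords, and case records are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical categories and the keywords that indicate them.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "account_theft": ["stolen account", "credential"],
    "data_scraping": ["crawler", "scraping"],
}


def classify(text: str) -> Optional[str]:
    """Return the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return category
    return None


def matching_cases(text: str, cases: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Business cases from internal data whose category matches the text."""
    category = classify(text)
    if category is None:
        return []
    return [c for c in cases if c["category"] == category]


internal_cases = [
    {"id": "case-1", "category": "data_scraping"},
    {"id": "case-2", "category": "account_theft"},
]
found = matching_cases("New crawler tool shared on forum", internal_cases)
```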
Those skilled in the art will appreciate that the modules in the above-described embodiments of the apparatus may be distributed as described in the apparatus, and may be correspondingly modified and distributed in one or more apparatuses other than the above-described embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Fig. 5 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic apparatus 500 of this exemplary embodiment is shown in the form of a general-purpose data processing apparatus. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one memory unit 520, a bus 530 connecting different electronic device components (including the memory unit 520 and the processing unit 510), a display unit 540, and the like.
The storage unit 520 stores a computer readable program, which may be a code of a source program or a read-only program. The program may be executed by the processing unit 510 such that the processing unit 510 performs the steps of various embodiments of the present invention. For example, the processing unit 510 may perform the steps as shown in fig. 1.
Bus 530 may be a local bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or any of a variety of bus architectures.
FIG. 6 is a schematic diagram of one computer-readable medium embodiment of the present invention. As shown in fig. 6, the computer program may be stored on one or more computer readable media.
The computer readable medium may be a readable signal medium or a readable storage medium. The computer program, when executed by one or more data processing devices, enables the computer-readable medium to implement the above-described method of the invention, namely: respectively carrying out multivariate data acquisition on the internal data and the external data to obtain initial information related to a service scene; screening out target associated texts in the initial information through a text analysis model; extracting effective information in the target associated text by combining automatic text sentiment analysis and a preset rule, and performing tagging processing on the effective information to obtain tag data; positioning targets corresponding to the label data, and dividing the targets into a plurality of first target passenger groups; and analyzing the preset indexes and the preset index change trends of the first target passenger groups to determine second target passenger groups.
The invention may be implemented as a method, apparatus, electronic device, or computer-readable medium executing a computer program. Some or all of the functions of the present invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP).
While the foregoing embodiments have described the objects, aspects and advantages of the present invention in further detail, it should be understood that the present invention is not inherently related to any particular computer, virtual machine or electronic device, and various general-purpose machines may be used to implement it. The invention is not limited to the specific embodiments described; all modifications, changes, and equivalents that fall within the spirit and scope of the invention are intended to be covered.

Claims (16)

1. A passenger group identification method based on multivariate data tagging, which is characterized by comprising the following steps:
respectively carrying out multivariate data acquisition on the internal data and the external data to obtain initial information related to a service scene;
screening out a target associated text in the initial information through a text analysis model;
extracting effective information in the target associated text by combining automatic text emotion analysis and a preset rule, and performing labeling processing on the effective information to obtain label data;
positioning targets corresponding to the label data, and dividing the targets into a plurality of first target guest groups;
and analyzing the preset indexes and the preset index change trends of the first target passenger groups to determine second target passenger groups.
2. The method of claim 1, wherein analyzing the preset index and the preset index variation trend of each first target guest group to determine a second target guest group comprises:
quantitatively analyzing preset indexes of each first target passenger group to determine the discrimination degree of corresponding label data;
tracking and analyzing the change trend of preset indexes of each first target guest group to determine the current safety performance trend and safety performance degree of the corresponding first target guest group;
and screening out second target passenger groups from the first target passenger groups by integrating the preset indexes, the discrimination, the safety performance and the safety performance trend.
3. The method of claim 2, further comprising:
acquiring target data of each target in each second target guest group;
determining a second index of abnormal behavior of each second target guest group according to the target data;
and determining a final target passenger group in each second target passenger group according to the second index.
4. The method of claim 3, further comprising:
monitoring the final target guest group through a preset strategy or a preset model, and acquiring service log data of each target in the final target guest group under the service scene;
determining the number ratio of the targets with abnormal behaviors according to the service log data;
and adjusting and optimizing the preset indexes and the change trend of the preset indexes according to the quantity ratio.
5. The method of claim 1, wherein the multivariate data collection of external data comprises:
selecting a target external data channel according to the service scene and the information heat;
collecting text data related to preselected keywords from the target external data channel;
analyzing the correlation degree of the text data and a service scene, and screening target keywords from the preselected keywords according to the correlation degree;
and acquiring external initial information related to the service scene from an external channel according to the target keyword.
6. The method of claim 1, wherein after the target associated text in the initial information is filtered out by the text analysis model, the method further comprises:
analyzing the target associated text;
configuring corresponding quantitative indexes according to the analysis result;
and monitoring and displaying the change condition of the quantitative index.
7. The method of claim 6, wherein analyzing the target associated text comprises:
classifying the target associated text, and determining a first category corresponding to the target associated text;
searching the business case matched with the first category from the internal data;
and analyzing the business case.
8. A device for identifying a guest group based on multivariate data tagging, the device comprising:
the acquisition module is used for respectively acquiring the internal data and the external data to obtain initial information related to a service scene;
the screening module is used for screening out target associated texts in the initial information through a text analysis model;
the labeling processing module is used for extracting effective information in the target associated text by combining automatic text emotion analysis and a preset rule, and performing labeling processing on the effective information to obtain label data;
the dividing module is used for positioning the targets corresponding to the label data and dividing the targets into a plurality of first target passenger groups;
and the analysis determining module is used for analyzing the preset indexes and the preset index change trends of the first target passenger groups to determine second target passenger groups.
9. The apparatus of claim 8, wherein the analysis determination module comprises:
the quantitative analysis module is used for quantitatively analyzing preset indexes of each first target passenger group to determine the discrimination of the corresponding label data;
the tracking analysis module is used for tracking and analyzing the change trend of the preset indexes of each first target guest group to determine the current safety performance trend and the safety performance degree of the corresponding first target guest group;
and the comprehensive screening module is used for screening the second target passenger groups from the first target passenger groups by synthesizing the preset indexes, the discrimination, the safety performance and the safety performance trend.
10. The apparatus of claim 9, further comprising:
the acquisition module is used for acquiring target data of each target in each second target guest group;
the evaluation module is used for determining a second index of abnormal behavior of each second target guest group according to the target data;
and the selecting module is used for determining a final target guest group in each second target guest group according to the second index.
11. The apparatus of claim 10, further comprising:
the monitoring acquisition module is used for monitoring the final target guest group through a preset strategy or a preset model and acquiring service log data of each target in the final target guest group in the service scene;
the sub-determining module is used for determining the number proportion of targets with abnormal behaviors according to the service log data;
and the adjusting module is used for adjusting and optimizing the preset indexes and the change trend of the preset indexes according to the quantity ratio.
12. The apparatus of claim 8, wherein the acquisition module comprises:
the first selection module is used for selecting a target external data channel according to the service scene and the information heat;
the sub-acquisition module is used for acquiring text data related to preselected keywords from the target external data channel;
the analysis screening module is used for analyzing the correlation degree of the text data and the service scene and screening target key words from the preselected key words according to the correlation degree;
and the sub-acquisition module is used for acquiring external initial information related to the service scene from an external channel according to the target keyword.
13. The apparatus of claim 8, further comprising:
the first analysis module is used for analyzing the target associated text;
the configuration module is used for configuring corresponding quantitative indexes according to the analysis result;
and the monitoring display module is used for monitoring and displaying the change condition of the quantitative index.
14. The apparatus of claim 13, wherein the first analysis module comprises:
the sub-classification module is used for classifying the target associated text and determining a first category corresponding to the target associated text;
the sub-searching module is used for searching the business case matched with the first type from the internal data;
and the sub-analysis module is used for analyzing the business case.
15. An electronic device, comprising:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1 to 7.
16. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211263248.8A CN115760149A (en) 2022-10-14 2022-10-14 Passenger group identification method and device based on multivariate data tagging and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211263248.8A CN115760149A (en) 2022-10-14 2022-10-14 Passenger group identification method and device based on multivariate data tagging and electronic equipment

Publications (1)

Publication Number Publication Date
CN115760149A true CN115760149A (en) 2023-03-07

Family

ID=85351538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211263248.8A Pending CN115760149A (en) 2022-10-14 2022-10-14 Passenger group identification method and device based on multivariate data tagging and electronic equipment

Country Status (1)

Country Link
CN (1) CN115760149A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 1118, No.4, Lane 800, Tongpu Road, Putuo District, Shanghai 200062

Applicant after: SHANGHAI QIYU INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 201500 room a1-5962, 58 Fumin Branch Road, Hengsha Township, Chongming District, Shanghai (Shanghai Hengtai Economic Development Zone)

Applicant before: SHANGHAI QIYU INFORMATION TECHNOLOGY Co.,Ltd.

Country or region before: China