CN111078883A - Risk index analysis method and device, electronic equipment and storage medium - Google Patents

Risk index analysis method and device, electronic equipment and storage medium

Info

Publication number
CN111078883A
Authority
CN
China
Prior art keywords
target
word
tendency
consignment information
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911283667.6A
Other languages
Chinese (zh)
Inventor
赵力元
陈秀坤
高古明
张昭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN201911283667.6A priority Critical patent/CN111078883A/en
Publication of CN111078883A publication Critical patent/CN111078883A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a risk index analysis method and device, electronic equipment, and a storage medium. At least one target feature word is extracted from target consignment information according to a text classification model, a tendency value is calculated for each target feature word, and a total tendency value of the target consignment information to which the target feature words belong is calculated from those tendency values. The risk index of the target consignment information is then analyzed according to the total tendency value. The method can monitor a large amount of consignment information and analyze its risk index, which improves the efficiency of both monitoring and risk index analysis.

Description

Risk index analysis method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of text analysis, and in particular to a risk index analysis method and device, electronic equipment, and a storage medium.
Background
In recent years, with the rapid development of the market economy, the consignment business has grown quickly, yet within most logistics enterprises the supervision of high-risk articles (such as medicines) during consignment is still carried out manually. For example, if a recipient address or a mailing address includes sensitive words such as "face-lift" or "reshaping", the recipient or sender at that address is marked as an object of key attention, and the address is then judged further based on manual experience.
This approach relies on manual experience, cannot monitor batch data in real time, and is inefficient.
Disclosure of Invention
In view of the above, the present application aims to provide a risk index analysis method, apparatus, electronic device and storage medium.
In a first aspect, an embodiment provides a risk index analysis method applied to an electronic device, where the method includes:
screening target consignment information from at least one consignment information based on a pre-established feature word library;
extracting at least one target feature word from the target consignment information according to a pre-trained text classification model;
calculating a tendency value of each target feature word;
calculating, according to the tendency value of each target feature word, a total tendency value of the target consignment information to which the target feature words belong;
and analyzing a risk index corresponding to the target consignment information according to the total tendency value.
In an optional embodiment, the method further comprises a step of establishing a feature word library, which includes:
acquiring a plurality of set feature words from a training text, and labeling a tendency value for each set feature word;
and establishing the feature word library according to the plurality of set feature words and the tendency value of each set feature word.
In an alternative embodiment, the method further comprises the step of training a text classification model, the step comprising:
and training a preset model based on the feature word library to obtain a text classification model.
In an optional embodiment, screening the target consignment information from the at least one piece of consignment information based on the pre-established feature word library comprises:
comparing the at least one piece of consignment information with each set feature word in the feature word library, and screening out, from the at least one piece of consignment information, the target consignment information that includes a set feature word.
In an optional embodiment, calculating the tendency value of each target feature word includes:
matching each target feature word against the set feature words in the feature word library, and determining whether each target feature word matches a corresponding set feature word in the feature word library;
if a target feature word matches, assigning the tendency value of the matched set feature word to that target feature word;
if a target feature word does not match any set feature word, labeling a tendency value for that target feature word.
In an optional embodiment, calculating the total tendency value of the target consignment information to which the target feature words belong according to the tendency value of each target feature word includes:
calculating the total tendency value of the target consignment information by an n-gram algorithm based on the tendency values of the target feature words.
In an optional embodiment, threshold ranges of tendency values corresponding to different risk indexes are preset in the electronic device, and analyzing the risk index corresponding to the target consignment information according to the total tendency value includes:
determining the risk index corresponding to the target consignment information according to the threshold range within which the total tendency value of the target consignment information falls.
In a second aspect, embodiments provide a risk index analysis device, the device comprising:
the screening module is used for screening target consignment information from at least one consignment information based on a pre-established feature word library;
the target feature word extraction module is used for extracting at least one target feature word from the target consignment information according to a pre-trained text classification model;
the first calculation module is used for calculating the tendency value of each target feature word;
the second calculation module is used for calculating, according to the tendency value of each target feature word, the total tendency value of the target consignment information to which the target feature words belong;
and the analysis module is used for analyzing the risk index corresponding to the target consignment information according to the total tendency value.
In a third aspect, an embodiment provides an electronic device, including a processor and a non-volatile memory storing computer instructions, where the computer instructions, when executed by the processor, cause the electronic device to perform the risk index analysis method according to any one of the foregoing embodiments.
In a fourth aspect, an embodiment provides a storage medium, in which a computer program is stored, and the computer program is executed to implement the risk index analysis method according to any one of the foregoing embodiments.
The beneficial effect of this application:
With the risk index analysis method and device, electronic equipment, and storage medium provided by the application, at least one target feature word is extracted from target consignment information according to a text classification model, a tendency value is calculated for each target feature word, and a total tendency value of the target consignment information to which the target feature words belong is calculated from those tendency values. The risk index of the target consignment information is then analyzed according to the total tendency value. The method can monitor a large amount of consignment information and analyze its risk index, which improves the efficiency of both monitoring and risk index analysis.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 2 is a first flowchart of a risk index analysis method according to an embodiment of the present disclosure;
FIG. 3 is a second flowchart of a risk index analysis method according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating sub-steps of step S208 in FIG. 3 according to an embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating sub-steps of step S209 in FIG. 3 according to an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating sub-steps of step S230 in FIG. 2 according to an embodiment of the present disclosure;
FIG. 7 is a functional block diagram of a risk index analyzing apparatus according to an embodiment of the present application.
Description of the main element symbols: 100-an electronic device; 110-a risk index analyzing apparatus; 120-a memory; 130-a processor; 1101-a screening module; 1102-a target feature word extraction module; 1103-a first calculation module; 1104-a second calculation module; 1105-an analysis module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present disclosure. The electronic device 100 includes a processor 130, a memory 120, and a risk index analyzing apparatus 110. The memory 120 and the processor 130 are directly or indirectly electrically connected to each other to enable data transmission or interaction; for example, these components may be electrically connected via one or more communication buses or signal lines. The risk index analyzing apparatus 110 includes at least one software functional module that can be stored in the memory 120 in the form of software or firmware, or built into an Operating System (OS) of the electronic device 100. The processor 130 is used for executing executable modules stored in the memory 120, such as the software functional modules and computer programs included in the risk index analyzing apparatus 110. The electronic device 100 may be, but is not limited to, a wearable device, a smartphone, a tablet, a personal digital assistant, and the like.
The memory 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.
Referring to fig. 2, fig. 2 is a flowchart of a method for analyzing a risk index according to an embodiment of the present application, where the method is applied to the electronic device 100 shown in fig. 1, and the method includes the following steps:
Step S210, screening target consignment information from at least one piece of consignment information based on a pre-established feature word library.
Step S220, extracting at least one target feature word from the target consignment information according to a pre-trained text classification model.
Step S230, calculating a tendency value of each target feature word.
Step S240, calculating, according to the tendency value of each target feature word, a total tendency value of the target consignment information to which the target feature words belong.
Step S250, analyzing a risk index corresponding to the target consignment information according to the total tendency value.
With the establishment of big data platforms in provinces and cities across China, data acquisition and access capabilities have improved greatly, and the consignment information collected in each region reaches, on average, hundreds of GB. If risk index analysis were performed on all of the consignment information, the amount of computation would be huge and would slow down the electronic device 100. Therefore, to reduce the amount of computation, target consignment information is first screened out from the large amount of consignment information, and the risk index is then analyzed only for the screened target consignment information.
When analyzing the risk index of the target consignment information, at least one target feature word is first extracted from the target consignment information according to the pre-trained text classification model. A target feature word may be a word such as "plastic", "beauty", "surgery", "medical", "physical therapy", or "foot path"; consignment information containing such words generally involves medicines. It is therefore necessary to extract such target feature words from the target consignment information.
Different target feature words imply different levels of risk for the drugs or other articles involved in the consignment. For example, the risk associated with consignment information containing target feature words such as "hospital" or "surgery" is generally higher than the risk associated with consignment information containing target feature words such as "foot path" or "physical therapy".
Therefore, a tendency value can be calculated for each target feature word, and the total tendency value of the target consignment information can be calculated from those tendency values. That is, each piece of target consignment information has a corresponding total tendency value, for example 50, 80, or 100.
When the target consignment information contains one or more target feature words, the total tendency value of the target consignment information can be calculated from the tendency values of those words by a suitable algorithm.
The risk index corresponding to each piece of target consignment information is then analyzed according to its total tendency value. The risk index may be, for example, low risk, suspected risk, dangerous, or high risk.
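As a minimal sketch of mapping a total tendency value to a risk index through preset threshold ranges, the following Python snippet may help. The threshold boundaries (30, 60, 85) and the name `risk_level` are assumptions made for illustration only; the application does not give concrete threshold values.

```python
from bisect import bisect_right

# Hypothetical upper bounds of the first three threshold ranges (assumed values).
THRESHOLDS = [30, 60, 85]
# Risk index levels named in the text, ordered from lowest to highest.
LEVELS = ["low risk", "suspected risk", "dangerous", "high risk"]


def risk_level(total_tendency):
    """Return the risk index whose threshold range contains the total tendency value."""
    # bisect_right counts how many boundaries are <= the value,
    # which is exactly the index of the matching range.
    return LEVELS[bisect_right(THRESHOLDS, total_tendency)]


for value in (10, 50, 80, 100):
    print(value, risk_level(value))
```

With these assumed boundaries, a value equal to a boundary falls into the higher range; a different convention would only require swapping `bisect_right` for `bisect_left`.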
The risk index analysis method described in the above steps can monitor a large amount of consignment information and analyze its risk index, which improves the efficiency and accuracy of both supervision and risk index analysis.
After the risk index of each piece of target consignment information has been analyzed, the target consignment information can be monitored to different degrees according to the level (low risk, suspected risk, dangerous, or high risk) of its risk index. For example, target consignment information at the dangerous and high-risk levels is monitored closely, while monitoring of target consignment information at the low-risk or suspected-risk levels can be relaxed appropriately.
The method can help a supervisor quickly and accurately identify highly suspicious express deliveries, such as those suspected of transporting drugs, within a large amount of complex consignment information, which facilitates supervision.
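Steps S210 to S250 can be sketched end to end as follows. This is an illustrative outline under several assumptions, not the claimed implementation: a plain dictionary lookup stands in for the trained text classification model, the maximum tendency value stands in for the n-gram based aggregation (whose details are not specified here), and the function name `analyse`, the sample records, and the threshold values are all hypothetical.

```python
def analyse(consignments, library, thresholds, levels):
    """Return (record, total tendency value, risk index) for each target record."""
    results = []
    for record in consignments:
        text = record.lower()
        # S210 and S220 (stand-in): find set feature words contained in the record.
        words = [word for word in library if word in text]
        if not words:
            continue  # not target consignment information, so it is skipped
        # S230: look up the tendency value of each target feature word.
        values = [library[word] for word in words]
        # S240 (placeholder): the maximum replaces the unspecified n-gram algorithm.
        total = max(values)
        # S250: pick the first level whose upper bound exceeds the total.
        level = next(
            (lvl for bound, lvl in zip(thresholds, levels) if total < bound),
            levels[-1],
        )
        results.append((record, total, level))
    return results


library = {"hospital": 90, "physical therapy": 15}  # assumed tendency values
consignments = [
    "City First Hospital",
    "8 Garden Road",
    "Harmony Physical Therapy Centre",
]
results = analyse(
    consignments,
    library,
    thresholds=[30, 60, 85],
    levels=["low risk", "suspected risk", "dangerous", "high risk"],
)
for row in results:
    print(row)
```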
Referring to fig. 3, fig. 3 is a second flowchart of a risk index analysis method according to an embodiment of the present application. In this embodiment, the method for risk index analysis further includes step S208 of establishing a feature word library.
Specifically, referring to fig. 4, fig. 4 is a flowchart illustrating sub-steps of step S208 in fig. 3 according to an embodiment of the present disclosure. In the present embodiment, step S208 includes the following sub-steps:
Sub-step S2081, acquiring a plurality of set feature words from a training text, and labeling a tendency value for each set feature word.
Sub-step S2082, establishing the feature word library according to the plurality of set feature words and the tendency value of each set feature word.
In the above sub-steps, to establish the feature word library, a plurality of set feature words are first acquired from a large amount of training text (such as medical insurance data tables, payment units, or logistics consignee addresses). The set feature words may be "shaping", "beautifying", "surgery", "medical", "physical therapy", "foot path", "modeling", "color cosmetics", "hair salon", "hair art", and the like.
After the set feature words have been acquired, each one is labeled with a tendency value; the higher the tendency value, the higher the risk index of the set feature word.
For example, when consignment information suspected of drug transportation is to be monitored, addresses containing set feature words such as "hospital" and "medical" are more likely to be suspected of drug transportation than addresses containing other feature words. Therefore, based on experience, set feature words such as "hospital", "surgery", and "medical" are labeled with tendency values between 80 and 90; set feature words such as "beauty" and "shaping" are labeled with tendency values between 50 and 80; set feature words such as "salon", "hair salon", and "shape" are labeled with tendency values between 20 and 50; and set feature words such as "physical therapy" and "foot path" are labeled with tendency values between 0 and 20.
Of course, the above is merely an illustration of the tendency values of the set feature words. In different application scenarios the tendency value of each set feature word may take other values, and it is not specifically limited herein.
After the tendency values have been labeled, the set feature words and their corresponding tendency values are stored in a database to construct the feature word library. The feature word library therefore includes not only the set feature words but also the tendency value of each set feature word.
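A feature word library of this kind can be sketched as a simple mapping from each set feature word to its labeled tendency value. In the snippet below, the specific values are assumptions picked from within the illustrative ranges given above, the 0 to 100 bound is likewise assumed, and an in-memory dictionary stands in for the database.

```python
# Hypothetical labeled words; each value lies within the illustrative ranges
# described in the text (80-90, 50-80, 20-50, 0-20).
LABELLED_WORDS = [
    ("hospital", 90),
    ("surgery", 85),
    ("medical", 80),
    ("beauty", 70),
    ("shaping", 60),
    ("hair salon", 40),
    ("physical therapy", 15),
    ("foot path", 10),
]


def build_feature_word_library(labelled_words):
    """Map each set feature word to its labeled tendency value."""
    library = {}
    for word, tendency in labelled_words:
        # Assumed sanity check: tendency values are taken to lie in 0..100.
        if not 0 <= tendency <= 100:
            raise ValueError(f"tendency value out of range for {word!r}: {tendency}")
        library[word] = tendency
    return library


library = build_feature_word_library(LABELLED_WORDS)
print(library["hospital"], library["foot path"])
```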
After the feature word library has been constructed, the risk index of the target consignment information can be analyzed with the risk index analysis method described in steps S210 to S250.
Optionally, in this embodiment, step S210 of screening the target consignment information from the at least one piece of consignment information based on the pre-established feature word library specifically includes:
comparing the at least one piece of consignment information with each set feature word in the feature word library, and screening out, from the at least one piece of consignment information, the target consignment information that includes a set feature word.
Owing to the rapid development of the express industry, the daily volume of express deliveries in each city is very large, and the resulting consignment information is correspondingly large, reaching even hundreds of GB. Supervising or performing risk index analysis on all of this consignment information would require a huge amount of computation, which in turn would require expensive hardware and make supervision costly.
To reduce the amount of computation performed by the electronic device 100, consignment information that does not need to be monitored or analyzed for its risk index may be excluded.
For example, the consignment information may include deliveries to ordinary residential addresses as well as deliveries to places such as hospitals, beauty parlors, and salons. Deliveries to places such as hospitals, beauty parlors, and salons are very likely to involve the transportation of drugs or other high-risk articles, whereas deliveries to ordinary residential addresses generally do not. The consignment information corresponding to ordinary residential addresses therefore needs to be excluded.
Accordingly, target consignment information including terms such as "hospital", "beauty parlor", and "salon" is screened out from the consignment information, and risk index analysis is performed only on the screened target consignment information, which greatly reduces the amount of computation.
The pre-established feature word library includes a large number of set feature words, and these set feature words are exactly the keywords contained in the consignment information that needs to be supervised and analyzed for its risk index. The target consignment information can therefore be screened out from the consignment information based on the pre-established feature word library. Specifically, each piece of consignment information is checked for the presence of any set feature word in the feature word library. The target consignment information is the consignment information that includes at least one set feature word.
By this method, consignment information that contains no set feature word can be distinguished from consignment information that contains at least one set feature word, and risk index analysis is performed only on the latter, that is, the target consignment information. This reduces the amount of computation, avoids the cost of purchasing additional hardware, increases the operation speed, and improves the efficiency of risk index analysis.
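The screening described above amounts to a substring check of each record against the set feature words. The following is a minimal sketch; the function name `screen_target_consignments`, the sample addresses, and the case-insensitive comparison are assumptions made for illustration.

```python
def screen_target_consignments(consignments, feature_words):
    """Keep only records that contain at least one set feature word."""
    lowered_words = [word.lower() for word in feature_words]
    targets = []
    for record in consignments:
        text = record.lower()
        # A record is target consignment information if any set feature word occurs in it.
        if any(word in text for word in lowered_words):
            targets.append(record)
    return targets


consignments = [
    "No. 8 Garden Road, Building 3, Unit 2",
    "City First Hospital, Outpatient Department",
    "Sunny Hair Salon, 12 Market Street",
]
feature_words = ["hospital", "hair salon"]

targets = screen_target_consignments(consignments, feature_words)
print(targets)
```

For a very large feature word library, a multi-pattern matcher such as an Aho-Corasick automaton would avoid rescanning each record once per word, though the simple loop is enough to show the idea.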
Referring to fig. 3, in the present embodiment, the method for analyzing a risk index further includes a step S209 of training a text classification model.
Specifically, referring to fig. 5, fig. 5 is a flowchart illustrating a sub-step of step S209 in fig. 3 according to an embodiment of the present disclosure. In the present embodiment, step S209 includes:
Sub-step S2091, training a preset model based on the feature word library to obtain the text classification model.
In the above step, the preset model is trained on a large amount of training data to obtain the desired text classification model. The training data may be the feature word library, which includes a plurality of set feature words and the tendency value corresponding to each of them.
The training may be iterative. A loss function is used to calculate the degree of difference between the output of the preset model and the expected output, the model parameters of the preset model are adjusted according to that difference, and the loss function is then used again to calculate the difference between the output of the adjusted model and the expected output. Training terminates when the difference converges or when the number of iterations reaches a preset number, at which point the trained text classification model is obtained. The text classification model can extract one or more target feature words from the target consignment information.
Optionally, in this embodiment, the number of iterations may be set empirically to 1000 or 3000. Other values may also be used; the number of iterations is not specifically limited and may be set by a person skilled in the art as needed.
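The termination logic of such iterative training (stop when the loss converges or when a preset iteration count is reached) can be illustrated with a deliberately tiny example. The sketch below fits a one-parameter model y = w * x by gradient descent on a mean squared error loss; it is a toy stand-in and not a text classification model, and the learning rate, tolerance, and sample data are assumptions.

```python
def train(samples, learning_rate=0.01, max_iterations=1000, tolerance=1e-9):
    """Fit y = w * x by gradient descent; return (w, iterations used)."""
    w = 0.0
    previous_loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        # Loss: mean squared difference between model output and expected output.
        loss = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        # Terminate once the loss has stopped changing (convergence).
        if abs(previous_loss - loss) < tolerance:
            return w, iteration
        # Adjust the model parameter according to the gradient of the loss.
        gradient = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= learning_rate * gradient
        previous_loss = loss
    # Terminate because the preset number of iterations was reached.
    return w, max_iterations


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2 * x
w, used = train(samples)
print(round(w, 3), used)
```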
Optionally, in this embodiment, the preset model may be a Conditional Random Field (CRF) model, a typical discriminative model proposed by Lafferty. The CRF model was first proposed for sequence data analysis. It models a target sequence on the basis of an observation sequence, mainly addresses the sequence labeling problem, and is a statistical model for labeling and segmenting sequence data. The CRF model is mainly used for part-of-speech tagging, word segmentation, and named entity recognition.
After the CRF model has been trained on the feature word library in the manner described above, the trained text classification model is obtained, and it can extract target feature words from the target consignment information. The target feature words may be words with similar meanings or semantics, such as "shaping", "beauty", "surgery", "medical", "physiotherapy", "foot path", "modeling", "makeup", "hair salon", "hair art", and the like.
It is understood that the CRF model is only one example of the preset model in this embodiment. In other implementations of this embodiment, the preset model may be any other model that can be used to extract words; the preset model is not specifically limited, and a person skilled in the art may select different preset models for different application scenarios.
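Because a CRF is a sequence labeling model, extracting feature words with it typically means tagging each token and then collecting the tagged spans. The sketch below shows only that final decoding step, assuming a B/I/O tagging scheme (B begins a feature word, I continues it, O is outside); the tagging scheme, the function name `extract_feature_words`, and the hand-written tags are assumptions, and no CRF is trained or run here.

```python
def extract_feature_words(tokens, tags):
    """Collect token spans labeled B (begin) / I (inside); O means outside."""
    words, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new feature word starts; flush any word still being built.
            if current:
                words.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # Outside any feature word (or a stray I tag): flush and reset.
            if current:
                words.append(" ".join(current))
            current = []
    if current:
        words.append(" ".join(current))
    return words


tokens = ["Sunny", "hair", "salon", "near", "city", "hospital"]
tags = ["O", "B", "I", "O", "O", "B"]  # hand-written stand-in for model output
extracted = extract_feature_words(tokens, tags)
print(extracted)
```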
Optionally, in this embodiment, after at least one target feature word has been extracted from the target consignment information by the trained text classification model, the tendency value of each target feature word needs to be calculated.
Optionally, referring to FIG. 6, FIG. 6 is a flowchart illustrating sub-steps of step S230 in FIG. 2 according to an embodiment of the present disclosure. In this embodiment, step S230 includes the following sub-steps:
Step S2301, matching each target feature word against the set feature words in the feature word library, and determining whether each target feature word matches a corresponding set feature word in the feature word library.
Step S2302, if a target feature word matches, assigning the tendency value of the matched set feature word to that target feature word.
Step S2303, if a target feature word does not match any set feature word, labeling a tendency value for that target feature word.
In this embodiment, after the text classification model extracts the target feature words from each piece of target consignment information, the tendency value of each target feature word needs to be calculated. The text classification model may extract a plurality of target feature words, and the extracted target feature words may or may not be the same as the set feature words in the feature word library. Therefore, when calculating the tendency value of a target feature word, it is necessary to determine whether the target feature word matches a set feature word in the feature word library.
If the target feature word can be matched, that is, when the target feature word is one of the plurality of set feature words, the tendency value of that set feature word is assigned directly to the target feature word.
If the target feature word cannot be matched, that is, when the target feature word is not any of the plurality of set feature words, a tendency value is labeled for the target feature word.
Optionally, after a tendency value is labeled for a target feature word that cannot be matched with any set feature word, the target feature word may be stored in the feature word library as a set feature word. When the same target feature word is extracted again later, its value can be assigned directly from the tendency value pre-stored in the feature word library, without labeling it again.
For example, when the text classification model extracts only one target feature word, "beauty", from the consignment information, if the target feature word "beauty" can be matched with the set feature word "beauty" in the feature word library, the tendency value of the target feature word "beauty" is the same as that of the matched set feature word "beauty". If the target feature word "beauty" cannot be matched with any set feature word in the feature word library, a new tendency value is labeled for the target feature word "beauty".
When the text classification model extracts three target feature words, "beauty", "shaping", and "hospital", from the consignment information, if two of them, "shaping" and "hospital", can be matched with the set feature words "shaping" and "hospital" in the feature word library, the tendency values of the set feature words "shaping" and "hospital" pre-stored in the feature word library are assigned to the target feature words "shaping" and "hospital" respectively. If the remaining target feature word "beauty" cannot be matched with any set feature word in the feature word library, a new tendency value is labeled for the target feature word "beauty".
After the tendency value is labeled for the unmatched target feature word "beauty", the target feature word and its tendency value may be stored in the feature word library. In subsequent calculations, the value can then be assigned to this target feature word directly, without labeling the tendency value again, which reduces the amount of computation and the calculation time to a certain extent and improves the efficiency of risk index analysis.
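The matching, labeling, and storing behaviour of steps S2301 to S2303 can be sketched as a dictionary lookup with a fallback. This is only an illustrative sketch: the function `assign_tendency_values`, the callback `label_new_word`, and the numeric values are assumptions introduced here, and the text does not specify how a new tendency value is actually labeled.

```python
from typing import Callable, Dict, List


def assign_tendency_values(
    target_words: List[str],
    lexicon: Dict[str, float],
    label_new_word: Callable[[str], float],
) -> Dict[str, float]:
    """Return a tendency value for each extracted target feature word.

    `lexicon` maps set feature words to tendency values (the feature word
    library). `label_new_word` is a hypothetical stand-in for whatever
    labeling procedure supplies a value for an unmatched word.
    """
    values: Dict[str, float] = {}
    for word in target_words:
        if word not in lexicon:
            # Step S2303: no match, so label a new value and store it in the
            # library so the word does not need to be labeled again later.
            lexicon[word] = label_new_word(word)
        # Step S2302: the word is now in the library; copy its stored value.
        values[word] = lexicon[word]
    return values


# Illustrative values only: "beauty" is missing from the library at first.
lexicon = {"shaping": 55.0, "hospital": 85.0}
values = assign_tendency_values(
    ["beauty", "shaping", "hospital"], lexicon, lambda word: 50.0
)
```

After the call, "beauty" is present in `lexicon`, so a later extraction of the same word takes the direct lookup path.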
After the tendency value of each target feature word is calculated according to the above method, the total tendency value of the target consignment information to which the target feature words belong is calculated according to the tendency value of each target feature word.
Specifically, in step S240, calculating the total tendency value of the target consignment information to which the target feature word belongs according to the tendency value of each target feature word includes:
calculating the total tendency value of the target consignment information by using an n-gram algorithm based on the tendency values of the target feature words.
In this embodiment, the text classification model can extract one or more target feature words from the target consignment information, and each target feature word has its own tendency value, so the total tendency value of the target consignment information to which the target feature words belong needs to be calculated from the individual target feature words.
Optionally, in this embodiment, the total tendency value of the target consignment information may be calculated by an n-gram algorithm.
The n-gram algorithm is a language model commonly used in large-vocabulary continuous speech recognition; for Chinese, it is referred to as the Chinese Language Model (CLM). The Chinese language model uses collocation information between adjacent words in the context to convert input into Chinese characters automatically.
The specific role of the n-gram algorithm differs between application scenarios. In this embodiment, the n-gram algorithm is used to calculate the total tendency value of the target consignment information from the tendency value of each target feature word.
Optionally, instead of using the n-gram algorithm, the total tendency value of the target consignment information to which the target feature words belong may be calculated from the frequency of occurrence of the target feature words or from the sum of the tendency values of a plurality of target feature words.
For example, when the text classification model extracts only one target feature word, "hospital", from the target consignment information, the tendency value of the target feature word "hospital" is the total tendency value of the target consignment information.
When the text classification model extracts a plurality of target feature words from the target consignment information, for example "beauty", "shaping", and "hospital", with tendency values of 50, 55, and 85 respectively, the total tendency value of the target consignment information to which they belong may be obtained by adding the individual tendency values, that is, 50 + 55 + 85 = 190.
When one of the extracted target feature words appears multiple times, for example when the text classification model extracts "beauty", "shaping", "hospital", and "beauty" from the target consignment information, the total tendency value of the target consignment information may be the tendency value of the target feature word that appears multiple times, which is the tendency value of "beauty" in this example.
Optionally, in this embodiment, the tendency value of the target feature word with the highest tendency value among the plurality of target feature words may also be selected as the total tendency value of the target consignment information.
For example, when the text classification model extracts a plurality of target feature words from the target consignment information, such as "beauty", "shaping", and "hospital", with tendency values of 50, 55, and 85 respectively, the total tendency value of the target consignment information may be 85, because 85 is the largest.
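The three alternatives described above (summation, most frequent word, and maximum) can be sketched as follows. The function names are assumptions introduced for illustration, the tendency values are the ones used in the examples above, and the n-gram based calculation is not shown because the text does not describe how it combines the values.

```python
from collections import Counter
from typing import Dict, List


def total_by_sum(values: Dict[str, float]) -> float:
    """Add the tendency values of the distinct target feature words."""
    return sum(values.values())


def total_by_most_frequent(words: List[str], values: Dict[str, float]) -> float:
    """Use the tendency value of the word that occurs most often."""
    most_common_word, _count = Counter(words).most_common(1)[0]
    return values[most_common_word]


def total_by_max(values: Dict[str, float]) -> float:
    """Use the highest tendency value among the target feature words."""
    return max(values.values())


# Tendency values taken from the worked examples in the text.
values = {"beauty": 50, "shaping": 55, "hospital": 85}

sum_total = total_by_sum(values)
frequent_total = total_by_most_frequent(
    ["beauty", "shaping", "hospital", "beauty"], values
)
max_total = total_by_max(values)
```

With these inputs the summation gives 190, the most frequent word "beauty" gives 50, and the maximum gives 85, matching the figures in the examples.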
After the total tendency value of the target consignment information is calculated according to the above method, the risk index corresponding to the target consignment information needs to be analyzed according to the total tendency value.
Specifically, in this embodiment, step S250 includes: determining the risk index corresponding to the target consignment information according to the threshold range to which the total tendency value of the target consignment information belongs.
In this embodiment, threshold ranges of the total tendency value corresponding to different risk indexes are preset in the electronic device 100. For example, when the total tendency value falls within (0, 10), the corresponding risk index may be low risk; within (10, 50), suspected risk; within (50, 80), risk; and when it is greater than 80, high risk.
It is understood that these threshold ranges are merely an example. In other implementations of this embodiment, the threshold ranges of the tendency value may be other ranges, which are not specifically limited here.
When the risk index of each piece of target consignment information is analyzed, the threshold range in which the total tendency value of the target consignment information falls is determined.
For example, when the total tendency value of the target consignment information is within (0, 10), the risk index of the target consignment information is low risk, that is, the current consignment is low risk. When the total tendency value is within (10, 50), the risk index is suspected risk, that is, the current consignment is suspected of being risky. When the total tendency value is within (50, 80), the risk index is risk, that is, the current consignment is dangerous. When the total tendency value is greater than 80, the risk index is high risk, that is, the current consignment is high risk.
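The threshold lookup can be sketched with a sorted list of upper bounds. The text gives the ranges as (0, 10), (10, 50), (50, 80), and greater than 80 without saying which range a boundary value such as exactly 10 belongs to; the sketch below assumes each upper bound is inclusive, and the names `UPPER_BOUNDS`, `RISK_LEVELS`, and `risk_index` are introduced here for illustration only.

```python
from bisect import bisect_left

# Example thresholds from the text; other ranges may be configured instead.
UPPER_BOUNDS = [10, 50, 80]
RISK_LEVELS = ["low risk", "suspected risk", "risk", "high risk"]


def risk_index(total_tendency: float) -> str:
    """Map a total tendency value to a risk level.

    Assumption: upper bounds are inclusive, so 10 is "low risk" and
    anything above 80 is "high risk". Values at or below 0, which the
    text does not cover, also fall into "low risk" in this sketch.
    """
    return RISK_LEVELS[bisect_left(UPPER_BOUNDS, total_tendency)]


# The summed example (190) and the maximum example (85) are both above 80.
level_for_sum = risk_index(190)
level_for_max = risk_index(85)
```

Keeping the bounds and labels in two parallel lists means a different set of threshold ranges can be swapped in without changing the lookup function.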
In summary, the risk index analysis method described in the above steps can monitor a large amount of consignment information and analyze its risk index, thereby improving the efficiency and accuracy of consignment information supervision and risk index analysis. After the risk index of each piece of target consignment information is analyzed, the target consignment information can be monitored to different degrees according to the level (low risk, suspected risk, risk, or high risk) corresponding to its risk index.
Optionally, referring to Fig. 7, which is a functional block diagram of a risk index analysis apparatus 110 according to an embodiment of the present application, the apparatus is applied to the electronic device 100 and includes:
A screening module 1101, configured to screen target consignment information from at least one piece of consignment information based on a pre-established feature word library.
A target feature word extraction module 1102, configured to extract at least one target feature word from the target consignment information according to a pre-trained text classification model.
A first calculation module 1103, configured to calculate a tendency value of each target feature word.
A second calculation module 1104, configured to calculate, according to the tendency value of each target feature word, a total tendency value of the target consignment information to which the target feature word belongs.
An analysis module 1105, configured to analyze a risk index corresponding to the target consignment information according to the total tendency value.
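As a rough illustration of how the five modules could be composed in software, the sketch below wires them into one class. Everything here is an assumption made for illustration: the class `RiskIndexAnalyzer` and its field names are hypothetical, the extractor is a simple substring match standing in for the pre-trained text classification model, unmatched words are skipped rather than labeled, and the classifier is a simplified two-level rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RiskIndexAnalyzer:
    lexicon: Dict[str, float]                  # set feature word -> tendency value
    extract: Callable[[str], List[str]]        # stand-in for module 1102
    aggregate: Callable[[List[float]], float]  # strategy used by module 1104
    classify: Callable[[float], str]           # rule used by module 1105

    def analyze(self, consignment: str) -> Optional[str]:
        # Module 1101: keep only information that contains a set feature word.
        if not any(word in consignment for word in self.lexicon):
            return None
        # Module 1102: extract target feature words.
        words = self.extract(consignment)
        # Module 1103: look up tendency values (unmatched words are skipped here).
        values = [self.lexicon[w] for w in words if w in self.lexicon]
        # Module 1104: combine the values into a total tendency value.
        total = self.aggregate(values)
        # Module 1105: map the total to a risk index.
        return self.classify(total)


lexicon = {"shaping": 55.0, "hospital": 85.0}
analyzer = RiskIndexAnalyzer(
    lexicon=lexicon,
    extract=lambda text: [w for w in lexicon if w in text],  # not a trained model
    aggregate=sum,
    classify=lambda total: "high risk" if total > 80 else "lower risk",
)
flagged = analyzer.analyze("shaping hospital supplies")
skipped = analyzer.analyze("books")
```

Passing the extractor, aggregation strategy, and classification rule as callables keeps each module replaceable, which mirrors the statement that the module division is only one logical division.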
The risk index analysis apparatus 110 provided in the embodiment of the present application may be specific hardware on the electronic device 100, or software or firmware installed on the electronic device 100. The apparatus has the same implementation principle and technical effect as the foregoing method embodiments; for brevity, anything not mentioned in the apparatus embodiment may be found in the corresponding content of the foregoing method embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses, and units may refer to the corresponding processes in the foregoing method embodiments and are not described here again.
The present application further provides an electronic device 100, including a processor 130 and a non-volatile memory 120 storing computer instructions, where when the computer instructions are executed by the processor 130, the electronic device 100 executes the risk index analysis method described in the foregoing embodiments, and specific implementation methods may refer to corresponding processes in the foregoing method embodiments, and are not described herein again.
The present application further provides a storage medium, where a computer program is stored in the storage medium, and when the computer program is executed, the method for analyzing a risk index is implemented.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A risk index analysis method, applied to an electronic device, the method comprising:
screening target consignment information from at least one piece of consignment information based on a pre-established feature word library;
extracting at least one target feature word from the target consignment information according to a pre-trained text classification model;
calculating a tendency value of each target feature word;
calculating, according to the tendency value of each target feature word, a total tendency value of the target consignment information to which the target feature word belongs;
and analyzing a risk index corresponding to the target consignment information according to the total tendency value.
2. The method of claim 1, further comprising the step of creating a library of feature words, comprising:
acquiring a plurality of set characteristic words from a training text, and labeling the tendency numerical value of each set characteristic word;
and establishing a feature word library according to the plurality of set feature words and the tendency numerical value of each set feature word.
3. The method of claim 2, further comprising the step of training a text classification model, comprising:
and training a preset model based on the feature word library to obtain a text classification model.
4. The method of claim 2, wherein screening the target consignment information from the at least one consignment information based on a pre-established library of feature words comprises:
and comparing at least one piece of consignment information with each set characteristic word in the characteristic word bank, and screening target consignment information comprising the set characteristic words from at least one piece of consignment information.
5. The method according to claim 4, wherein calculating the tendency value of each target feature word comprises:
matching each target feature word against each set feature word in the feature word library, and determining whether each target feature word can be matched with a corresponding set feature word in the feature word library;
if a match is found, assigning the tendency value of the matched set feature word to the target feature word;
if no match is found, labeling a tendency value for the target feature word that cannot be matched with a corresponding set feature word.
6. The method according to claim 1, wherein calculating the total tendency value of the target consignment information to which the target feature word belongs according to the tendency value of each target feature word comprises:
calculating the total tendency value of the target consignment information by using an n-gram algorithm based on the tendency value of the target feature word.
7. The method according to claim 1, wherein threshold ranges of tendency values corresponding to different risk indexes are preset in the electronic device, and analyzing the risk index corresponding to the target consignment information according to the total tendency value comprises:
determining the risk index corresponding to the target consignment information according to the threshold range to which the total tendency value of the target consignment information belongs.
8. A risk index analysis device, the device comprising:
the screening module is used for screening target consignment information from at least one consignment information based on a pre-established feature word library;
the target feature word extraction module is used for extracting at least one target feature word from the target consignment information according to a pre-trained text classification model;
the first calculation module is used for calculating the tendency value of each target characteristic word;
the second calculation module is used for calculating the total tendency value of the target consignment information to which the target characteristic word belongs according to the tendency value of each target characteristic word;
and the analysis module is used for analyzing the risk index corresponding to the target consignment information according to the total tendency value.
9. An electronic device comprising a processor and a non-volatile memory having stored thereon computer instructions, which when executed by the processor, perform the risk index analysis method of any one of claims 1-7.
10. A storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when executed, implements the risk index analyzing method according to any one of claims 1 to 7.
CN201911283667.6A 2019-12-13 2019-12-13 Risk index analysis method and device, electronic equipment and storage medium Pending CN111078883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911283667.6A CN111078883A (en) 2019-12-13 2019-12-13 Risk index analysis method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111078883A true CN111078883A (en) 2020-04-28

Family

ID=70314429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911283667.6A Pending CN111078883A (en) 2019-12-13 2019-12-13 Risk index analysis method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111078883A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778225A (en) * 2014-01-23 2014-05-07 北京奇虎科技有限公司 Processing method, identifying device and identifying system of advertisement marketing language information
CN104572616A (en) * 2014-12-23 2015-04-29 北京锐安科技有限公司 Method and device for identifying text orientation
CN108228573A (en) * 2018-03-23 2018-06-29 北京航空航天大学 Text emotion analysis method, device and electronic equipment
CN108563722A (en) * 2018-04-03 2018-09-21 有米科技股份有限公司 Trade classification method, system, computer equipment and the storage medium of text message
CN108595519A (en) * 2018-03-26 2018-09-28 平安科技(深圳)有限公司 Focus incident sorting technique, device and storage medium
WO2019200806A1 (en) * 2018-04-20 2019-10-24 平安科技(深圳)有限公司 Device for generating text classification model, method, and computer readable storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112150251A (en) * 2020-10-09 2020-12-29 北京明朝万达科技股份有限公司 Article name management method and device
CN114116988A (en) * 2022-01-27 2022-03-01 国家邮政局邮政业安全中心 Consignment identification method and device
CN114116988B (en) * 2022-01-27 2022-05-06 国家邮政局邮政业安全中心 Consignment identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200428