WO2018201599A1 - System, method, electronic device, and storage medium for identifying risk events based on social information
- Publication number
- WO2018201599A1 (application PCT/CN2017/091358)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Definitions
- The present invention relates to the field of financial technology, and in particular to a system, method, electronic device, and computer-readable storage medium for identifying risk events based on social information.
- An object of the present invention is to provide a system, method, electronic device, and computer-readable storage medium for identifying risk events based on social information, with the aim of accurately and effectively identifying whether social information is negative information and thereby avoiding the occurrence of risk events.
- The present invention provides a system for identifying risk events based on social information, the system including:
- An acquisition module configured to obtain, from a predetermined social server, the social information published by each predetermined social account;
- An analysis module configured to analyze the social information to obtain a company name and/or a product name in the social information;
- A parsing module configured to, when a company name and/or product name is obtained from the social information, parse out the core viewpoint information corresponding to the social information according to a preset rule;
- An identification module configured to use a pre-trained classifier to identify the information pointing category corresponding to the core viewpoint information, so that the social information corresponding to a preset information pointing category, together with the social account that published it, is sent to a predetermined terminal for review.
- The present invention also provides an electronic device including a memory and a processor coupled to the memory, wherein the memory stores a system for identifying risk events based on social information that is operable on the processor, and the system, when executed by the processor, implements the following steps:
- S4: A pre-trained classifier is used to identify the information pointing category corresponding to the core viewpoint information, so that the social information corresponding to a preset information pointing category, together with the social account that published it, is sent to a predetermined terminal for review.
- The present invention also provides a method for identifying risk events based on social information, the method including:
- S4: A pre-trained classifier is used to identify the information pointing category corresponding to the core viewpoint information, so that the social information corresponding to a preset information pointing category, together with the social account that published it, is sent to a predetermined terminal for review.
- The present invention also provides a computer-readable storage medium storing a system for identifying risk events based on social information, wherein the system, when executed by a processor, implements the steps of the above-described method for identifying risk events based on social information.
- The present invention acquires the social information published by each social account from a social server; analyzes the social information to obtain a company name and/or a product name in it; parses out the core viewpoint information corresponding to the social information that includes the company name and/or product name; and finally uses the classifier to identify the information pointing category corresponding to the core viewpoint information, so that social information of a preset information pointing category can be sent to a predetermined terminal for review. The present invention can thus accurately and effectively identify the core viewpoint of social information and determine whether it is negative information, thereby controlling the release of negative information in the social network and preventing the occurrence of risk events.
- FIG. 1 is a schematic diagram of the hardware structure of an embodiment of the electronic device according to the present invention.
- FIG. 2 is a schematic structural diagram of an embodiment of the system for identifying risk events based on social information according to the present invention.
- FIG. 3 is a schematic structural diagram of the analysis module shown in FIG. 2.
- FIG. 4 is a schematic structural diagram of the parsing module shown in FIG. 2.
- FIG. 5 is a schematic diagram of a word segmentation tree of a preset structure.
- FIG. 6 is a schematic flowchart of an embodiment of the method for identifying risk events based on social information according to the present invention.
- FIG. 1 is a schematic diagram showing the hardware structure of a preferred embodiment of the electronic device according to the present invention.
- The electronic device 1 is an apparatus capable of automatically performing numerical calculation and/or information processing in accordance with instructions that are set or stored in advance.
- The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
- The electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13.
- FIG. 1 shows only the electronic device 1 with components 11-13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
- In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk or internal memory of the electronic device 1.
- In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1.
- The memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
- The memory 11 is used to store the application software installed on the electronic device 1 and various types of data, such as the program code of the system for identifying risk events based on social information.
- The memory 11 can also be used to temporarily store data that has been output or is about to be output.
- In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a microprocessor, or another data processing chip, and is used to run the program code or process the data stored in the memory 11, for example to execute the system for identifying risk events based on social information.
- In some embodiments, the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch sensor, or the like.
- The display 13 is used to display the information processed in the electronic device 1 and to display a visual user interface, such as a risk event identification interface.
- The components 11-13 of the electronic device 1 communicate with one another via a system bus.
- The system for identifying risk events based on social information is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11. The at least one computer-readable instruction is executable by the processor 12 to implement the methods of the various embodiments of the present application, and it can be divided into different logical modules according to the functions implemented by its parts.
- FIG. 2 is a functional block diagram of an embodiment of the system for identifying risk events based on social information according to the present invention.
- The system for identifying risk events based on social information may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to carry out the present invention.
- For example, the system may be divided into an acquisition module 101, an analysis module 102, a parsing module 103, and an identification module 104.
- A module, as the term is used here, refers to a series of computer program instruction segments capable of performing a specific function, and is better suited than the term "program" to describing how the system for identifying risk events based on social information executes in the electronic device 1, wherein:
- The acquisition module 101 is configured to obtain, from a predetermined social server, the social information published by each predetermined social account;
- The predetermined social server is, for example, a Weibo (microblog) server, a WeChat server, or a QQ server.
- The social account corresponds to the social server and is, for example, a Weibo account, a WeChat account, or a QQ account.
- The predetermined social accounts may be some or all of the social accounts on the social server.
- Users publish social information through their social accounts. For example, insurance salesperson A uses a WeChat account to post, in Moments or in a group chat, the social information “Ping An has launched the Zunhong Life product”.
- The system may acquire the social information published by the predetermined social accounts from the social server in real time, so as to obtain the latest social information, or it may acquire that social information from the predetermined social server periodically. Compared with real-time acquisition, periodic acquisition reduces the burden on the system.
- The analysis module 102 is configured to analyze the social information to obtain a company name and/or a product name in the social information;
- The social information published by each social account is analyzed to obtain the company name and/or product name it contains. For example, analyzing the social information “Ping An has launched the Zunhong Life product” yields the company name “Ping An” and the product name “Zunhong Life”. For the social information “Visiting the sights today”, no company name or product name can be obtained through analysis.
- In one embodiment, the social information may be segmented into characters and/or words, and all of the resulting characters and/or words are then matched against the characters and/or words pre-stored in a predetermined lexicon to obtain the company name and/or product name in the social information. In another embodiment, after the social information is segmented, the nouns may be extracted and matched against the nouns pre-stored in a predetermined noun library to obtain the company name and/or product name. If no company name or product name is obtained from a piece of social information, that piece is not processed further, and analysis continues with the next piece of social information.
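The lexicon-matching embodiment can be illustrated with a minimal Python sketch. The lexicon contents and the function names `find_names` and `filter_posts` are assumptions made for illustration only; the patent does not specify how the lexicon is stored or queried.

```python
# Hypothetical lexicons; a real system would load these from a maintained store.
COMPANY_NAMES = {"Ping An"}
PRODUCT_NAMES = {"Zunhong Life"}


def find_names(tokens):
    """Return (companies, products) found among already-segmented tokens."""
    companies = [t for t in tokens if t in COMPANY_NAMES]
    products = [t for t in tokens if t in PRODUCT_NAMES]
    return companies, products


def filter_posts(segmented_posts):
    """Keep only posts that mention a known company or product name."""
    kept = []
    for tokens in segmented_posts:
        companies, products = find_names(tokens)
        if companies or products:
            kept.append((tokens, companies, products))
    return kept
```

Posts with no match are simply skipped, which mirrors the rule that such social information is not processed further.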
- The parsing module 103 is configured to, when a company name and/or product name is obtained from the social information, parse out the core viewpoint information corresponding to the social information according to a preset rule;
- A piece of social information that includes a company name and/or product name is parsed to obtain its core viewpoint information, which is the viewpoint or opinion expressed about that company and/or product.
- In one embodiment, the characters and/or words of predetermined parts of speech may be extracted from the segmented social information that includes a company name and/or product name. The predetermined parts of speech may be, for example, adjective, verb, noun, or auxiliary word. The extracted characters and/or words are then analyzed to obtain the core viewpoint information corresponding to the social information.
- For example, the core viewpoint information may be “the Zunhong Life product is safe and offers high returns”.
- In another embodiment, the segmented social information is analyzed to determine whether it contains negative characters and/or words, so as to obtain the core viewpoint information corresponding to the social information.
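Both embodiments reduce to simple filters over segmented text. The sketch below is illustrative only: the part-of-speech labels, the negative word list, and the function names are assumptions rather than details taken from the patent.

```python
# Assumed label set and word list, for illustration only.
KEPT_POS = {"noun", "verb", "adjective", "auxiliary"}
NEGATIVE_WORDS = {"poor", "slow", "refused"}


def extract_by_pos(tagged_tokens, kept_pos=KEPT_POS):
    """Keep the words whose part of speech is one of the predetermined ones."""
    return [word for word, pos in tagged_tokens if pos in kept_pos]


def negative_hits(tokens, negative_words=NEGATIVE_WORDS):
    """Return the tokens that appear in the negative word list."""
    return [t for t in tokens if t in negative_words]
```

For instance, `negative_hits(["claims", "service", "poor"])` returns `["poor"]`, which a downstream step could treat as evidence of a negative viewpoint.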
- The identification module 104 is configured to use a pre-trained classifier to identify the information pointing category corresponding to the core viewpoint information, so that the social information corresponding to a preset information pointing category, together with the social account that published it, is sent to a predetermined terminal for review.
- The pre-trained classifier is preferably a support vector machine classifier, and the information pointing categories corresponding to the core viewpoint information include positive information and negative information.
- The system for identifying risk events based on social information further includes a training module for training and generating the support vector machine classifier. The training module is used to: obtain a preset number (for example, 10,000) of core viewpoint information samples of positive information (for example, “Ping An health insurance has broad coverage”, “Ping An auto insurance is fast”) and a preset number of core viewpoint information samples of negative information (for example, “Ping An auto insurance claims service is poor”, “the high returns of Ping An products are not honored”); randomly divide all of the obtained core viewpoint samples into a training set of a first preset ratio (for example, 70%) and a verification set of a second preset ratio (for example, 30%), where the sum of the two ratios is less than or equal to 1; and train the predetermined support vector machine classifier using the training set (at the
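The random split into a training set and a verification set can be sketched as follows. This covers only the splitting step, not the SVM training itself; the function name `split_samples`, the fixed seed, and the rounding behaviour are assumptions chosen for illustration.

```python
import random


def split_samples(samples, train_ratio=0.7, valid_ratio=0.3, seed=0):
    """Randomly split samples into (training set, verification set).

    The two ratios must sum to at most 1, as the description requires;
    any remainder is left unused.
    """
    if train_ratio + valid_ratio > 1:
        raise ValueError("train_ratio + valid_ratio must not exceed 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded only to make the sketch repeatable
    n_train = round(len(shuffled) * train_ratio)
    n_valid = min(round(len(shuffled) * valid_ratio), len(shuffled) - n_train)
    return shuffled[:n_train], shuffled[n_train:n_train + n_valid]
```

With 10 samples and the default ratios this yields 7 training samples and 3 verification samples, and the two sets never overlap.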
- After the classifier identifies the information pointing category corresponding to the core viewpoint information, if that category is negative information, the corresponding social information and the social account that published it are sent to the predetermined terminal so that the social information can be reviewed. If the review confirms that it is negative information, measures can be taken against the social account to control the release of negative information, for example sending a reminder to the social account asking its user not to post negative information, or sending the user a notice that the post is a violation.
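The routing step can be sketched independently of the classifier. In the sketch below, `Post`, `route_for_review`, and `keyword_classifier` are hypothetical names, and the keyword rule is only a stand-in for the trained support vector machine classifier, not an implementation of it.

```python
from dataclasses import dataclass


@dataclass
class Post:
    account: str         # social account that published the post
    text: str            # the social information itself
    core_viewpoint: str  # core viewpoint parsed from the text


def route_for_review(posts, classify, flagged_category="negative"):
    """Return (account, text) pairs whose core viewpoint is in the flagged category."""
    return [(p.account, p.text) for p in posts
            if classify(p.core_viewpoint) == flagged_category]


def keyword_classifier(viewpoint):
    """Stand-in for the trained classifier: flags viewpoints containing 'poor'."""
    return "negative" if "poor" in viewpoint else "positive"
```

Passing the classifier in as a function keeps the routing logic unchanged if the stand-in is later replaced by a real model.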
- This embodiment obtains the social information published by each social account from the social server; analyzes the social information to obtain a company name and/or product name in it; parses out the core viewpoint information corresponding to the social information that includes the company name and/or product name; and finally uses the classifier to identify the information pointing category corresponding to the core viewpoint information, so that social information of a preset information pointing category (for example, negative information) can be sent to the predetermined terminal for review.
- Because the company name and/or product name is first obtained by analyzing the social information, and the core viewpoint information in the social information is then parsed out, the core viewpoint of the social information can be identified accurately and effectively and it can be determined whether the information is negative, thereby controlling the release of negative information in the social network and preventing the occurrence of risk events.
- The analysis module 102 includes:
- The word segmentation unit 1021 is configured to segment the social information according to a predetermined word segmentation rule to obtain the corresponding segmented words, where the segmented words include both single characters and words. For example, for the social information “Ping An has launched the Zunhong Life product”, the segmentation result is “Ping An”, “launch”, the auxiliary particle “le”, “Zunhong Life”, and “product”.
- The predetermined word segmentation rule is: split the social information into short sentences at punctuation marks of preset types, and segment each resulting short sentence according to the long-word-priority principle. For example, the social information is split at punctuation marks such as “,”, “.”, “!”, and “;”. The text from the beginning of the social information to the first punctuation mark is a short sentence, and the text between each pair of adjacent punctuation marks is a short sentence. If the social information does not end with a punctuation mark, the text from the last punctuation mark to the end of the social information is also a short sentence.
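The short-sentence splitting rule can be sketched in a few lines of Python. The punctuation set, which adds the full-width Chinese counterparts of the listed marks, and the function name are assumptions for illustration.

```python
import re

# Assumed set of "preset type" punctuation marks, half-width and full-width.
PUNCTUATION = ",.!;，。！；"


def split_short_sentences(text, punctuation=PUNCTUATION):
    """Split text at the preset punctuation marks and drop empty pieces."""
    pattern = "[" + re.escape(punctuation) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

A trailing piece with no closing punctuation is still returned as a short sentence, matching the rule for social information that does not end with a punctuation mark.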
- The long-word-priority principle is as follows: for a short sentence T1 to be segmented, start from its first character A and look up, in the pre-stored lexicon, the longest word X1 that begins with A; remove X1 from T1 to leave T2, and apply the same principle to T2. The segmentation result has the form “X1/X2/...”.
- For example, for the social information “Ping An has launched the Zunhong Life product”, if the pre-stored lexicon includes “Ping An”, “launch”, the auxiliary particle “le”, “Zunhong Life”, and “product”, the segmentation result is “Ping An”, “launch”, “le”, “Zunhong Life”, and “product”.
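The long-word-priority principle corresponds to what is commonly called forward maximum matching. The following is a minimal sketch under that reading; the function name is hypothetical, and the Chinese sentence in the usage note is an assumed rendering of the example, since the text here gives it only in translation.

```python
def segment_longest_first(sentence, lexicon):
    """Segment a short sentence by repeatedly taking the longest lexicon word
    that starts at the current position; unknown characters become
    single-character tokens."""
    max_len = max((len(word) for word in lexicon), default=1)
    result = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                result.append(candidate)
                i += size
                break
    return result
```

Assuming the original sentence is 平安推出了尊宏人生产品 and the lexicon holds 平安, 推出, 了, 尊宏人生, and 产品, the function returns those five tokens in order. With the toy lexicon `{"ab", "abc", "d"}`, the input `"abcd"` is segmented as `["abc", "d"]`, showing that the longer match wins.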
- The tagging unit 1022 is configured to tag each segmented word with its part of speech according to a predetermined part-of-speech tagging rule. For example, the tagging result may be “Ping An/noun”, “launch/verb”, “le/auxiliary”, “Zunhong Life/noun”, “product/noun”.
- The predetermined part-of-speech tagging rule is: determine and tag the part of speech of each segmented word according to the mapping between characters/words and parts of speech in a general dictionary (for example, in the general dictionary the part of speech of “playground” is noun), and/or according to a preset mapping between characters/words and parts of speech (for example, in the preset mapping the part of speech of “playground” is common noun).
- Tagging may use the general dictionary mapping alone, the preset mapping alone, or both. When both are used, the preset mapping takes priority over the general dictionary mapping (for example, if the general dictionary maps “playground” to noun and the preset mapping maps it to common noun, “playground” is tagged as a common noun).
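The priority between the two mappings can be shown with a short sketch. The dictionary contents, the `"unknown"` fallback label, and the function name are assumptions for illustration.

```python
# Illustrative mappings; real ones would be far larger.
GENERAL_DICT = {"playground": "noun", "launch": "verb"}
PRESET_MAP = {"playground": "common noun"}


def tag_tokens(tokens, general=GENERAL_DICT, preset=PRESET_MAP):
    """Tag each token, letting the preset mapping override the general dictionary."""
    tagged = []
    for token in tokens:
        pos = preset.get(token) or general.get(token) or "unknown"
        tagged.append((token, pos))
    return tagged
```

Here “playground” is tagged as a common noun because the preset mapping is consulted first, while “launch” falls back to the general dictionary.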
- The part of speech of each segmented word may also be tagged as follows: auxiliary words among the segmented words are identified according to a pre-stored auxiliary word library (for example, the auxiliary words “a”, “come”, “to”, “over”, “of”, “land”, “obtain”, “like”, “so”, etc.) and tagged as auxiliary words; adjectives among the segmented words are identified according to a pre-stored adjective library (for example, “very safe”, “guaranteed”, “high returns”, “long cycle”, etc.) and tagged as adjectives; and verbs among the segmented words are identified according to a pre-stored verb library (for example, “launch”, “push”, “issue”, “publish”, “develop”, “sell”, etc.) and tagged as verbs.
- The classifying unit 1023 is configured to classify the segmented words whose part of speech is noun into noun categories (for example, person name, place name, company name, product name, and other nouns) according to a predetermined word classification rule, so as to obtain the company name and/or product name in the social information from the classification result.
- In one embodiment, the predetermined word classification rule is: use a pre-trained recognition model to identify the noun category of each segmented word tagged as a noun, thereby classifying the nouns. The recognition model is a conditional random field (CRF) model.
- The training process of the conditional random field model includes:
- Constructing a training data set: construct a preset number of training samples in a predetermined short-sentence data format (for example, {company_name: Pingan} launched {product_name: Zunhong Life} product);
- Extracting feature variables: the extracted feature variables include, but are not limited to, part of speech, context information, and word structure;
- Transforming unstructured data into a structured feature matrix: for example, for the social information “Ping An has launched the Zunhong Life product”, an example of the feature matrix is shown in Table 1 below;
- Training the model: the constructed feature matrix is used as the input variable to train the conditional random field model, and the trained model is used to identify noun categories. It outputs nouns of various categories, for example nouns whose category is person name, company name, or product name, and the nouns that are company names and/or product names are finally obtained from the output.
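The feature-matrix step can be sketched as one feature row per token. The specific features below (the word, its part of speech, the neighbouring words as context, and the word length as a crude proxy for word structure) are assumptions and do not reproduce Table 1; training an actual conditional random field would require a CRF library and is not shown.

```python
def token_features(tagged, i):
    """Build one feature row for the token at position i."""
    word, pos = tagged[i]
    prev_word = tagged[i - 1][0] if i > 0 else "<BOS>"
    next_word = tagged[i + 1][0] if i < len(tagged) - 1 else "<EOS>"
    return {
        "word": word,
        "pos": pos,              # part of speech
        "prev_word": prev_word,  # left context
        "next_word": next_word,  # right context
        "length": len(word),     # simple word-structure feature
    }


def feature_matrix(tagged):
    """Turn a tagged sentence into a list of feature rows, one per token."""
    return [token_features(tagged, i) for i in range(len(tagged))]
```

Each row would then be paired with a noun-category label (such as company name or product name) to form the training input.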
- In another embodiment, predetermined verbs in the social information may be identified, such as “launch”, “push”, “issue”, “publish”, “develop”, or “sell”. The nouns that follow these verbs are then taken as one category, and the nouns that are company names and/or product names are obtained from the nouns in that category.
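The verb-based rule can be sketched as a scan over tagged tokens. The trigger verb set, the tag labels, and the choice to take only the first noun after each verb are assumptions made for illustration.

```python
# Assumed trigger verbs, following the examples in the description.
TRIGGER_VERBS = {"launch", "push", "issue", "publish", "develop", "sell"}


def nouns_after_verbs(tagged, trigger_verbs=TRIGGER_VERBS):
    """Collect the first noun that follows each predetermined verb."""
    candidates = []
    for i, (word, pos) in enumerate(tagged):
        if pos == "verb" and word in trigger_verbs:
            for next_word, next_pos in tagged[i + 1:]:
                if next_pos == "noun":
                    candidates.append(next_word)
                    break
    return candidates
```

For the tagged example sentence, the noun following “launch” is “Zunhong Life”, which would then be checked as a candidate product name.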
- The parsing module 103 includes:
- The constructing unit 1031 is configured to, when a company name and/or product name is obtained from the social information, construct a word segmentation tree of a preset structure according to the order and parts of speech of the segmented words in that social information.
- The preset-structure word segmentation tree includes multiple levels of nodes. The first-level node is the social information itself; the second-level nodes are the segmented phrases (such as noun phrases and verb phrases) obtained by dividing the social information according to the order and parts of speech of its segmented words; and each level after the second is obtained by further dividing the segmented phrases of the level above according to part of speech, until the last-level node of each branch is reached.
- A segmented phrase that cannot be divided further is the last-level node of its branch. For example, for “I go to the playground to play football”, the constructed preset-structure word segmentation tree is as shown in FIG. 5.
- The parsing unit 1032 is configured to parse out the core viewpoint information corresponding to the social information based on the preset-structure word segmentation tree.
- In the tree, the node distance between each segmented word of a first predetermined part of speech (e.g., noun) and each segmented word of a second predetermined part of speech (e.g., verb or adjective) is calculated, that is, the number of nodes separating the two. For each word of the first predetermined part of speech, the word of the second predetermined part of speech at the smallest node distance is found, and the two words, arranged in the order in which they appear in the social information, constitute the corresponding core viewpoint information.
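The nearest-node pairing can be sketched on a small hand-built tree. The tree shape below is a hypothetical stand-in for FIG. 5, which is not reproduced here, and node distance is taken to be the number of edges on the path between two leaves, which is one plausible reading of "the number of nodes separating the two".

```python
def leaf_paths(tree, path=()):
    """Yield (pos, word, path) per leaf; path holds child indexes from the root."""
    label, content = tree
    if isinstance(content, str):
        yield label, content, path
    else:
        for index, child in enumerate(content):
            yield from leaf_paths(child, path + (index,))


def node_distance(path_a, path_b):
    """Count the edges between two leaves via their lowest common ancestor."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def core_viewpoints(tree, first_pos="noun", second_pos=("verb", "adjective")):
    """Pair each noun with its nearest verb or adjective, in sentence order."""
    leaves = list(leaf_paths(tree))
    phrases = []
    for i, (pos, _, path) in enumerate(leaves):
        if pos != first_pos:
            continue
        candidates = [(node_distance(path, other_path), j)
                      for j, (other_pos, _, other_path) in enumerate(leaves)
                      if other_pos in second_pos]
        if candidates:
            _, j = min(candidates)
            phrases.append(" ".join(leaves[k][1] for k in sorted((i, j))))
    return phrases


# Hypothetical tree for "I go to the playground to play football".
EXAMPLE_TREE = ("sentence", [
    ("pronoun", "I"),
    ("verb phrase", [("verb", "go to"), ("noun", "playground")]),
    ("verb phrase", [("verb", "play"), ("noun", "football")]),
])
```

On this tree, `core_viewpoints(EXAMPLE_TREE)` returns `["go to playground", "play football"]`, because each noun is closest to the verb that shares its verb phrase.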
- FIG. 6 is a schematic flowchart of an embodiment of the method for identifying risk events based on social information according to the present invention. The method may be performed by the system for identifying risk events based on social information.
- The system may be implemented in software and/or hardware and may be integrated in a server.
- The method for identifying risk events based on social information includes:
- Step S1: obtain, from a predetermined social server, the social information published by each predetermined social account;
- The predetermined social server is, for example, a Weibo (microblog) server, a WeChat server, or a QQ server.
- The social account corresponds to the social server and is, for example, a Weibo account, a WeChat account, or a QQ account.
- The predetermined social accounts may be some or all of the social accounts on the social server.
- A user publishes social information through a social account. For example, insurance salesperson A uses a WeChat account to post, in Moments (the circle of friends) or in a group of friends, the message "Ping An has launched the Zunhong Life product".
- The identification system may acquire the social information published by the predetermined social accounts from the social server in real time, so as to obtain the latest social information. Alternatively, it may acquire the social information published by the predetermined social accounts from the social server periodically, which reduces the burden on the system compared with real-time acquisition.
- Step S2: analyze the social information to obtain a company name and/or a product name in the social information;
- The social information published by each social account is analyzed to obtain the company name and/or product name it contains. For example, analyzing the social information "Ping An has launched the Zunhong Life product" yields the company name "Ping An" and the product name "Zunhong Life". For the social information "Visited the attractions today", no company name or product name can be obtained through analysis.
- In one embodiment, the social information may be segmented into characters and/or words, and all the resulting characters and/or words are then matched against the characters and/or words pre-stored in a predetermined word library to obtain the company name and/or product name in the social information. In another embodiment, after the social information is segmented into characters and/or words, the nouns among them may be further extracted and then matched against the nouns pre-stored in a predetermined noun library to obtain the company name and/or product name. If no company name or product name is obtained from a piece of social information, that piece is not processed further, and the analysis continues with the next piece of social information.
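The segment-then-match approach of step S2 can be illustrated with a short sketch. It assumes forward maximum matching as the long-word-first segmentation and uses tiny, hypothetical libraries (`VOCABULARY`, `COMPANY_NAMES`, `PRODUCT_NAMES`); a real system would load much larger dictionaries and might use a different segmenter.

```python
import re

# Hypothetical libraries; real ones would be far larger.
VOCABULARY = {"平安", "推出", "了", "尊宏人生", "人生", "产品"}
COMPANY_NAMES = {"平安"}
PRODUCT_NAMES = {"尊宏人生"}

# Clause-breaking punctuation (an assumed set, Chinese and ASCII).
CLAUSE_BREAK = re.compile(r"[，。！？；,.!?;]")


def segment_longest_first(clause, vocabulary):
    """Forward maximum matching: at each position take the longest known word."""
    max_len = max(len(word) for word in vocabulary)
    tokens, i = [], 0
    while i < len(clause):
        for size in range(min(max_len, len(clause) - i), 0, -1):
            piece = clause[i:i + size]
            # Fall back to a single character when no known word starts here.
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def segment(text, vocabulary=VOCABULARY):
    """Split into short clauses at punctuation, then segment each clause."""
    tokens = []
    for clause in CLAUSE_BREAK.split(text):
        tokens.extend(segment_longest_first(clause, vocabulary))
    return tokens


def find_names(text, vocabulary=VOCABULARY,
               companies=COMPANY_NAMES, products=PRODUCT_NAMES):
    """Match the segmented tokens against the company and product libraries."""
    tokens = segment(text, vocabulary)
    return {
        "companies": [t for t in tokens if t in companies],
        "products": [t for t in tokens if t in products],
    }


found = find_names("平安推出了尊宏人生产品。")
skipped = find_names("今天去景点游玩")
```

A post for which both lists come back empty, such as the second example, would simply be skipped and the next post analyzed.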
- Step S3: when the company name and/or product name in the social information is obtained, parse out the core viewpoint information corresponding to the social information according to a preset rule;
- A piece of social information that includes a company name and/or a product name is parsed to obtain its core viewpoint information, that is, the view or opinion it expresses about the company and/or product.
- In one embodiment, after the social information that includes a company name and/or a product name has been segmented into characters and/or words, the characters and/or words of a predetermined part of speech may be extracted. The predetermined part of speech may be, for example, adjective, verb, noun, or auxiliary word. The extracted characters and/or words are then analyzed to obtain the core viewpoint information corresponding to the social information, for example, "Zunhong Life product is safe, high income".
- In another embodiment, the segmented social information may be analyzed to determine whether it contains negative characters and/or words, so as to obtain the core viewpoint information corresponding to the social information.
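The two step S3 variants, keeping only the words of predetermined parts of speech and checking for negative words, can be sketched as below. The input is assumed to be already segmented and tagged, and the tag names, the `NEGATIVE_WORDS` list, and the example tokens are hypothetical.

```python
# Parts of speech to keep (an assumed tag set matching the examples in the text).
KEPT_POS = {"noun", "verb", "adjective", "auxiliary"}

# Hypothetical negative-word list.
NEGATIVE_WORDS = {"not", "never", "scam", "fraud"}


def extract_core_viewpoint(tagged_tokens, kept_pos=KEPT_POS):
    """Keep only tokens whose part of speech is in kept_pos, in original order."""
    return " ".join(word for word, pos in tagged_tokens if pos in kept_pos)


def contains_negative_word(tagged_tokens, negative_words=NEGATIVE_WORDS):
    """Report whether any token is on the negative-word list."""
    return any(word.lower() in negative_words for word, _ in tagged_tokens)


tagged = [
    ("Zunhong Life", "noun"),
    ("product", "noun"),
    ("really", "adverb"),
    ("safe", "adjective"),
    (",", "punctuation"),
    ("income", "noun"),
    ("high", "adjective"),
]

viewpoint = extract_core_viewpoint(tagged)
has_negative = contains_negative_word(tagged)
```

Filtering by part of speech drops the adverb and the punctuation here, leaving the entity and the words that describe it.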
- Step S4: identify, using a classifier generated by pre-training, the information pointing category corresponding to the core viewpoint information, so that the social information belonging to a preset information pointing category, together with the social account that published it, is sent to a predetermined terminal for review.
- The classifier generated by pre-training is preferably a support vector machine classifier, and the information pointing categories corresponding to the core viewpoint information include positive information and negative information.
- The classifier identifies the information pointing category corresponding to the core viewpoint information. If that category is negative information, the corresponding social information and the social account that published it are sent to the predetermined terminal so that the social information can be reviewed. If the review confirms that the information is negative, measures may be taken against the social account to control the spread of the negative information, for example, sending a reminder message to the social account asking its user not to post negative information, or sending the user of the social account a notice of the violation.
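The routing in step S4 can be sketched as a filter over classified posts. The classifier is passed in as a plain callable so that a trained support vector machine could be plugged in; the `placeholder_classifier` below is only a keyword stand-in for illustration, and the `Post` type, account names, and example texts are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Post:
    account: str          # social account that published the post
    text: str             # the original social information
    core_viewpoint: str   # output of step S3


def select_for_review(posts: Iterable[Post],
                      classify: Callable[[str], str],
                      flagged_category: str = "negative"):
    """Return (account, text) pairs whose core viewpoint falls in the flagged category."""
    return [
        (post.account, post.text)
        for post in posts
        if classify(post.core_viewpoint) == flagged_category
    ]


def placeholder_classifier(core_viewpoint: str) -> str:
    """Stand-in for a pre-trained classifier; NOT a support vector machine."""
    negative_cues = ("scam", "refuses", "loss")
    lowered = core_viewpoint.lower()
    return "negative" if any(cue in lowered for cue in negative_cues) else "positive"


posts = [
    Post("wechat:salesperson_a",
         "Ping An has launched the Zunhong Life product",
         "Ping An launched"),
    Post("weibo:user_b",
         "This product is a scam, the insurer refuses to pay",
         "product scam"),
]

review_queue = select_for_review(posts, placeholder_classifier)
```

Only the items in `review_queue` would be forwarded to the reviewing terminal; positive posts are left alone.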
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Primary Health Care (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Educational Administration (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Machine Translation (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Description
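Word segment | Part of speech | Preceding word | Following word | Contains "平安" (Ping An) |
---|---|---|---|---|
平安 (Ping An) | Noun | Null | 推出 | True |
推出 (launched) | Verb | 平安 | 了 | False |
了 (aspect particle) | Auxiliary word | 推出 | 尊宏人生 | False |
尊宏人生 (Zunhong Life) | Noun | 了 | 产品 | False |
产品 (product) | Noun | 尊宏人生 | 。 | False |
。 (full stop) | Punctuation | 产品 | Null | False |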
Claims (20)
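- A system for identifying a risk event based on social information, characterized in that the system comprises: an acquisition module, configured to obtain, from a predetermined social server, social information published by each predetermined social account; an analysis module, configured to analyze the social information to obtain a company name and/or a product name in the social information; a parsing module, configured to parse out, according to a preset rule, core viewpoint information corresponding to the social information when the company name and/or product name in the social information is obtained; and an identification module, configured to identify, using a classifier generated by pre-training, an information pointing category corresponding to the core viewpoint information, so that the social information belonging to a preset information pointing category and the social account that published the social information are sent to a predetermined terminal for review.
- An electronic device, characterized in that the electronic device comprises a memory and a processor connected to the memory, the memory storing a system for identifying a risk event based on social information that is executable on the processor, wherein the system, when executed by the processor, implements the following steps: S1, obtaining, from a predetermined social server, social information published by each predetermined social account; S2, analyzing the social information to obtain a company name and/or a product name in the social information; S3, when the company name and/or product name in the social information is obtained, parsing out core viewpoint information corresponding to the social information according to a preset rule; S4, identifying, using a classifier generated by pre-training, an information pointing category corresponding to the core viewpoint information, so that the social information belonging to a preset information pointing category and the social account that published the social information are sent to a predetermined terminal for review.
- The electronic device according to claim 2, characterized in that the information pointing categories include positive information and negative information, the classifier is a support vector machine classifier, and the system for identifying a risk event based on social information, when executed by the processor, further implements the following steps: obtaining a preset number of core viewpoint information samples of positive information and a preset number of core viewpoint information samples of negative information; randomly dividing all the obtained core viewpoint information samples into a training set of a first preset proportion and a validation set of a second preset proportion; training a predetermined support vector machine classifier with the training set, and verifying the accuracy of the trained support vector machine classifier with the validation set; if the accuracy is greater than or equal to a preset accuracy, ending the training and using the trained support vector machine classifier as the classifier; or, if the accuracy is less than the preset accuracy, increasing the number of core viewpoint information samples of positive information and the number of core viewpoint information samples of negative information, and training again.
- The electronic device according to claim 2 or 3, characterized in that step S2 comprises: performing word segmentation on the social information according to a predetermined word segmentation rule to obtain the corresponding word segments; tagging the word segments with parts of speech according to a predetermined part-of-speech tagging rule; and classifying the word segments whose part of speech is noun according to a predetermined word classification rule, so as to obtain the company name and/or product name in the social information from the classification result.
- The electronic device according to claim 4, characterized in that the predetermined word segmentation rule is: splitting the social information into short sentences at punctuation marks of preset types, and segmenting each resulting short sentence according to the long-word-first principle.
- The electronic device according to claim 5, characterized in that the predetermined part-of-speech tagging rule is: determining and tagging the part of speech of each word segment obtained by the word segmentation according to the mapping relationships between characters and parts of speech and between words and parts of speech in a general-purpose dictionary, and/or according to preset mapping relationships between characters and parts of speech and between words and parts of speech.
- The electronic device according to claim 6, characterized in that the predetermined word classification rule is: using a recognition model generated by pre-training to identify the noun category of each word segment tagged as a noun, so as to classify the word segments tagged as nouns, the recognition model being a conditional random field model.
- The electronic device according to claim 5, characterized in that step S3 comprises: when the company name and/or product name in the social information is obtained, constructing a preset-structure word segmentation tree according to the order and parts of speech of the word segments in the social information from which the company name and/or product name was obtained; and parsing out the core viewpoint information corresponding to the social information based on the preset-structure word segmentation tree.
- The electronic device according to claim 8, characterized in that the preset-structure word segmentation tree comprises multiple levels of nodes, the first-level node being the social information, the second-level nodes being word-segment phrases obtained by dividing the social information according to the order and parts of speech of its word segments, and each level of nodes after the second level being obtained by dividing the word-segment phrases of the previous level according to part of speech.
- The electronic device according to claim 9, characterized in that step S3 further comprises: calculating, based on the preset-structure word segmentation tree, the node distance between a word segment of a first preset part of speech and a word segment of a second preset part of speech; and obtaining the word segment of the second preset part of speech at the smallest node distance from the word segment of the first preset part of speech, and combining the word segment of the first preset part of speech with that nearest word segment of the second preset part of speech, in order, to form the corresponding core viewpoint information.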
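- A method for identifying a risk event based on social information, characterized in that the method comprises: S1, obtaining, from a predetermined social server, social information published by each predetermined social account; S2, analyzing the social information to obtain a company name and/or a product name in the social information; S3, when the company name and/or product name in the social information is obtained, parsing out core viewpoint information corresponding to the social information according to a preset rule; S4, identifying, using a classifier generated by pre-training, an information pointing category corresponding to the core viewpoint information, so that the social information belonging to a preset information pointing category and the social account that published the social information are sent to a predetermined terminal for review.
- The method for identifying a risk event based on social information according to claim 11, characterized in that the information pointing categories include positive information and negative information, the classifier is a support vector machine classifier, and the method further comprises: obtaining a preset number of core viewpoint information samples of positive information and a preset number of core viewpoint information samples of negative information; randomly dividing all the obtained core viewpoint information samples into a training set of a first preset proportion and a validation set of a second preset proportion; training a predetermined support vector machine classifier with the training set, and verifying the accuracy of the trained support vector machine classifier with the validation set; if the accuracy is greater than or equal to a preset accuracy, ending the training and using the trained support vector machine classifier as the classifier; or, if the accuracy is less than the preset accuracy, increasing the number of core viewpoint information samples of positive information and the number of core viewpoint information samples of negative information, and training again.
- The method for identifying a risk event based on social information according to claim 11 or 12, characterized in that step S2 comprises: performing word segmentation on the social information according to a predetermined word segmentation rule to obtain the corresponding word segments; tagging the word segments with parts of speech according to a predetermined part-of-speech tagging rule; and classifying the word segments whose part of speech is noun according to a predetermined word classification rule, so as to obtain the company name and/or product name in the social information from the classification result.
- The method for identifying a risk event based on social information according to claim 13, characterized in that the predetermined word segmentation rule is: splitting the social information into short sentences at punctuation marks of preset types, and segmenting each resulting short sentence according to the long-word-first principle.
- The method for identifying a risk event based on social information according to claim 14, characterized in that the predetermined part-of-speech tagging rule is: determining and tagging the part of speech of each word segment obtained by the word segmentation according to the mapping relationships between characters and parts of speech and between words and parts of speech in a general-purpose dictionary, and/or according to preset mapping relationships between characters and parts of speech and between words and parts of speech.
- The method for identifying a risk event based on social information according to claim 15, characterized in that the predetermined word classification rule is: using a recognition model generated by pre-training to identify the noun category of each word segment tagged as a noun, so as to classify the word segments tagged as nouns, the recognition model being a conditional random field model.
- The method for identifying a risk event based on social information according to claim 14, characterized in that step S3 comprises: when the company name and/or product name in the social information is obtained, constructing a preset-structure word segmentation tree according to the order and parts of speech of the word segments in the social information from which the company name and/or product name was obtained; and parsing out the core viewpoint information corresponding to the social information based on the preset-structure word segmentation tree.
- The method for identifying a risk event based on social information according to claim 17, characterized in that the preset-structure word segmentation tree comprises multiple levels of nodes, the first-level node being the social information, the second-level nodes being word-segment phrases obtained by dividing the social information according to the order and parts of speech of its word segments, and each level of nodes after the second level being obtained by dividing the word-segment phrases of the previous level according to part of speech.
- The method for identifying a risk event based on social information according to claim 18, characterized in that step S3 further comprises: calculating, based on the preset-structure word segmentation tree, the node distance between a word segment of a first preset part of speech and a word segment of a second preset part of speech; and obtaining the word segment of the second preset part of speech at the smallest node distance from the word segment of the first preset part of speech, and combining the word segment of the first preset part of speech with that nearest word segment of the second preset part of speech, in order, to form the corresponding core viewpoint information.
- A computer-readable storage medium, characterized in that the computer-readable storage medium stores a system for identifying a risk event based on social information which, when executed by a processor, implements the steps of the method for identifying a risk event based on social information according to any one of claims 11 to 19.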
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020187017275A KR20190022430A (ko) | 2017-05-05 | 2017-06-30 | System, method, electronic device, and storage medium for identifying risk event based on social information |
US16/084,235 US11803796B2 (en) | 2017-05-05 | 2017-06-30 | System, method, electronic device, and storage medium for identifying risk event based on social information |
EP17897215.4A EP3425531A4 (en) | 2017-05-05 | 2017-06-30 | System, method, electronic device, and storage medium for identifying risk event based on social information |
JP2018530794A JP6608061B2 (ja) | 2017-05-05 | 2017-06-30 | System, method, electronic device, and storage medium for identifying risk event based on social information |
SG11201901072SA SG11201901072SA (en) | 2017-05-05 | 2017-06-30 | System, method, electronic device, and storage medium for identifying risk event based on social information |
AU2017404560A AU2017404560A1 (en) | 2017-05-05 | 2017-06-30 | System, method, electronic device, and storage medium for identifying risk event based on social information |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710313184.0A CN107688594B (zh) | 2017-05-05 | 2017-05-05 | System and method for identifying risk event based on social information |
CN201710313184.0 | 2017-05-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018201599A1 (zh) | 2018-11-08 |
Family
ID=61152473
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/091358 WO2018201599A1 (zh) | 2017-05-05 | 2017-06-30 | System, method, electronic device, and storage medium for identifying risk event based on social information |
Country Status (8)
Country | Link |
---|---|
US (1) | US11803796B2 (zh) |
EP (1) | EP3425531A4 (zh) |
JP (1) | JP6608061B2 (zh) |
KR (1) | KR20190022430A (zh) |
CN (1) | CN107688594B (zh) |
AU (1) | AU2017404560A1 (zh) |
SG (1) | SG11201901072SA (zh) |
WO (1) | WO2018201599A1 (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
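CN110135693A (zh) * | 2019-04-12 | 2019-08-16 | 北京中科闻歌科技股份有限公司 | Risk identification method, apparatus, device, and storage medium |
CN110377809A (zh) * | 2019-06-19 | 2019-10-25 | 深圳壹账通智能科技有限公司 | Method for generating resource-acquisition qualification of preset user and related device |
CN110287493B (zh) * | 2019-06-28 | 2023-04-18 | 中国科学技术信息研究所 | Risk phrase identification method, apparatus, electronic device, and storage medium |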
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105138640A (zh) * | 2015-08-24 | 2015-12-09 | 成都秋雷科技有限责任公司 | Cloud-based web page advertisement screening method |
CN105141607A (zh) * | 2015-08-24 | 2015-12-09 | 成都秋雷科技有限责任公司 | Cloud-based malicious link interception method |
CN105183793A (zh) * | 2015-08-24 | 2015-12-23 | 成都秋雷科技有限责任公司 | Method for fast interception of web page pop-ups |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008084064A (ja) * | 2006-09-28 | 2008-04-10 | National Institute Of Advanced Industrial & Technology | Text classification processing method, text classification processing device, and text classification processing program |
CN101201819B (zh) * | 2007-11-28 | 2010-12-08 | 北京金山软件有限公司 | Treebank conversion method and treebank conversion system |
CN101266520B (zh) * | 2008-04-18 | 2013-03-27 | 上海触乐信息科技有限公司 | System capable of implementing a flexible keyboard layout |
CN101329666A (zh) * | 2008-06-18 | 2008-12-24 | 南京大学 | Automatic Chinese syntactic analysis method based on corpus and tree-structure pattern matching |
US20120221485A1 (en) * | 2009-12-01 | 2012-08-30 | Leidner Jochen L | Methods and systems for risk mining and for generating entity risk profiles |
JP5286317B2 (ja) * | 2010-03-26 | 2013-09-11 | 株式会社野村総合研究所 | Risk information providing system and program |
US9672283B2 (en) * | 2012-06-06 | 2017-06-06 | Data Record Science | Structured and social data aggregator |
KR101409413B1 (ko) | 2012-07-20 | 2014-06-20 | 한양대학교 에리카산학협력단 | Natural language processing method using unification grammar |
US9213760B2 (en) * | 2012-11-27 | 2015-12-15 | Linkedin Corporation | Unified social content platform |
GB201308541D0 (en) | 2013-05-13 | 2013-06-19 | Qatar Foundation | Social media news portal |
JP5633944B1 (ja) * | 2013-06-02 | 2014-12-03 | データ・サイエンティスト株式会社 | Evaluation method, evaluation device, and program |
JP6071792B2 (ja) * | 2013-07-31 | 2017-02-01 | 株式会社東芝 | Social information providing system and social information distribution device |
CN103593431A (zh) * | 2013-11-11 | 2014-02-19 | 北京锐安科技有限公司 | Internet public opinion analysis method and apparatus |
CN104809109B (zh) * | 2014-01-23 | 2019-12-10 | 腾讯科技(深圳)有限公司 | Social information display method, apparatus, and server |
US9582486B2 (en) | 2014-05-13 | 2017-02-28 | Lc Cns Co., Ltd. | Apparatus and method for classifying and analyzing documents including text |
KR101561464B1 (ko) | 2014-08-25 | 2015-10-20 | 성균관대학교산학협력단 | Method and apparatus for sentiment analysis of collected data |
JP6392042B2 (ja) * | 2014-09-11 | 2018-09-19 | Kddi株式会社 | Information providing device, method for providing information, and program |
JP5972425B1 (ja) | 2015-05-08 | 2016-08-17 | 株式会社エルプランニング | System, program, and method for creating reputational damage risk reports |
JP2017004127A (ja) | 2015-06-05 | 2017-01-05 | 富士通株式会社 | Text segmentation program, text segmentation device, and text segmentation method |
CN107545505B (zh) * | 2016-06-24 | 2020-09-29 | 深圳壹账通智能科技有限公司 | Method and system for identifying insurance and wealth management product information |
2017
- 2017-05-05 CN CN201710313184.0A patent/CN107688594B/zh active Active
- 2017-06-30 AU AU2017404560A patent/AU2017404560A1/en not_active Abandoned
- 2017-06-30 SG SG11201901072SA patent/SG11201901072SA/en unknown
- 2017-06-30 KR KR1020187017275A patent/KR20190022430A/ko not_active Application Discontinuation
- 2017-06-30 WO PCT/CN2017/091358 patent/WO2018201599A1/zh active Application Filing
- 2017-06-30 US US16/084,235 patent/US11803796B2/en active Active
- 2017-06-30 JP JP2018530794A patent/JP6608061B2/ja active Active
- 2017-06-30 EP EP17897215.4A patent/EP3425531A4/en not_active Withdrawn
Also Published As
Publication number | Publication date |
---|---|
EP3425531A4 (en) | 2020-04-22 |
AU2017404560A1 (en) | 2018-11-22 |
US20230186212A1 (en) | 2023-06-15 |
CN107688594B (zh) | 2019-07-16 |
JP2019520614A (ja) | 2019-07-18 |
CN107688594A (zh) | 2018-02-13 |
EP3425531A1 (en) | 2019-01-09 |
US11803796B2 (en) | 2023-10-31 |
KR20190022430A (ko) | 2019-03-06 |
SG11201901072SA (en) | 2019-03-28 |
JP6608061B2 (ja) | 2019-11-20 |
Similar Documents
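Publication | Publication Date | Title |
---|---|---|
CN110263248B (zh) | | Information push method, apparatus, storage medium, and server |
CN108519970B (zh) | | Method for identifying sensitive information in text, electronic device, and readable storage medium |
WO2019227710A1 (zh) | | Internet public opinion analysis method, apparatus, and computer-readable storage medium |
JP5936698B2 (ja) | | Word semantic relation extraction device |
US20190005019A1 (en) | | Contextual pharmacovigilance system |
WO2019037258A1 (zh) | | Information recommendation apparatus, method, system, and computer-readable storage medium |
US20200074242A1 (en) | | System and method for monitoring online retail platform using artificial intelligence |
WO2018201772A1 (zh) | | Method and system for inferring potential diseases from medical text, and readable storage medium |
WO2020259280A1 (zh) | | Log management method, apparatus, network device, and readable storage medium |
CN112686036B (zh) | | Risk text identification method, apparatus, computer device, and storage medium |
CN113095076A (zh) | | Sensitive word identification method, apparatus, electronic device, and storage medium |
CN104850617A (zh) | | Short text processing method and apparatus |
WO2018201599A1 (zh) | | System, method, electronic device, and storage medium for identifying risk event based on social information |
CN107545505B (zh) | | Method and system for identifying insurance and wealth management product information |
JP2021093163A (ja) | | Method and system for detecting duplicate documents using a deep-learning-based document similarity measurement model |
WO2021012958A1 (zh) | | Original text screening method, apparatus, device, and computer-readable storage medium |
CN109753646B (zh) | | Article attribute identification method and electronic device |
Mehta et al. | | Sentiment analysis on product reviews using Hadoop |
CN113420545B (zh) | | Abstract generation method, apparatus, device, and storage medium |
WO2018205460A1 (zh) | | Method, apparatus, electronic device, and medium for obtaining target users |
CN113420143A (zh) | | Document summary generation method, apparatus, device, and storage medium |
CN114065763A (zh) | | Public opinion analysis method and apparatus based on event extraction, and related components |
TWI712948B (zh) | | Method, apparatus, and computer program product for text sentiment analysis |
Wan et al. | | Data mining technology application in false text information recognition |
Mamatha et al. | | Supervised aspect category detection of co-occurrence data using conditional random fields |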
Legal Events
Code | Title | Description |
---|---|---|
ENP | Entry into the national phase | Ref document number: 2018530794; Country of ref document: JP; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 20187017275; Country of ref document: KR; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 2017897215; Country of ref document: EP |
ENP | Entry into the national phase | Ref document number: 2017897215; Country of ref document: EP; Effective date: 20180829 |
ENP | Entry into the national phase | Ref document number: 2017404560; Country of ref document: AU; Date of ref document: 20170630; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |