WO2018201599A1 - System, method, electronic device, and storage medium for identifying risk events based on social information - Google Patents

System, method, electronic device, and storage medium for identifying risk events based on social information

Info

Publication number
WO2018201599A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
social
social information
predetermined
preset
Prior art date
Application number
PCT/CN2017/091358
Other languages
English (en)
French (fr)
Inventor
金戈
徐亮
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Priority to KR1020187017275A (KR20190022430A, ko)
Priority to US16/084,235 (US11803796B2, en)
Priority to EP17897215.4A (EP3425531A4, en)
Priority to JP2018530794A (JP6608061B2, ja)
Priority to SG11201901072SA (SG11201901072SA, en)
Priority to AU2017404560A (AU2017404560A1, en)
Publication of WO2018201599A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • The present invention relates to the field of financial technology, and in particular to a system, method, electronic device, and computer-readable storage medium for identifying risk events based on social information.
  • An object of the present invention is to provide a system, method, electronic device, and computer-readable storage medium for identifying risk events based on social information, which aim to accurately and effectively identify whether social information is negative information and so avoid the occurrence of risk events.
  • The present invention provides a system for identifying risk events based on social information, the system including:
  • An obtaining module configured to obtain the social information published by each predetermined social account from a predetermined social server;
  • An analysis module configured to analyze the social information to obtain a company name and/or a product name in the social information;
  • A parsing module configured to, when the company name and/or the product name in the social information is obtained, parse out the core viewpoint information corresponding to the social information according to a preset rule;
  • An identification module configured to use a pre-trained classifier to identify the information pointing category corresponding to the core viewpoint information, so that the social information corresponding to a preset information pointing category, together with the social account that published it, is sent to a predetermined terminal for review.
  • The present invention also provides an electronic device including a memory and a processor coupled to the memory, wherein the memory stores a social-information-based risk event identification system operable on the processor. When executed by the processor, the identification system implements the following steps:
  • S4: Use a pre-trained classifier to identify the information pointing category corresponding to the core viewpoint information, so that the social information corresponding to a preset information pointing category, together with the social account that published it, is sent to a predetermined terminal for review.
  • The present invention also provides a method for identifying risk events based on social information, the method including:
  • S4: Use a pre-trained classifier to identify the information pointing category corresponding to the core viewpoint information, so that the social information corresponding to a preset information pointing category, together with the social account that published it, is sent to a predetermined terminal for review.
  • The present invention also provides a computer-readable storage medium storing a social-information-based risk event identification system which, when executed by a processor, implements the steps of the above method for identifying risk events based on social information.
  • The present invention acquires the social information published by each social account from a social server; analyzes the social information to obtain a company name and/or a product name in it; parses out the core viewpoint information corresponding to the social information that includes the company name and/or product name; and finally uses the classifier to identify the information pointing category corresponding to the core viewpoint information, so that social information in a preset information pointing category can be sent to a predetermined terminal for review.
  • The present invention can accurately and effectively identify the core viewpoint of the social information and determine whether it is negative information, thereby controlling the release of negative information in the social network and preventing the occurrence of risk events.
  • FIG. 1 is a schematic structural diagram of hardware of an embodiment of an electronic device according to the present invention.
  • FIG. 2 is a schematic structural diagram of an embodiment of a system for identifying a risk event based on social information according to the present invention
  • FIG. 3 is a schematic structural view of the analysis module shown in FIG. 2;
  • FIG. 4 is a schematic structural view of the parsing module shown in FIG. 2;
  • FIG. 5 is a schematic structural diagram of a predetermined structure word segmentation tree
  • FIG. 6 is a schematic flowchart diagram of an embodiment of a method for identifying a risk event based on social information according to the present invention.
  • FIG. 1 is a schematic diagram showing the hardware structure of a preferred embodiment of an electronic device according to the present invention.
  • The electronic device 1 is an apparatus capable of automatically performing numerical calculation and/or information processing according to instructions that are set or stored in advance.
  • The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13.
  • Figure 1 shows only the electronic device 1 with components 11-13, but it should be understood that not all illustrated components may be implemented, and more or fewer components may be implemented instead.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a hard disk or memory of the electronic device 1.
  • The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in hard disk equipped on the electronic device 1, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, etc.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • The memory 11 is used to store application software installed in the electronic device 1 and various types of data, such as the program code of the social-information-based risk event identification system.
  • the memory 11 can also be used to temporarily store data that has been output or is about to be output.
  • In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a microprocessor, or another data processing chip for running program code or processing data stored in the memory 11, for example executing the social-information-based risk event identification system.
  • the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch sensor, or the like in some embodiments.
  • The display 13 is used to display information processed in the electronic device 1 and to display a visual user interface, such as a risk event identification interface.
  • the components 11-13 of the electronic device 1 communicate with one another via a system bus.
  • The social-information-based risk event identification system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11. The at least one computer-readable instruction is executable by the processor 12 to implement the methods of the various embodiments of the present application, and may be divided into different logical modules according to the functions implemented by its parts.
  • FIG. 2 is a functional block diagram of an embodiment of an identification system for a social information based risk event according to the present invention.
  • In this embodiment, the social-information-based risk event identification system may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present invention.
  • For example, the identification system may be divided into an obtaining module 101, an analysis module 102, a parsing module 103, and an identification module 104.
  • A module, as referred to here, is a series of computer program instruction segments capable of performing a specific function, and is better suited than a program for describing the execution of the identification system in the electronic device 1, wherein:
  • the obtaining module 101 is configured to obtain social information published by each predetermined social account from a predetermined social server;
  • the predetermined social server is, for example, a microblog server, a WeChat server, or a QQ server.
  • the social account corresponds to the social server, for example, a Weibo account, a WeChat account, or a QQ account.
  • the predetermined social account may be a partial social account or a full social account of the social server.
  • A user publishes social information through a social account; for example, insurance salesperson A uses a WeChat account to post “Ping An has launched the Zunhong Life product” in Moments (the circle of friends) or in a group of friends.
  • The identification system may acquire the social information published by each predetermined social account from the social server in real time, so as to obtain the latest social information. It may also acquire that social information from the predetermined social server periodically, which reduces the burden on the system compared with real-time acquisition.
  • the analyzing module 102 is configured to analyze the social information to obtain a company name and/or a product name in the social information;
  • The social information published by each social account is analyzed to obtain the company name and/or product name in it. For example, analyzing the social information “Ping An has launched the Zunhong Life product” yields the company name “Ping An” and the product name “Zunhong Life”. For the social information “Today's visit to the attractions”, no company name or product name can be obtained through analysis.
  • In one embodiment, the social information may be segmented into characters and/or words, and all the resulting characters and/or words are then matched against the characters and/or words pre-stored in a predetermined lexicon to obtain the company name and/or product name in the social information. In another embodiment, after the social information is segmented into characters and/or words, the nouns among them are further extracted and matched against the nouns pre-stored in a predetermined noun lexicon to obtain the company name and/or product name. If no company name or product name is obtained from a piece of social information, that piece is not processed further, and analysis continues with the next piece of social information.
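The lexicon-matching variant described above can be sketched roughly as follows. This is a minimal illustration, assuming the social information has already been segmented into a list of strings and that the name lexicons are plain Python sets; `COMPANY_NAMES`, `PRODUCT_NAMES`, `find_names`, and `should_process` are hypothetical names, not taken from the patent.

```python
# Hypothetical pre-stored lexicons; a real system would load these from storage.
COMPANY_NAMES = {"Ping An"}
PRODUCT_NAMES = {"Zunhong Life"}


def find_names(segments):
    """Return (companies, products) found among the segments, in order of appearance."""
    companies = [s for s in segments if s in COMPANY_NAMES]
    products = [s for s in segments if s in PRODUCT_NAMES]
    return companies, products


def should_process(segments):
    """A post is processed further only if it names a company or a product."""
    companies, products = find_names(segments)
    return bool(companies or products)
```

A post with no match is simply skipped, which mirrors the rule that such social information is not processed further.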
  • the parsing module 103 is configured to: when obtaining the company name and/or the product name in the social information, parse the core viewpoint information corresponding to the social information according to a preset rule;
  • A piece of social information that includes a company name and/or product name is parsed to obtain its core viewpoint information, which is a view or opinion on the company name and/or product name.
  • In one embodiment, characters and/or words of a predetermined part of speech may be extracted from the segmented social information that includes a company name and/or a product name.
  • The predetermined part of speech may be, for example, adjective, verb, noun, or auxiliary word. The extracted characters and/or words of the predetermined part of speech are then analyzed to obtain the core viewpoint information corresponding to the social information.
  • For example, the core viewpoint information may be “the Zunhong Life product is safe, with high returns”.
  • In another embodiment, the segmented social information is analyzed to determine whether it contains negative characters and/or words, so as to obtain the core viewpoint information corresponding to the social information.
  • The identification module 104 is configured to use a pre-trained classifier to identify the information pointing category corresponding to the core viewpoint information, so that the social information corresponding to a preset information pointing category, together with the social account that published it, is sent to a predetermined terminal for review.
  • The pre-trained classifier is preferably a support vector machine classifier, and the information pointing categories corresponding to the core viewpoint information include positive information and negative information.
  • The social-information-based risk event identification system further includes a training module for training and generating the support vector machine classifier. The training module is configured to: obtain a preset number (for example, 10,000) of core viewpoint information samples of positive information (for example, “Ping An health insurance has wide coverage”, “the Ping An Auto Insurance brand is fast”) and a preset number of core viewpoint information samples of negative information (for example, “Ping An auto insurance claims service is poor”, “the high returns of Ping An products are not honored”); randomly divide all obtained core viewpoint information samples into a training set of a first preset ratio (for example, 70%) and a verification set of a second preset ratio (for example, 30%), wherein the sum of the ratios of the training set and the verification set is less than or equal to 1; and train the predetermined support vector machine classifier by using the training set (at the
  • After the classifier identifies the information pointing category corresponding to the core viewpoint information, if that category is negative information, the corresponding social information and the social account that published it are sent to the predetermined terminal so that the social information can be reviewed. If the review confirms that it is negative information, measures can be taken against the social account to control the release of negative information, for example sending a reminder to the social account asking its user not to post negative information, or sending the user a notice of a violation.
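The random sample split and the review routing described for the training and identification modules might look roughly like the sketch below. It deliberately leaves out the support vector machine itself: `classify` stands in for any trained classifier, and the function names, the tuple layout of `posts`, and the `"negative"` label are assumptions made for illustration.

```python
import random


def split_samples(samples, train_ratio=0.7, validation_ratio=0.3, seed=0):
    """Randomly split samples into a training set and a verification set."""
    if train_ratio + validation_ratio > 1:
        raise ValueError("the two ratios must sum to at most 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * validation_ratio)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]


def route_for_review(posts, classify, flagged_category="negative"):
    """Collect (account, text) pairs whose core viewpoint falls in the flagged category.

    `posts` is an iterable of (account, text, core_viewpoint) tuples;
    `classify` is any callable mapping a core viewpoint to a category label.
    """
    return [
        (account, text)
        for account, text, viewpoint in posts
        if classify(viewpoint) == flagged_category
    ]
```

Keeping the classifier behind a plain callable means the routing step does not depend on how the model was trained.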
  • This embodiment obtains the social information published by each social account from the social server; analyzes the social information to obtain the company name and/or product name in it; parses out the core viewpoint information corresponding to the social information that includes the company name and/or product name; and finally uses the classifier to identify the information pointing category corresponding to the core viewpoint information, so that social information in a preset information pointing category (for example, negative information) can be sent to the predetermined terminal for review.
  • Because the company name and/or product name is first obtained by analyzing the social information, and the core viewpoint information in the social information is then parsed out, the core viewpoint of the social information can be identified accurately and effectively to determine whether it is negative information, thereby controlling the release of negative information in the social network and preventing the occurrence of risk events.
  • the analysis module 102 includes:
  • The word segmentation unit 1021 is configured to perform word segmentation on the social information according to a predetermined word segmentation rule to obtain the corresponding word segments, which include both characters and words. For example, for the social information “Ping An has launched the Zunhong Life product”, the word segments are “Ping An”, “launched”, an auxiliary particle, “Zunhong Life”, and “product”.
  • The predetermined word segmentation rule is: split the social information into short sentences at punctuation marks of preset types, then segment each short sentence according to the long-word-priority principle. For example, the social information is split at punctuation marks such as “,”, “.”, “!”, and “;”. The text from the start of the social information to the first punctuation mark is one short sentence. If the social information does not end with a punctuation mark, the text from the last punctuation mark to the end of the social information is one short sentence, and the text between every two adjacent punctuation marks from the first to the last is one short sentence. If the social information ends with a punctuation mark, the text between every two adjacent punctuation marks from the first to the last is one short sentence.
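The short-sentence splitting step can be illustrated with a small sketch. It assumes the preset punctuation types are the four marks named above plus their full-width forms; the `DELIMITERS` constant and the function name are hypothetical.

```python
import re

# Assumed delimiter set: the marks named in the text plus full-width equivalents.
DELIMITERS = ",.!;，。！；"


def split_short_sentences(text):
    """Split text into short sentences at each delimiter punctuation mark."""
    parts = re.split("[" + re.escape(DELIMITERS) + "]", text)
    # Drop empty pieces, which appear when the text ends with punctuation.
    return [p.strip() for p in parts if p.strip()]
```

Filtering out empty pieces handles both cases in the rule (text that ends with a punctuation mark and text that does not) with the same code path.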
  • The long-word-priority principle is as follows: for a short sentence T1 to be segmented, start from its first character A and find, in the pre-stored lexicon, the longest word X1 beginning with A; remove X1 from T1 to leave T2; then apply the same principle to T2, and so on.
  • The segmentation result has the form “X1/X2/...”.
  • For example, for the social information “Ping An has launched the Zunhong Life product”, if the pre-stored lexicon includes “Ping An”, “launched”, the auxiliary particle, “Zunhong Life”, and “product”, the segmentation result is “Ping An”, “launched”, the auxiliary particle, “Zunhong Life”, and “product”.
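The long-word-priority principle corresponds to what is commonly called forward maximum matching. A minimal sketch follows, assuming the lexicon is a set of strings and that a character with no lexicon match is emitted as a single-character segment (the fallback is an assumption; the text does not say how unmatched characters are handled).

```python
def forward_max_match(sentence, lexicon):
    """Segment `sentence` by repeatedly taking the longest lexicon word at the front."""
    max_len = max((len(word) for word in lexicon), default=1)
    segments = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                segments.append(candidate)
                i += size
                break
    return segments
```

With the lexicon `{"ab", "abc", "d"}`, the input `"abcd"` is segmented as `abc/d` rather than `ab/c/d`, because the longest word starting at each position wins.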
  • The tagging unit 1022 is configured to tag each word segment with its part of speech according to a predetermined part-of-speech tagging rule; for example, the tagging result may be “Ping An/noun”, “launched/verb”, “[particle]/auxiliary”, “Zunhong Life/noun”, “product/noun”.
  • The predetermined part-of-speech tagging rule is: determine and tag the part of speech of each word segment according to the mapping between characters/words and parts of speech in a universal dictionary (for example, in the universal dictionary the part of speech of “playground” is noun), and/or according to a preset mapping between characters/words and parts of speech (for example, in the preset mapping the part of speech of “playground” is common noun).
  • Tagging may be performed using the universal dictionary mapping alone, using the preset mapping alone, or using both. When both are used, the preset mapping takes priority over the universal dictionary mapping (for example, if the universal dictionary maps “playground” to noun and the preset mapping maps it to common noun, “playground” is tagged as a common noun).
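The priority between the two mappings can be sketched as a simple two-level lookup. This is an illustrative assumption about data layout: both mappings are plain dictionaries, and the `"unknown"` fallback tag is hypothetical.

```python
def tag_parts_of_speech(segments, universal_dict, preset_dict=None):
    """Tag each segment; a preset mapping, if given, overrides the universal dictionary."""
    preset_dict = preset_dict or {}
    tagged = []
    for segment in segments:
        # Preset mapping first, then the universal dictionary, then a fallback tag.
        pos = preset_dict.get(segment) or universal_dict.get(segment, "unknown")
        tagged.append((segment, pos))
    return tagged
```

For instance, with `universal_dict = {"playground": "noun"}` and `preset_dict = {"playground": "common noun"}`, the segment `"playground"` is tagged as a common noun.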
  • A part of speech is tagged for each word segment. For example, auxiliary words among the segments are identified according to a pre-stored auxiliary word lexicon (for example, the auxiliary words “a”, “come”, “to”, “over”, “of”, “land”, “obtain”, “like”, “so”, etc.) and tagged as auxiliary words; adjectives are identified according to a pre-stored adjective lexicon (for example, “very safe”, “guaranteed type”, “high returns”, “long cycle”, etc.) and tagged as adjectives; verbs are identified according to a pre-stored verb lexicon (for example, “push”, “launch”, “send”, “publish”, “develop”, “sell”, etc.) and tagged as verbs.
  • The classifying unit 1023 is configured to classify the word segments tagged as nouns (for example, into person names, place names, company names, product names, and other nouns) according to a predetermined word classification rule, so as to obtain the company name and/or product name in the social information from the classification result.
  • The predetermined word classification rule is: use a pre-trained recognition model to identify the noun category of each word segment tagged as a noun, thereby classifying the nouns.
  • The recognition model is a conditional random field (CRF) model.
  • the training process of the conditional random field model includes:
  • Construct the training data set: construct a preset number of training samples in a predetermined short-sentence data format (for example, {company_name: Ping An} launched {product_name: Zunhong Life} product);
  • Extract features: the extracted feature variables include, but are not limited to, part of speech, context information, word structure, etc.;
  • Transform unstructured data into a structured feature matrix: for example, for the social information “Ping An has launched the Zunhong Life product”, an example of the feature matrix is shown in Table 1 below:
  • Train the model: the constructed feature matrix is used as the input variable to train the conditional random field model, and the trained model is used to identify noun categories and output nouns of each category, for example nouns whose category is person name, company name, product name, etc. Finally, the nouns whose category is company name and/or product name are obtained from the output.
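As a rough illustration of turning a tagged sentence into a structured feature matrix, the sketch below builds one feature row per word segment from its part of speech, its neighbouring segments, and its length. The column names and the `<BOS>`/`<EOS>` boundary markers are assumptions chosen for this example; they are not the columns of the patent's Table 1, and no CRF library is invoked here.

```python
def token_features(tagged, i):
    """Build one feature row for the i-th (word, part_of_speech) pair."""
    word, pos = tagged[i]
    prev_word = tagged[i - 1][0] if i > 0 else "<BOS>"
    next_word = tagged[i + 1][0] if i < len(tagged) - 1 else "<EOS>"
    return {
        "word": word,
        "pos": pos,               # part of speech
        "prev_word": prev_word,   # left context
        "next_word": next_word,   # right context
        "length": len(word),      # a crude word-structure feature
    }


def feature_matrix(tagged):
    """Return one feature row per segment, in sentence order."""
    return [token_features(tagged, i) for i in range(len(tagged))]
```

Rows in this dictionary form are the kind of input that sequence-labelling toolkits typically accept, with the noun category of each segment supplied separately as the training label.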
  • Alternatively, predetermined verbs such as “push”, “launch”, “send”, “publish”, “develop”, or “sell” may be located in the social information; the nouns that follow such a verb are taken as one category, and the nouns that are company names and/or product names are obtained from that category.
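The verb-based alternative can be sketched as a single pass over the tagged segments. The verb set and function name below are hypothetical, and the input is assumed to be a list of (word, part_of_speech) pairs.

```python
# Assumed set of predetermined verbs, given here in English for illustration.
TRIGGER_VERBS = {"launched", "published", "developed", "sells"}


def nouns_after_verbs(tagged, verbs=TRIGGER_VERBS):
    """Return the nouns that appear after the first predetermined verb."""
    candidates = []
    seen_verb = False
    for word, pos in tagged:
        if pos == "verb" and word in verbs:
            seen_verb = True
        elif seen_verb and pos == "noun":
            candidates.append(word)
    return candidates
```

The returned nouns are only candidates; they would still need to be checked against company and product names, as the text describes.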
  • the parsing module 103 includes:
  • The constructing unit 1031 is configured to, when the company name and/or product name in the social information is obtained, construct a word segmentation tree of a preset structure according to the order and parts of speech of the word segments in that social information.
  • The word segmentation tree of the preset structure includes multiple levels of nodes. The first-level node is the social information itself; the second-level nodes are the segment phrases (such as noun phrases, verb phrases, etc.) obtained by dividing the social information according to the order and parts of speech of its word segments; each node below the second level is obtained by dividing the segment phrase of its parent node according to part of speech, until the last-level node of each branch is reached.
  • A segment phrase that cannot be divided further is the last-level node of its branch. For example, for “I go to the playground to play football”, the constructed word segmentation tree of the preset structure is shown in Fig. 5.
  • the parsing unit 1032 is configured to parse the core viewpoint information corresponding to the corresponding social information based on the preset structure word segmentation tree.
  • In the word segmentation tree of the preset structure, the node distance between each word segment of a first predetermined part of speech (e.g., noun) and each word segment of a second predetermined part of speech (e.g., verb or adjective) is calculated, that is, the number of nodes separating the two segments. For each segment of the first predetermined part of speech, the segment of the second predetermined part of speech at the smallest node distance is found, and the two segments, arranged in the order in which they appear in the social information, constitute the corresponding core viewpoint information.
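The node-distance pairing can be sketched on a small hand-built tree. Several things here are assumptions for illustration only: the tree shape for “I go to the playground to play football” (Fig. 5 is not reproduced in this text), the nested-tuple representation, and the reading of "node distance" as the number of nodes lying strictly between two leaves on the path that joins them.

```python
def leaf_paths(tree, path=()):
    """Yield (path, pos, word) for every leaf; a leaf is a (pos, word) pair of strings."""
    label, content = tree
    if isinstance(content, str):
        yield path, label, content
    else:
        for index, child in enumerate(content):
            yield from leaf_paths(child, path + (index,))


def node_distance(path_a, path_b):
    """Count the nodes lying strictly between two leaves on the tree path joining them."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    edges = (len(path_a) - common) + (len(path_b) - common)
    return edges - 1


def core_viewpoints(tree, first_pos="noun", second_pos=("verb", "adjective")):
    """Pair each noun with its nearest verb/adjective, keeping sentence order."""
    leaves = list(leaf_paths(tree))
    pairs = []
    for order_a, (path_a, pos_a, word_a) in enumerate(leaves):
        if pos_a != first_pos:
            continue
        partners = [
            (node_distance(path_a, path_b), order_b, word_b)
            for order_b, (path_b, pos_b, word_b) in enumerate(leaves)
            if pos_b in second_pos
        ]
        if partners:
            _, order_b, word_b = min(partners)
            pairs.append((word_b, word_a) if order_b < order_a else (word_a, word_b))
    return pairs


# Hypothetical tree for "I go to the playground to play football".
TREE = ("S", [
    ("pronoun", "I"),
    ("VP", [
        ("VP", [("verb", "go"), ("noun", "playground")]),
        ("VP", [("verb", "play"), ("noun", "football")]),
    ]),
])
```

On this tree, “playground” is one node away from “go” and three nodes away from “play”, so it is paired with “go”; “football” is paired with “play” in the same way.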
  • FIG. 6 is a schematic flowchart of an embodiment of a method for identifying risk events based on social information according to the present invention. The method may be performed by a social-information-based risk event identification system.
  • the identification system of the social information based risk event may be implemented by software and/or hardware, and the identification system of the social information based risk event may be integrated in the server.
  • the method for identifying the social information-based risk event includes:
  • Step S1: Obtain the social information published by each predetermined social account from a predetermined social server;
  • the predetermined social server is, for example, a microblog server, a WeChat server, or a QQ server.
  • the social account corresponds to the social server, for example, a Weibo account, a WeChat account, or a QQ account.
  • the predetermined social account may be a partial social account or a full social account of the social server.
  • A user publishes social information through a social account; for example, insurance salesperson A uses a WeChat account to post “Ping An has launched the Zunhong Life product” in Moments (the circle of friends) or in a group of friends.
  • The identification system may acquire the social information published by each predetermined social account from the social server in real time, so as to obtain the latest social information. It may also acquire that social information from the predetermined social server periodically, which reduces the burden on the system compared with real-time acquisition.
  • Step S2: analyze the social information to obtain the company name and/or product name in the social information;
  • In this embodiment, the social information published by each social account is analyzed to obtain the company name and/or product name it contains. For example, analyzing the above social information "Ping An has launched the Zunhong Life product" yields the company name "Ping An" and the product name "Zunhong Life". For the social information "Went sightseeing at a scenic spot today", the analysis yields no company name or product name.
  • The social information may be analyzed item by item in order of publication time. For a given item of social information, in one embodiment, it may be segmented into characters and/or words, and all of the resulting characters and/or words are matched against the characters and/or words pre-stored in a predetermined lexicon to obtain the company name and/or product name in the social information. In another embodiment, after segmentation, the nouns may be extracted and then matched against the nouns pre-stored in a predetermined noun library to obtain the company name and/or product name. If no company name or product name is found in an item of social information, that item is not processed further, and the analysis proceeds to the next item.
  • Step S3: when a company name and/or product name is obtained from the social information, parse out the core viewpoint information corresponding to the social information according to a preset rule;
  • In this embodiment, an item of social information that contains a company name and/or product name is parsed to obtain its core viewpoint information, which is a view or opinion about that company name and/or product name.
  • In one embodiment of the parsing process, characters and/or words of a predetermined part of speech are extracted from the social information that contains the company name and/or product name, for example from the social information after it has been segmented into characters and/or words.
  • The predetermined part of speech may be, for example, adjective, verb, noun, or particle. The extracted characters and/or words are then analyzed to obtain the core viewpoint information corresponding to the social information.
  • For example, the social information "Ping An has launched the Zunhong Life product; the Zunhong Life product is safe and offers high returns" contains the adjectives "safe" and "high", so the core viewpoint information is "the Zunhong Life product is safe and offers high returns".
  • In another embodiment, the segmented social information is analyzed to determine whether it contains negation characters and/or words, so as to obtain the core viewpoint information corresponding to the social information.
  • Step S4: use a pre-trained classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information belonging to a preset information orientation category, together with the social account that published it, is sent to a predetermined terminal for review.
  • The pre-trained classifier is preferably a support vector machine classifier, and the information orientation categories corresponding to the core viewpoint information include positive information and negative information.
  • After the classifier has identified the information orientation category of the core viewpoint information, if that category is negative information, the corresponding social information and the social account that published it are sent to the predetermined terminal so that the social information can be reviewed. If the review confirms that the information is negative, measures may be taken against the social account to control the publication of negative information, for example sending a reminder to the social account asking its user not to publish negative information, or sending the user of the social account a notice of a rule violation.
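The flow of steps S2 to S4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name lists, the `Post` record, and the `extract_viewpoint` and `classify` callables are hypothetical placeholders, and plain substring matching stands in for the segment-then-match analysis described above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical name lists; a real system would load a maintained lexicon.
COMPANY_NAMES = {"平安"}
PRODUCT_NAMES = {"尊宏人生"}


@dataclass
class Post:
    account: str  # social account that published the text
    text: str     # the social information itself


def find_names(text: str) -> Tuple[List[str], List[str]]:
    """Step S2, simplified: look for stored company and product names."""
    companies = [name for name in sorted(COMPANY_NAMES) if name in text]
    products = [name for name in sorted(PRODUCT_NAMES) if name in text]
    return companies, products


def select_for_review(
    posts: Iterable[Post],
    extract_viewpoint: Callable[[str], str],
    classify: Callable[[str], str],
) -> List[Post]:
    """Return the posts that should be sent to the review terminal."""
    flagged = []
    for post in posts:
        companies, products = find_names(post.text)
        if not companies and not products:
            continue  # no company or product name: skip this post
        viewpoint = extract_viewpoint(post.text)   # step S3
        if classify(viewpoint) == "negative":      # step S4
            flagged.append(post)
    return flagged
```

Any viewpoint extractor and any trained classifier that returns "positive" or "negative" can be passed in; posts without a company or product name never reach the classifier.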

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

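The present invention relates to a system, method, electronic device, and storage medium for identifying risk events based on social information. The system includes: an acquisition module, configured to obtain, from a predetermined social server, the social information published by each predetermined social account; an analysis module, configured to analyze the social information to obtain a company name and/or product name; a parsing module, configured to parse out, according to a preset rule, the core viewpoint information corresponding to the social information when a company name and/or product name is obtained from the social information; and an identification module, configured to use a pre-trained classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information belonging to a preset information orientation category, together with the social account that published it, is sent to a terminal for review. The present invention can accurately and effectively identify whether social information is negative information, thereby controlling the publication of negative information in social networks and preventing risk events.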

Description

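System, method, electronic device, and storage medium for identifying risk events based on social information
Priority claim
This application claims, under the Paris Convention, priority to Chinese patent application No. CN201710313184.0, filed on May 5, 2017 and entitled "System and method for identifying risk events based on social information", the entire content of which is incorporated herein by reference.
Technical field
The present invention relates to the field of financial technology, and in particular to a system, method, electronic device, and computer-readable storage medium for identifying risk events based on social information.
Background
With the continuous development of mobile Internet technology, financial personnel such as insurance salespersons and wealth-management salespersons often recommend insurance or wealth-management products to customers through social networks. As a result, a large amount of financial public-opinion information spreads quickly and widely through social networks. Some insurance or wealth-management salespersons may engage in improper conduct, such as spreading negative information to customers. In addition, some customers who feel they have been treated unfairly after purchasing an insurance or wealth-management product (which may in fact result from a salesperson's violation) vent negative information to other potential customers through social networks, causing a series of problems such as the loss of a financial company's customers.
Although some technical solutions for identifying network information currently exist, they cannot accurately and effectively identify, and thereby control, the negative information spread in social networks, which leads to financial risk events.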
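Summary of the invention
The object of the present invention is to provide a system, method, electronic device, and computer-readable storage medium for identifying risk events based on social information, with the aim of accurately and effectively identifying whether social information is negative information and avoiding risk events.
To achieve the above object, the present invention provides a system for identifying risk events based on social information, the system including:
an acquisition module, configured to obtain, from a predetermined social server, the social information published by each predetermined social account;
an analysis module, configured to analyze the social information to obtain the company name and/or product name in the social information;
a parsing module, configured to parse out, according to a preset rule, the core viewpoint information corresponding to the social information when the company name and/or product name in the social information is obtained;
an identification module, configured to use a pre-trained classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information belonging to a preset information orientation category, together with the social account that published it, is sent to a predetermined terminal for review.
To achieve the above object, the present invention further provides an electronic device including a memory and a processor connected to the memory, the memory storing a system for identifying risk events based on social information that can run on the processor, and the system, when executed by the processor, implementing the following steps:
S1: obtaining, from a predetermined social server, the social information published by each predetermined social account;
S2: analyzing the social information to obtain the company name and/or product name in the social information;
S3: when the company name and/or product name in the social information is obtained, parsing out the core viewpoint information corresponding to the social information according to a preset rule;
S4: using a pre-trained classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information belonging to a preset information orientation category, together with the social account that published it, is sent to a predetermined terminal for review.
To achieve the above object, the present invention further provides a method for identifying risk events based on social information, the method including:
S1: obtaining, from a predetermined social server, the social information published by each predetermined social account;
S2: analyzing the social information to obtain the company name and/or product name in the social information;
S3: when the company name and/or product name in the social information is obtained, parsing out the core viewpoint information corresponding to the social information according to a preset rule;
S4: using a pre-trained classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information belonging to a preset information orientation category, together with the social account that published it, is sent to a predetermined terminal for review.
To achieve the above object, the present invention further provides a computer-readable storage medium storing a system for identifying risk events based on social information, the system, when executed by a processor, implementing the steps of the above method for identifying risk events based on social information.
The beneficial effects of the present invention are as follows. The present invention obtains from a social server the social information published by each social account; analyzes the social information to obtain the company name and/or product name it contains; parses out the core viewpoint information corresponding to the social information that contains a company name and/or product name; and finally uses a classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information of a preset information orientation category can be sent to a predetermined terminal for review. By first analyzing the social information to obtain the company name and/or product name and then parsing out its core viewpoint information, the present invention can accurately and effectively identify the core viewpoint of the social information and determine whether it is negative information, thereby controlling the publication of negative information in social networks and preventing risk events.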
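Brief description of the drawings
FIG. 1 is a schematic diagram of the hardware structure of an embodiment of the electronic device of the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of the system for identifying risk events based on social information of the present invention;
FIG. 3 is a schematic structural diagram of the analysis module shown in FIG. 2;
FIG. 4 is a schematic structural diagram of the parsing module shown in FIG. 2;
FIG. 5 is a schematic structural diagram of the preset-structure segmentation tree;
FIG. 6 is a schematic flowchart of an embodiment of the method for identifying risk events based on social information of the present invention.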
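Detailed description
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of a preferred embodiment of the electronic device of the present invention.
The electronic device 1 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms one virtual supercomputer.
The electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13. FIG. 1 shows only the electronic device 1 with components 11-13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk or internal memory of the electronic device 1. In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 is used to store the application software installed on the electronic device 1 and various kinds of data, such as the program code of the system for identifying risk events based on social information. The memory 11 may also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data-processing chip, and is used to run the program code or process the data stored in the memory 11, for example to execute the system for identifying risk events based on social information.
In some embodiments, the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display 13 is used to display the information processed in the electronic device 1 and to display a visual user interface, such as a risk event identification interface. The components 11-13 of the electronic device 1 communicate with one another over a system bus.
The system for identifying risk events based on social information is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11. The at least one computer-readable instruction can be executed by the processor 12 to implement the methods of the embodiments of this application, and can be divided into different logical modules according to the functions implemented by its parts.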
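Referring to FIG. 2, which is a functional block diagram of an embodiment of the system for identifying risk events based on social information of the present invention. In this embodiment, the system may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present invention. For example, in FIG. 2, the system may be divided into an acquisition module 101, an analysis module 102, a parsing module 103, and an identification module 104. A module, as referred to in the present invention, is a series of computer program instruction segments capable of performing a specific function, and is better suited than a program to describing the execution of the system in the electronic device 1, where:
The acquisition module 101 is configured to obtain, from a predetermined social server, the social information published by each predetermined social account;
The predetermined social server is, for example, a Weibo server, a WeChat server, or a QQ server, and the social account corresponds to the social server and is, for example, a Weibo account, a WeChat account, or a QQ account. For a given social server, the predetermined social accounts may be some or all of the social accounts of that server. A user publishes social information from his or her own social account. For example, insurance salesperson A may use a WeChat account to post social information in Moments or in a friend group, such as "平安推出了尊宏人生产品" ("Ping An has launched the Zunhong Life product").
In this embodiment, the system may obtain the social information published by the predetermined social accounts from the social server in real time, so as to obtain the latest social information, or it may obtain that information from the social server periodically, which reduces the burden on the system.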
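The analysis module 102 is configured to analyze the social information to obtain the company name and/or product name in the social information;
In this embodiment, the social information published by each social account is analyzed to obtain the company name and/or product name it contains. For example, analyzing the above social information "平安推出了尊宏人生产品" yields the company name "平安" (Ping An) and the product name "尊宏人生" (Zunhong Life). For the social information "今天去*景点游玩" ("Went sightseeing at * scenic spot today"), the analysis yields no company name or product name.
The social information may be analyzed item by item in order of publication time. For a given item of social information, in one embodiment, it may be segmented into characters and/or words, and all of the resulting characters and/or words are matched against the characters and/or words pre-stored in a predetermined lexicon to obtain the company name and/or product name in the social information. In another embodiment, after segmentation, the nouns may be extracted and then matched against the nouns pre-stored in a predetermined noun library to obtain the company name and/or product name. If no company name or product name is found in an item of social information, that item is not processed further, and the analysis proceeds to the next item.
By analyzing whether an item of social information contains a company name and/or product name, it can then be determined whether that item contains a viewpoint about that company name and/or product name.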
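The parsing module 103 is configured to parse out, according to a preset rule, the core viewpoint information corresponding to the social information when the company name and/or product name in the social information is obtained;
In this embodiment, an item of social information that contains a company name and/or product name is parsed to obtain its core viewpoint information, which is a view or opinion about that company name and/or product name.
In one embodiment of the parsing process, characters and/or words of a predetermined part of speech are extracted from the social information that contains the company name and/or product name, for example from the social information after it has been segmented into characters and/or words. The predetermined part of speech may be, for example, adjective, verb, noun, or particle. The extracted characters and/or words are then analyzed to obtain the core viewpoint information corresponding to the social information. For example, the social information "平安推出了尊宏人生产品,尊宏人生产品安全、收益高" ("Ping An has launched the Zunhong Life product; the Zunhong Life product is safe and offers high returns") contains the adjectives "安全" (safe) and "高" (high), so the core viewpoint information is "尊宏人生产品安全、收益高" ("the Zunhong Life product is safe and offers high returns"). In another embodiment, the segmented social information is analyzed to determine whether it contains negation characters and/or words, so as to obtain the core viewpoint information corresponding to the social information.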
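The two simple parsing embodiments above can be sketched as follows. This is a minimal illustration assuming the text has already been segmented and tagged; the tag names and the negation list are assumptions and do not come from the source.

```python
from typing import Iterable, List, Tuple

Tagged = List[Tuple[str, str]]  # (segmented word, part-of-speech tag)

# Hypothetical negation lexicon, for illustration only.
NEGATION_WORDS = {"不", "没", "没有", "无"}


def words_with_pos(tagged: Tagged, wanted: Iterable[str]) -> List[str]:
    """First embodiment: keep only words of the predetermined parts of speech."""
    wanted_set = set(wanted)
    return [word for word, pos in tagged if pos in wanted_set]


def has_negation(tagged: Tagged) -> bool:
    """Second embodiment: report whether any negation word is present."""
    return any(word in NEGATION_WORDS for word, _ in tagged)
```

For the tagged words of "尊宏人生产品安全、收益高", selecting adjectives returns "安全" and "高", the two words the text identifies as carrying the viewpoint.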
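The identification module 104 is configured to use a pre-trained classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information belonging to a preset information orientation category, together with the social account that published it, is sent to a predetermined terminal for review.
The pre-trained classifier is preferably a support vector machine classifier, and the information orientation categories corresponding to the core viewpoint information include positive information and negative information. Preferably, the system for identifying risk events based on social information further includes a training module for training and generating the support vector machine classifier, configured to: obtain a preset number (for example, 10,000) of core viewpoint information samples of positive information (for example, "Ping An health insurance has broad coverage" and "Ping An auto insurance is a major brand with fast claims settlement") and a preset number of core viewpoint information samples of negative information (for example, "Ping An auto insurance is slow to settle claims and has poor service" and "Ping An wealth-management products do not yield as much as promised"); randomly divide all of the obtained core viewpoint information samples into a training set of a first preset proportion (for example, 70%) and a validation set of a second preset proportion (for example, 30%), where the sum of the two proportions is less than or equal to 1; train a predetermined support vector machine classifier with the training set (in the first round of training, the classifier may use its default parameters); and verify the accuracy of the trained support vector machine classifier with the validation set. If the accuracy (for example, 0.99) is greater than or equal to a preset accuracy (for example, 0.98), training ends and the trained support vector machine classifier is used as the classifier in the identification module 104 described above. If the accuracy (for example, 0.95) is less than the preset accuracy, the numbers of positive and negative core viewpoint information samples are increased and training is performed again.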
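The split, train, validate, and retrain loop described above can be sketched as follows. This is only an outline under stated assumptions: the support vector machine itself is not implemented here. `fit` is a hypothetical callable that would wrap whichever SVM library is used and return a prediction function, and `more_samples` is a hypothetical source of additional labelled samples.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, str]            # (core viewpoint text, "positive" or "negative")
Predictor = Callable[[str], str]


def split_samples(samples: Sequence[Sample], train_ratio: float = 0.7,
                  val_ratio: float = 0.3, seed: int = 0):
    """Randomly divide samples into a training set and a validation set."""
    if train_ratio + val_ratio > 1 + 1e-9:
        raise ValueError("the two ratios must sum to at most 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]


def accuracy(predict: Predictor, validation: Sequence[Sample]) -> float:
    """Share of validation samples whose predicted label matches."""
    if not validation:
        return 0.0
    hits = sum(1 for text, label in validation if predict(text) == label)
    return hits / len(validation)


def train_until_accurate(samples: Sequence[Sample],
                         fit: Callable[[List[Sample]], Predictor],
                         more_samples: Callable[[], List[Sample]],
                         target: float = 0.98,
                         max_rounds: int = 5) -> Predictor:
    """Retrain with more samples until validation accuracy reaches the target."""
    pool = list(samples)
    for _ in range(max_rounds):
        train, validation = split_samples(pool)
        predict = fit(train)
        if accuracy(predict, validation) >= target:
            return predict
        pool.extend(more_samples())  # enlarge both classes, then retrain
    raise RuntimeError("target accuracy not reached")
```

The `max_rounds` cap is an addition for safety; the source describes no limit on the number of retraining rounds.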
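After the classifier has identified the information orientation category of the core viewpoint information, if that category is negative information, the corresponding social information and the social account that published it are sent to the predetermined terminal so that the social information can be reviewed. If the review confirms that the information is negative, measures may be taken against the social account to control the publication of negative information, for example sending a reminder to the social account asking its user not to publish negative information, or sending the user of the social account a notice of a rule violation.
Compared with the prior art, this embodiment obtains from a social server the social information published by each social account; analyzes the social information to obtain the company name and/or product name it contains; parses out the core viewpoint information corresponding to the social information that contains a company name and/or product name; and finally uses a classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information of a preset information orientation category (for example, negative information) can be sent to a predetermined terminal for review. By first analyzing the social information to obtain the company name and/or product name and then parsing out its core viewpoint information, this embodiment can accurately and effectively identify the core viewpoint of the social information and determine whether it is negative information, thereby controlling the publication of negative information in social networks and preventing risk events.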
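In a preferred embodiment, as shown in FIG. 3 and building on the embodiment of FIG. 2, the analysis module 102 includes:
A segmentation unit 1021, configured to segment the social information according to a predetermined segmentation rule to obtain the corresponding segmented words, where segmented words include both single characters and words. For example, the social information "平安推出了尊宏人生产品" is segmented into "平安", "推出", "了", "尊宏人生", "产品".
Preferably, the predetermined segmentation rule is to split the social information into short sentences at punctuation marks of preset types and to segment each resulting short sentence using the longest-word-first principle. For example, each item of social information is split into short sentences at punctuation marks such as ",", "。", "!" and ";". The text from the start of the social information to the first punctuation mark is one short sentence. If the social information does not end with a punctuation mark, the text from the last punctuation mark to the end of the social information is one short sentence, and between the first punctuation mark and the last preset-type punctuation mark, the text between each pair of adjacent punctuation marks is one short sentence. If the social information ends with a punctuation mark, then between the first and the last punctuation mark, the text between each pair of adjacent punctuation marks is one short sentence.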
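The short-sentence splitting rule amounts to cutting the text at every preset punctuation mark and discarding empty pieces. A minimal sketch, assuming the delimiter set below (the four full-width marks named in the text plus their ASCII counterparts, which are an addition):

```python
import re
from typing import List

# Assumed delimiter set; the source lists these marks followed by "etc.".
DELIMITERS = ",。!;,.!;"
_SPLITTER = re.compile("[" + re.escape(DELIMITERS) + "]")


def split_short_sentences(text: str) -> List[str]:
    """Split social information into short sentences at preset punctuation."""
    return [part.strip() for part in _SPLITTER.split(text) if part.strip()]
```

Both cases in the rule are covered: text that ends with a punctuation mark produces no trailing empty sentence, and text that does not end with one keeps its final segment.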
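Each short sentence is then segmented using the longest-word-first principle, which works as follows. For a short sentence T1 to be segmented, start from its first character A and find in the pre-stored lexicon the longest word X1 that begins with A; remove X1 from T1 to leave T2; then apply the same procedure to T2. The segmentation result is "X1/X2/...". For example, for the social information "平安推出了尊宏人生产品", if the pre-stored lexicon contains "平安", "推出", "了", "尊宏人生" and "产品", the segmentation result is "平安", "推出", "了", "尊宏人生", "产品".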
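The longest-word-first principle corresponds to what is commonly called forward maximum matching. A minimal sketch follows; the fallback of emitting a single character when no lexicon word matches is an assumption, since the source does not say how unknown characters are handled.

```python
from typing import Iterable, List


def longest_word_first(sentence: str, lexicon: Iterable[str]) -> List[str]:
    """Segment a short sentence by repeatedly taking the longest lexicon word."""
    words = set(lexicon)
    max_len = max([len(w) for w in words] + [1])
    result = []
    rest = sentence                      # T1, then T2, ... in the text's notation
    while rest:
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(rest)), 0, -1):
            candidate = rest[:length]
            if length == 1 or candidate in words:
                result.append(candidate)  # X1, X2, ...
                rest = rest[length:]
                break
    return result
```

With the lexicon from the example, `longest_word_first("平安推出了尊宏人生产品", ["平安", "推出", "了", "尊宏人生", "产品"])` reproduces the five segmented words given in the text.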
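A tagging unit 1022, configured to tag the segmented words with parts of speech according to a predetermined part-of-speech tagging rule. For example, the tags may be: "平安/noun", "推出/verb", "了/particle", "尊宏人生/noun", "产品/noun".
Preferably, the predetermined part-of-speech tagging rule is: determine and tag the part of speech of each segmented word according to the mapping between characters/words and parts of speech in a general dictionary (for example, in the general dictionary, "操场" (playground) is a noun), and/or according to a preset mapping between characters/words and parts of speech (for example, in the preset mapping, "操场" is a common noun). Tagging may rely on the general dictionary mapping alone, on the preset mapping alone, or on both combined. When both are used, the preset mapping takes priority over the general dictionary mapping; for example, if "操场" is a noun in the general dictionary and a common noun in the preset mapping, it is tagged as a common noun.
Each segmented word is tagged with its part of speech. For example, particles among the segmented words (such as "了", "来", "着", "过", "的", "地", "得", "似的" and "所") are identified using a pre-stored particle lexicon and tagged as particles; adjectives (such as "非常安全" (very safe), "保本型" (principal-protected), "收益高" (high return) and "周期长" (long term)) are identified using a pre-stored adjective lexicon and tagged as adjectives; and verbs (such as "推", "推出", "发", "发布", "开发" and "销售") are identified using a pre-stored verb lexicon and tagged as verbs.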
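The priority between the two mappings can be shown with two dictionary lookups. This is a sketch with hypothetical mapping contents; the "unknown" fallback tag is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def tag_words(words: Sequence[str],
              general: Dict[str, str],
              preset: Dict[str, str],
              unknown: str = "unknown") -> List[Tuple[str, str]]:
    """Tag each segmented word; the preset mapping overrides the general one."""
    tagged = []
    for word in words:
        pos = preset.get(word, general.get(word, unknown))
        tagged.append((word, pos))
    return tagged
```

Passing an empty `preset` gives tagging by the general dictionary alone, and passing an empty `general` gives tagging by the preset mapping alone, which matches the three modes described above.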
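A classification unit 1023, configured to classify the segmented words whose part of speech is noun (for example, person names, place names, company names, product names, and other nouns) according to a predetermined word classification rule, so as to obtain the company name and/or product name in the social information from the classification result;
Preferably, the predetermined word classification rule is: use a pre-trained recognition model to identify the noun category of each segmented word tagged as a noun, thereby classifying those nouns. Preferably, the recognition model is a conditional random field (CRF) model.
The training process of the conditional random field model includes:
1) Constructing the training data set: build a preset number of training data items in a predetermined short-sentence format (for example, {{company_name:平安}}推出了{{product_name:尊宏人生}}产品);
2) Constructing feature variables: for each training data item, extract feature variables for each segmented word (including, but not limited to, part of speech, context information, and word structure), converting the unstructured data into a structured feature matrix. Taking the social information "平安推出了尊宏人生产品" as an example, the feature matrix is shown in Table 1: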
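Table 1
Segmented word       Part of speech  Preceding word  Following word       Contains "平安"
平安                 noun            Null            推出                 True
推出                 verb            平安            了                   False
了                   particle        推出            尊宏人生             False
尊宏人生             noun            了              产品                 False
产品                 noun            尊宏人生        (punctuation mark)   False
(punctuation mark)   punctuation     产品            Null                 False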
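3) Training the model: use the constructed feature matrix as the input to train the conditional random field model, and use the trained model as the model for identifying noun categories. The model outputs nouns of various categories, for example nouns in the person-name category, the company-name category, and the product-name category. Finally, the nouns whose category is company name and/or product name are taken from the output.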
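Step 2) above, building one feature row per segmented word, can be sketched as follows. The field names are hypothetical, and training the conditional random field itself is not shown, since it would depend on the CRF library chosen.

```python
from typing import Dict, List, Optional, Sequence, Tuple

FeatureRow = Dict[str, Optional[object]]


def build_feature_rows(tagged: Sequence[Tuple[str, str]],
                       keyword: str = "平安") -> List[FeatureRow]:
    """Turn (word, part of speech) pairs into rows like those of Table 1."""
    words = [word for word, _ in tagged]
    preceding = [None] + words[:-1]   # None plays the role of "Null"
    following = words[1:] + [None]
    rows = []
    for (word, pos), prev_word, next_word in zip(tagged, preceding, following):
        rows.append({
            "word": word,
            "pos": pos,
            "prev": prev_word,
            "next": next_word,
            "contains_keyword": keyword in word,
        })
    return rows
```

Each row carries the same five columns as Table 1: the word, its part of speech, its neighbours, and whether it contains the keyword.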
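In other embodiments, after the segmented words have been tagged with parts of speech, predetermined verbs such as "推" (push), "推出" (launch), "发" (issue), "发布" (release), "开发" (develop) or "销售" (sell) may be located, the nouns that follow such a verb are collected as one category, and the nouns that are company names and/or product names are then obtained from that category.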
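One possible reading of this verb-based alternative is sketched below: after each trigger verb, the first noun that follows is kept as a candidate name. Taking only the first noun and skipping intervening particles are assumptions; the source says only that the nouns after the verb are collected.

```python
from typing import List, Sequence, Tuple

# Trigger verbs listed in the text.
TRIGGER_VERBS = {"推", "推出", "发", "发布", "开发", "销售"}


def nouns_after_trigger_verbs(tagged: Sequence[Tuple[str, str]]) -> List[str]:
    """Collect the first noun following each trigger verb."""
    candidates = []
    pending = False  # True once a trigger verb has been seen
    for word, pos in tagged:
        if pos == "verb" and word in TRIGGER_VERBS:
            pending = True
        elif pos == "noun" and pending:
            candidates.append(word)
            pending = False
    return candidates
```

The returned candidates would still need to be checked against the company-name and product-name categories, as the text describes.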
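In a preferred embodiment, as shown in FIG. 4 and building on the embodiment of FIG. 3, the parsing module 103 includes:
A construction unit 1031, configured to build, when the company name and/or product name in the social information is obtained, a preset-structure segmentation tree according to the order and parts of speech of the segmented words in that social information;
As shown in FIG. 5, the preset-structure segmentation tree includes multiple levels of nodes. The first-level node is the social information; the second-level nodes are the phrases (for example, noun phrases and verb phrases) obtained by dividing the social information according to the order and parts of speech of its segmented words; and every level after the second is obtained by further dividing the phrases of the level above according to part of speech, down to the last-level node of each branch. During division, a phrase that cannot be divided further is the last-level node of its branch. For the sentence "我去操场踢足球了" ("I went to the playground to play football"), the resulting preset-structure segmentation tree is shown in FIG. 5.
A parsing unit 1032, configured to parse out the core viewpoint information of the corresponding social information based on the preset-structure segmentation tree.
Based on the constructed preset-structure segmentation tree, the node distance between a segmented word of a first preset part of speech (for example, noun) and each segmented word of a second preset part of speech (for example, verb or adjective) is calculated, that is, the number of nodes separating the two segmented words. The segmented word of the second preset part of speech with the smallest node distance to the segmented word of the first preset part of speech is found, and the two are combined, in the order in which they appear in the social information, to form the corresponding core viewpoint information.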
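The nearest-node pairing can be sketched on a small tree. Several assumptions apply: FIG. 5 is not reproduced here, so `EXAMPLE_TREE` is a hypothetical tree for "我去操场踢足球了"; distance is measured as the number of edges on the path between two leaves, which ranks pairs in the same order as counting the nodes between them; and ties go to the earlier word.

```python
from typing import Iterator, List, Sequence, Tuple

# A leaf is (part of speech, word); an inner node is (phrase label, [children]).
EXAMPLE_TREE = ("S", [
    ("NP", [("pronoun", "我")]),
    ("VP", [
        ("VP", [("verb", "去"), ("noun", "操场")]),
        ("VP", [("verb", "踢"), ("noun", "足球")]),
        ("particle", "了"),
    ]),
])


def leaf_paths(tree, path: Tuple[int, ...] = ()) -> Iterator[tuple]:
    """Yield (word, part of speech, path of child indexes from the root)."""
    label, body = tree
    if isinstance(body, str):
        yield body, label, path
    else:
        for index, child in enumerate(body):
            yield from leaf_paths(child, path + (index,))


def node_distance(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of edges between two leaves, via their lowest common ancestor."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return (len(p) - common) + (len(q) - common)


def core_viewpoints(tree, first_pos: str = "noun",
                    second_pos: Sequence[str] = ("verb", "adjective")) -> List[str]:
    """Pair each first-POS word with its nearest second-POS word, in text order."""
    leaves = list(leaf_paths(tree))
    results = []
    for i, (_, pos, path) in enumerate(leaves):
        if pos != first_pos:
            continue
        options = [(node_distance(path, other_path), j)
                   for j, (_, other_pos, other_path) in enumerate(leaves)
                   if other_pos in second_pos]
        if not options:
            continue
        _, j = min(options)
        results.append("".join(leaves[k][0] for k in sorted((i, j))))
    return results
```

On the example tree, "操场" pairs with "去" and "足球" pairs with "踢", because each noun shares a verb phrase with its verb and is therefore closer to it than to the other verb.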
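As shown in FIG. 6, FIG. 6 is a schematic flowchart of an embodiment of the method for identifying risk events based on social information of the present invention. The method may be performed by a system for identifying risk events based on social information, which may be implemented in software and/or hardware and may be integrated in a server. The method includes:
Step S1: obtaining, from a predetermined social server, the social information published by each predetermined social account;
The predetermined social server is, for example, a Weibo server, a WeChat server, or a QQ server, and the social account corresponds to the social server and is, for example, a Weibo account, a WeChat account, or a QQ account. For a given social server, the predetermined social accounts may be some or all of the social accounts of that server. A user publishes social information from his or her own social account. For example, insurance salesperson A may use a WeChat account to post social information in Moments or in a friend group, such as "平安推出了尊宏人生产品" ("Ping An has launched the Zunhong Life product").
In this embodiment, the system may obtain the social information published by the predetermined social accounts from the social server in real time, so as to obtain the latest social information, or it may obtain that information from the social server periodically, which reduces the burden on the system.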
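Step S2: analyzing the social information to obtain the company name and/or product name in the social information;
In this embodiment, the social information published by each social account is analyzed to obtain the company name and/or product name it contains. For example, analyzing the above social information "平安推出了尊宏人生产品" yields the company name "平安" (Ping An) and the product name "尊宏人生" (Zunhong Life). For the social information "今天去*景点游玩" ("Went sightseeing at * scenic spot today"), the analysis yields no company name or product name.
The social information may be analyzed item by item in order of publication time. For a given item of social information, in one embodiment, it may be segmented into characters and/or words, and all of the resulting characters and/or words are matched against the characters and/or words pre-stored in a predetermined lexicon to obtain the company name and/or product name in the social information. In another embodiment, after segmentation, the nouns may be extracted and then matched against the nouns pre-stored in a predetermined noun library to obtain the company name and/or product name. If no company name or product name is found in an item of social information, that item is not processed further, and the analysis proceeds to the next item.
By analyzing whether an item of social information contains a company name and/or product name, it can then be determined whether that item contains a viewpoint about that company name and/or product name.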
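Step S3: when the company name and/or product name in the social information is obtained, parsing out the core viewpoint information corresponding to the social information according to a preset rule;
In this embodiment, an item of social information that contains a company name and/or product name is parsed to obtain its core viewpoint information, which is a view or opinion about that company name and/or product name.
In one embodiment of the parsing process, characters and/or words of a predetermined part of speech are extracted from the social information that contains the company name and/or product name, for example from the social information after it has been segmented into characters and/or words. The predetermined part of speech may be, for example, adjective, verb, noun, or particle. The extracted characters and/or words are then analyzed to obtain the core viewpoint information corresponding to the social information. For example, the social information "平安推出了尊宏人生产品,尊宏人生产品安全、收益高" ("Ping An has launched the Zunhong Life product; the Zunhong Life product is safe and offers high returns") contains the adjectives "安全" (safe) and "高" (high), so the core viewpoint information is "尊宏人生产品安全、收益高" ("the Zunhong Life product is safe and offers high returns"). In another embodiment, the segmented social information is analyzed to determine whether it contains negation characters and/or words, so as to obtain the core viewpoint information corresponding to the social information.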
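Step S4: using a pre-trained classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information belonging to a preset information orientation category, together with the social account that published it, is sent to a predetermined terminal for review.
The pre-trained classifier is preferably a support vector machine classifier, and the information orientation categories corresponding to the core viewpoint information include positive information and negative information. After the classifier has identified the information orientation category of the core viewpoint information, if that category is negative information, the corresponding social information and the social account that published it are sent to the predetermined terminal so that the social information can be reviewed. If the review confirms that the information is negative, measures may be taken against the social account to control the publication of negative information, for example sending a reminder to the social account asking its user not to publish negative information, or sending the user of the social account a notice of a rule violation.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.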

Claims (20)

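  1. A system for identifying risk events based on social information, characterized in that the system includes:
    an acquisition module, configured to obtain, from a predetermined social server, the social information published by each predetermined social account;
    an analysis module, configured to analyze the social information to obtain the company name and/or product name in the social information;
    a parsing module, configured to parse out, according to a preset rule, the core viewpoint information corresponding to the social information when the company name and/or product name in the social information is obtained;
    an identification module, configured to use a pre-trained classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information belonging to a preset information orientation category, together with the social account that published it, is sent to a predetermined terminal for review.
  2. An electronic device, characterized in that the electronic device includes a memory and a processor connected to the memory, the memory storing a system for identifying risk events based on social information that can run on the processor, and the system, when executed by the processor, implementing the following steps:
    S1: obtaining, from a predetermined social server, the social information published by each predetermined social account;
    S2: analyzing the social information to obtain the company name and/or product name in the social information;
    S3: when the company name and/or product name in the social information is obtained, parsing out the core viewpoint information corresponding to the social information according to a preset rule;
    S4: using a pre-trained classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information belonging to a preset information orientation category, together with the social account that published it, is sent to a predetermined terminal for review.
  3. The electronic device according to claim 2, characterized in that the information orientation categories include positive information and negative information, the classifier is a support vector machine classifier, and the system for identifying risk events based on social information, when executed by the processor, further implements the following steps:
    obtaining a preset number of core viewpoint information samples of positive information and a preset number of core viewpoint information samples of negative information; randomly dividing all of the obtained core viewpoint information samples into a training set of a first preset proportion and a validation set of a second preset proportion; training a predetermined support vector machine classifier with the training set, and verifying the accuracy of the trained support vector machine classifier with the validation set; if the accuracy is greater than or equal to a preset accuracy, ending the training and using the trained support vector machine classifier as the classifier; or, if the accuracy is less than the preset accuracy, increasing the numbers of positive and negative core viewpoint information samples and training again.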
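  4. The electronic device according to claim 2 or 3, characterized in that step S2 includes:
    segmenting the social information according to a predetermined segmentation rule to obtain the corresponding segmented words;
    tagging the segmented words with parts of speech according to a predetermined part-of-speech tagging rule;
    classifying the segmented words whose part of speech is noun according to a predetermined word classification rule, so as to obtain the company name and/or product name in the social information from the classification result.
  5. The electronic device according to claim 4, characterized in that the predetermined segmentation rule is:
    splitting the social information into short sentences at punctuation marks of preset types, and segmenting each resulting short sentence using the longest-word-first principle.
  6. The electronic device according to claim 5, characterized in that the predetermined part-of-speech tagging rule is:
    determining and tagging the part of speech of each segmented word according to the mapping between characters/words and parts of speech in a general dictionary, and/or according to a preset mapping between characters/words and parts of speech.
  7. The electronic device according to claim 6, characterized in that the predetermined word classification rule is:
    using a pre-trained recognition model to identify the noun category of each segmented word tagged as a noun, thereby classifying the segmented words tagged as nouns, the recognition model being a conditional random field model.
  8. The electronic device according to claim 5, characterized in that step S3 includes:
    when the company name and/or product name in the social information is obtained, building a preset-structure segmentation tree according to the order and parts of speech of the segmented words in the social information from which the company name and/or product name was obtained;
    parsing out the core viewpoint information of the corresponding social information based on the preset-structure segmentation tree.
  9. The electronic device according to claim 8, characterized in that the preset-structure segmentation tree includes multiple levels of nodes, the first-level node being the social information, the second-level nodes being the phrases obtained by dividing the social information according to the order and parts of speech of its segmented words, and each level of nodes after the second being obtained by dividing the phrases of the level above according to part of speech.
  10. The electronic device according to claim 9, characterized in that step S3 further includes: calculating, based on the preset-structure segmentation tree, the node distance between a segmented word of a first preset part of speech and a segmented word of a second preset part of speech; obtaining the segmented word of the second preset part of speech with the smallest node distance to the segmented word of the first preset part of speech; and combining, in order, the segmented word of the first preset part of speech and the nearest segmented word of the second preset part of speech to form the corresponding core viewpoint information.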
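  11. A method for identifying risk events based on social information, characterized in that the method includes:
    S1: obtaining, from a predetermined social server, the social information published by each predetermined social account;
    S2: analyzing the social information to obtain the company name and/or product name in the social information;
    S3: when the company name and/or product name in the social information is obtained, parsing out the core viewpoint information corresponding to the social information according to a preset rule;
    S4: using a pre-trained classifier to identify the information orientation category corresponding to the core viewpoint information, so that social information belonging to a preset information orientation category, together with the social account that published it, is sent to a predetermined terminal for review.
  12. The method for identifying risk events based on social information according to claim 11, characterized in that the information orientation categories include positive information and negative information, the classifier is a support vector machine classifier, and the method further includes:
    obtaining a preset number of core viewpoint information samples of positive information and a preset number of core viewpoint information samples of negative information; randomly dividing all of the obtained core viewpoint information samples into a training set of a first preset proportion and a validation set of a second preset proportion; training a predetermined support vector machine classifier with the training set, and verifying the accuracy of the trained support vector machine classifier with the validation set; if the accuracy is greater than or equal to a preset accuracy, ending the training and using the trained support vector machine classifier as the classifier; or, if the accuracy is less than the preset accuracy, increasing the numbers of positive and negative core viewpoint information samples and training again.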
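  13. The method for identifying risk events based on social information according to claim 11 or 12, characterized in that step S2 includes:
    segmenting the social information according to a predetermined segmentation rule to obtain the corresponding segmented words;
    tagging the segmented words with parts of speech according to a predetermined part-of-speech tagging rule;
    classifying the segmented words whose part of speech is noun according to a predetermined word classification rule, so as to obtain the company name and/or product name in the social information from the classification result.
  14. The method for identifying risk events based on social information according to claim 13, characterized in that the predetermined segmentation rule is:
    splitting the social information into short sentences at punctuation marks of preset types, and segmenting each resulting short sentence using the longest-word-first principle.
  15. The method for identifying risk events based on social information according to claim 14, characterized in that the predetermined part-of-speech tagging rule is:
    determining and tagging the part of speech of each segmented word according to the mapping between characters/words and parts of speech in a general dictionary, and/or according to a preset mapping between characters/words and parts of speech.
  16. The method for identifying risk events based on social information according to claim 15, characterized in that the predetermined word classification rule is:
    using a pre-trained recognition model to identify the noun category of each segmented word tagged as a noun, thereby classifying the segmented words tagged as nouns, the recognition model being a conditional random field model.
  17. The method for identifying risk events based on social information according to claim 14, characterized in that step S3 includes:
    when the company name and/or product name in the social information is obtained, building a preset-structure segmentation tree according to the order and parts of speech of the segmented words in the social information from which the company name and/or product name was obtained;
    parsing out the core viewpoint information of the corresponding social information based on the preset-structure segmentation tree.
  18. The method for identifying risk events based on social information according to claim 17, characterized in that the preset-structure segmentation tree includes multiple levels of nodes, the first-level node being the social information, the second-level nodes being the phrases obtained by dividing the social information according to the order and parts of speech of its segmented words, and each level of nodes after the second being obtained by dividing the phrases of the level above according to part of speech.
  19. The method for identifying risk events based on social information according to claim 18, characterized in that step S3 further includes: calculating, based on the preset-structure segmentation tree, the node distance between a segmented word of a first preset part of speech and a segmented word of a second preset part of speech; obtaining the segmented word of the second preset part of speech with the smallest node distance to the segmented word of the first preset part of speech; and combining, in order, the segmented word of the first preset part of speech and the nearest segmented word of the second preset part of speech to form the corresponding core viewpoint information.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a system for identifying risk events based on social information, and the system, when executed by a processor, implements the steps of the method for identifying risk events based on social information according to any one of claims 11-19.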
PCT/CN2017/091358 2017-05-05 2017-06-30 基于社交信息的风险事件的识别系统、方法、电子装置及存储介质 WO2018201599A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR1020187017275A KR20190022430A (ko) 2017-05-05 2017-06-30 소셜 정보 기반의 리스크 이벤트의 식별 시스템, 방법, 전자장치 및 저장매체
US16/084,235 US11803796B2 (en) 2017-05-05 2017-06-30 System, method, electronic device, and storage medium for identifying risk event based on social information
EP17897215.4A EP3425531A4 (en) 2017-05-05 2017-06-30 SYSTEM, METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM FOR IDENTIFYING A RISK EVENT BASED ON SOCIAL INFORMATION
JP2018530794A JP6608061B2 (ja) 2017-05-05 2017-06-30 Sns情報に基づくリスクイベント認識システム、方法、電子装置及び記憶媒体
SG11201901072SA SG11201901072SA (en) 2017-05-05 2017-06-30 System, method, electronic device, and storage medium for identifying risk event based on social information
AU2017404560A AU2017404560A1 (en) 2017-05-05 2017-06-30 System, method, electronic device, and storage medium for identifying risk event based on social information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710313184.0A CN107688594B (zh) 2017-05-05 2017-05-05 基于社交信息的风险事件的识别系统及方法
CN201710313184.0 2017-05-05

Publications (1)

Publication Number Publication Date
WO2018201599A1 true WO2018201599A1 (zh) 2018-11-08

Family

ID=61152473

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/091358 WO2018201599A1 (zh) 2017-05-05 2017-06-30 基于社交信息的风险事件的识别系统、方法、电子装置及存储介质

Country Status (8)

Country Link
US (1) US11803796B2 (zh)
EP (1) EP3425531A4 (zh)
JP (1) JP6608061B2 (zh)
KR (1) KR20190022430A (zh)
CN (1) CN107688594B (zh)
AU (1) AU2017404560A1 (zh)
SG (1) SG11201901072SA (zh)
WO (1) WO2018201599A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135693A (zh) * 2019-04-12 2019-08-16 北京中科闻歌科技股份有限公司 一种风险识别方法、装置、设备及存储介质
CN110377809A (zh) * 2019-06-19 2019-10-25 深圳壹账通智能科技有限公司 预设用户的资源获取资质生成方法及相关设备
CN110287493B (zh) * 2019-06-28 2023-04-18 中国科学技术信息研究所 风险短语识别方法、装置、电子设备及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105138640A (zh) * 2015-08-24 2015-12-09 成都秋雷科技有限责任公司 基于云的网页广告筛选方法
CN105141607A (zh) * 2015-08-24 2015-12-09 成都秋雷科技有限责任公司 基于云的恶意链接拦截方法
CN105183793A (zh) * 2015-08-24 2015-12-23 成都秋雷科技有限责任公司 网页弹窗快速拦截方法

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008084064A (ja) * 2006-09-28 2008-04-10 National Institute Of Advanced Industrial & Technology テキスト分類処理方法、テキスト分類処理装置ならびにテキスト分類処理プログラム
CN101201819B (zh) * 2007-11-28 2010-12-08 北京金山软件有限公司 一种树库转化方法及树库转化系统
CN101266520B (zh) * 2008-04-18 2013-03-27 上海触乐信息科技有限公司 一种可实现灵活键盘布局的系统
CN101329666A (zh) * 2008-06-18 2008-12-24 南京大学 基于语料库及树型结构模式匹配的汉语句法自动分析方法
US20120221485A1 (en) * 2009-12-01 2012-08-30 Leidner Jochen L Methods and systems for risk mining and for generating entity risk profiles
JP5286317B2 (ja) * 2010-03-26 2013-09-11 株式会社野村総合研究所 リスク情報提供システム及びプログラム
US9672283B2 (en) * 2012-06-06 2017-06-06 Data Record Science Structured and social data aggregator
KR101409413B1 (ko) 2012-07-20 2014-06-20 한양대학교 에리카산학협력단 단일화 문법을 이용한 자연어 처리 방법
US9213760B2 (en) * 2012-11-27 2015-12-15 Linkedin Corporation Unified social content platform
GB201308541D0 (en) 2013-05-13 2013-06-19 Qatar Foundation Social media news portal
JP5633944B1 (ja) * 2013-06-02 2014-12-03 データ・サイエンティスト株式会社 評価方法、評価装置、およびプログラム
JP6071792B2 (ja) * 2013-07-31 2017-02-01 株式会社東芝 社会情報提供システムおよび社会情報配信装置
CN103593431A (zh) * 2013-11-11 2014-02-19 北京锐安科技有限公司 网络舆情分析方法和装置
CN104809109B (zh) * 2014-01-23 2019-12-10 腾讯科技(深圳)有限公司 一种社交信息展示方法、装置及服务器
US9582486B2 (en) 2014-05-13 2017-02-28 Lc Cns Co., Ltd. Apparatus and method for classifying and analyzing documents including text
KR101561464B1 (ko) 2014-08-25 2015-10-20 성균관대학교산학협력단 수집 데이터 감성분석 방법 및 장치
JP6392042B2 (ja) * 2014-09-11 2018-09-19 Kddi株式会社 情報提供装置、情報を提供する方法およびプログラム
JP5972425B1 (ja) 2015-05-08 2016-08-17 株式会社エルプランニング 風評被害リスクレポート作成システム、プログラム及び方法
JP2017004127A (ja) 2015-06-05 2017-01-05 富士通株式会社 テキスト分割プログラム、テキスト分割装置、及びテキスト分割方法
CN107545505B (zh) * 2016-06-24 2020-09-29 深圳壹账通智能科技有限公司 保险理财产品信息的识别方法及系统

Also Published As

Publication number Publication date
EP3425531A4 (en) 2020-04-22
AU2017404560A1 (en) 2018-11-22
US20230186212A1 (en) 2023-06-15
CN107688594B (zh) 2019-07-16
JP2019520614A (ja) 2019-07-18
CN107688594A (zh) 2018-02-13
EP3425531A1 (en) 2019-01-09
US11803796B2 (en) 2023-10-31
KR20190022430A (ko) 2019-03-06
SG11201901072SA (en) 2019-03-28
JP6608061B2 (ja) 2019-11-20

Similar Documents

Publication Publication Date Title
CN110263248B (zh) 一种信息推送方法、装置、存储介质和服务器
CN108519970B (zh) 文本中敏感信息的鉴定方法、电子装置及可读存储介质
WO2019227710A1 (zh) 网络舆情的分析方法、装置及计算机可读存储介质
JP5936698B2 (ja) 単語意味関係抽出装置
US20190005019A1 (en) Contextual pharmacovigilance system
WO2019037258A1 (zh) 信息推荐的装置、方法、系统及计算机可读存储介质
US20200074242A1 (en) System and method for monitoring online retail platform using artificial intelligence
WO2018201772A1 (zh) 医疗文本的潜在疾病推断方法、系统及可读存储介质
WO2020259280A1 (zh) 日志管理方法、装置、网络设备和可读存储介质
CN112686036B (zh) 风险文本识别方法、装置、计算机设备及存储介质
CN113095076A (zh) 敏感词识别方法、装置、电子设备及存储介质
CN104850617A (zh) 短文本处理方法及装置
WO2018201599A1 (zh) 基于社交信息的风险事件的识别系统、方法、电子装置及存储介质
CN107545505B (zh) 保险理财产品信息的识别方法及系统
JP2021093163A (ja) ディープラーニングに基づく文書類似度測定モデルを利用した重複文書探知方法およびシステム
WO2021012958A1 (zh) 原创文本甄别方法、装置、设备与计算机可读存储介质
CN109753646B (zh) 一种文章属性识别方法以及电子设备
Mehta et al. Sentiment analysis on product reviews using Hadoop
CN113420545B (zh) 摘要生成方法、装置、设备及存储介质
WO2018205460A1 (zh) 获取目标用户的方法、装置、电子设备及介质
CN113420143A (zh) 文书摘要生成方法、装置、设备及存储介质
CN114065763A (zh) 一种基于事件抽取的舆情分析方法、装置及相关组件
TWI712948B (zh) 文本情緒分析的方法,裝置與電腦程式產品
Wan et al. Data mining technology application in false text information recognition
Mamatha et al. Supervised aspect category detection of co-occurrence data using conditional random fields

Legal Events

Date Code Title Description
ENP Entry into the national phase (Ref document number: 2018530794; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 20187017275; Country of ref document: KR; Kind code of ref document: A)
WWE Wipo information: entry into national phase (Ref document number: 2017897215; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2017897215; Country of ref document: EP; Effective date: 20180829)
ENP Entry into the national phase (Ref document number: 2017404560; Country of ref document: AU; Date of ref document: 20170630; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)