WO2019153589A1 - Message data processing method, apparatus, computer device and storage medium - Google Patents


Info

Publication number
WO2019153589A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
word segmentation
target
preset
target word
Application number
PCT/CN2018/089068
Inventor
张澍滋
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles

Definitions

  • the present application relates to a message data processing method, apparatus, computer device and storage medium.
  • a message data processing method, apparatus, computer device, and storage medium are provided.
  • a message data processing method includes:
  • a message data processing apparatus includes:
  • a receiving module configured to receive, in a main thread, message data sent by a server, where the message data carries source data;
  • a splitting module configured to split the message data according to word segmentation logic to obtain target word segmentation data;
  • an association storage module configured to store the target word segmentation data in association with its corresponding source data, where the source data corresponding to the target word segmentation data is the same as the source data carried by the message data from which that target word segmentation data was split;
  • a query module configured to query, in an identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and
  • an obtaining module configured to obtain the source data corresponding to the target word segmentation data whose risk level is greater than the preset level, and to add a risk tag to the obtained source data.
  • a computer device comprising a memory and one or more processors, the memory storing computer readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
  • one or more non-transitory computer readable storage media storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • FIG. 1 is an application scenario diagram of a message data processing method in accordance with one or more embodiments.
  • FIG. 2 is a flow diagram of a message data processing method in accordance with one or more embodiments.
  • FIG. 3 is a block diagram of a message data processing apparatus in accordance with one or more embodiments.
  • FIG. 4 is a block diagram of a computer device in accordance with one or more embodiments.
  • Terminal 102 communicates with server 104 over a network.
  • the terminal 102 receives, in the main thread, the message data sent by the server 104, the message data carrying source data. The terminal 102 splits the received message data according to the word segmentation logic to obtain target word segmentation data, and stores the target word segmentation data in association with its corresponding source data.
  • the terminal 102 then starts the identification thread and queries, in the identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level. When such target word segmentation data exists, the terminal 102 obtains its corresponding source data and adds a risk tag to that source data.
  • the terminal 102 can be, but is not limited to, various personal computers, notebook computers, smart phones, and tablets.
  • the server 104 can be implemented as an independent server or as a server cluster composed of multiple servers.
  • a message data processing method is provided. Taking its application to the terminal in FIG. 1 as an example, the method includes the following steps:
  • S202 Receive, in the main thread, message data sent by the server, where the message data carries the source data.
  • the main thread is a thread that runs while the terminal is working. The terminal can execute pending tasks in the main thread, such as receiving tasks and tasks that process the message data.
  • for example, in the main thread the terminal can perform the task of receiving the message data and the task of splitting the message data.
  • the message data refers to chat data exchanged between different identities (accounts) and stored in the server.
  • the message data may be text data, picture data, or numeric data.
  • for example, the message data may be the chat history between different accounts stored on the server.
  • the source data identifies the source of the corresponding message data and may be text data or image data. For example, the source data may be the account information of the sender of the message data, the time the message data was sent, and basic information about the group from which the message data originated. Specifically, the main thread is started, and the message data sent by the server is received in the main thread, the message data carrying its corresponding source data.
  • the message data may be received as follows. A request for obtaining chat messages is sent to the chat-sending interface of the server, and an identity verification request returned by the server is received. Identity verification information is then sent to the server in response, and the server verifies it.
  • once verification succeeds, the server can transmit data, so the message data sent by the server is received. The message data may be the chat data corresponding to the identity verification information, and the received message data carries source data.
  • for example, the terminal sends an acquisition request for chat information to the chat-sending interface of the server and receives the identity verification request that the server returns in response. It then sends the corresponding identity verification information to the server, such as a user name and login password.
  • after verification, the terminal can exchange data with the server and receive the chat data it sends. Each piece of chat data can carry the account of its sender and the time it was sent. For a group chat, it also carries basic group information such as the group name or group number.
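The source data described for step S202 can be modelled as a small record attached to each piece of message data. The sketch below is a minimal illustration with assumed field names. `sender_account`, `sent_at`, `group_name` and `group_number` are hypothetical and do not come from the application. It shows only the data shape, not the server handshake.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SourceData:
    """Hypothetical record of where a piece of message data came from."""
    sender_account: str                 # account of the sender
    sent_at: datetime                   # time the message was sent
    group_name: Optional[str] = None    # only set for group chats
    group_number: Optional[str] = None  # only set for group chats

    def is_group_chat(self) -> bool:
        return self.group_name is not None or self.group_number is not None


@dataclass(frozen=True)
class MessageData:
    """Hypothetical record of one piece of chat data plus its source data."""
    text: str
    source: SourceData


msg = MessageData(
    text="Ping An Bank today activity",
    source=SourceData("user_001", datetime(2018, 1, 1, 14, 0), group_number="G42"),
)
print(msg.source.is_group_chat())  # True
```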
  • S204 Split the message data according to the word segmentation logic to obtain target word segmentation data.
  • the word segmentation logic splits the message data into several standard terms.
  • a standard term is a term with independent semantics. Its meaning is not affected by the text before or after it, so its completeness can be judged from the term's own content alone.
  • for example, the message data "Ping An Bank activity" is split so that each piece of split data has independent semantics and is as short as possible. This yields the split data "Ping An Bank" and "activity".
  • target word segmentation data refers to the terms with independent semantics obtained by splitting. Specifically, when the message data sent by the server is received, the corresponding word segmentation logic is obtained, and the message data is split with it to obtain the target word segmentation data.
  • for example, when the message data is text data and the terminal receives it from the server, the terminal obtains the corresponding word segmentation logic. It then matches the characters of the message data one by one against the word segmentation logic and takes the successfully matched characters as the target word segmentation data. If the message data received from the server is "Ping An Bank today activity", the terminal splits it into three items of target word segmentation data: "Ping An Bank", "today", and "activity".
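The application does not specify how characters are matched against the word segmentation logic. One common dictionary-based choice is forward maximum matching, sketched below as an assumption. At each position, the longest run of units found in a lexicon becomes a term. The original setting works on Chinese characters. For readability, the sketch treats whitespace-separated words as units. The lexicon contents and the `max_len` value are hypothetical.

```python
def forward_max_match(units, lexicon, max_len=4):
    """Split a sequence of units into terms by forward maximum matching.

    units:   the message broken into its smallest units (Chinese characters in
             the original setting; whitespace-separated words here)
    lexicon: hypothetical set of known standard terms, each a tuple of units
    """
    result = []
    i = 0
    while i < len(units):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single unit is accepted as a fallback so the loop always advances.
        for length in range(min(max_len, len(units) - i), 0, -1):
            candidate = tuple(units[i:i + length])
            if length == 1 or candidate in lexicon:
                result.append(" ".join(candidate))
                i += length
                break
    return result


lexicon = {("Ping", "An", "Bank"), ("today",), ("activity",)}
print(forward_max_match("Ping An Bank today activity".split(), lexicon))
# ['Ping An Bank', 'today', 'activity']
```

Greedy longest-match is simple and fast, but it can pick a wrong split when a long lexicon entry swallows part of the next word. The multi-candidate scoring described later in this document addresses that weakness.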
  • S206 Store the target word segmentation data in association with its corresponding source data. The source data corresponding to the target word segmentation data is the same as the source data carried by the message data from which that target word segmentation data was split.
  • the target word segmentation data obtained by splitting is stored together with its corresponding source data, that is, the source data of the message data from which the target word segmentation data was split.
  • the terminal may store the target word segmentation data and its corresponding source data in a database at the same time. The source data may be the account information of the sender carried by the message data, the time the message data was sent, basic information about the group from which the message data originated, and so on.
  • for example, the terminal splits the message data "Ping An Bank today activity" into the target word segmentation data "Ping An Bank", "today", and "activity" and stores the three terms in the database. The source data corresponding to these terms is the account of the sender of the chat data and the time it was sent. For a group chat, it also includes the group name or group number of the group.
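Step S206 only requires that each term be stored alongside the source data of its message. A minimal sketch using Python's built-in `sqlite3` module follows. The table name `segments` and its columns are hypothetical choices. An in-memory database stands in for whatever database the terminal actually uses.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    # In-memory database for illustration; a real terminal would use a file.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE segments (
               term           TEXT NOT NULL,
               sender_account TEXT NOT NULL,
               sent_at        TEXT NOT NULL,
               group_number   TEXT
           )"""
    )
    return conn


def store_segments(conn, terms, sender_account, sent_at, group_number=None):
    """Store every term together with the source data of the message it came from."""
    conn.executemany(
        "INSERT INTO segments (term, sender_account, sent_at, group_number) "
        "VALUES (?, ?, ?, ?)",
        [(term, sender_account, sent_at, group_number) for term in terms],
    )
    conn.commit()


conn = create_store()
store_segments(conn, ["Ping An Bank", "today", "activity"],
               "user_001", "2018-01-01 14:00", "G42")
rows = conn.execute(
    "SELECT term, sender_account, group_number FROM segments ORDER BY rowid"
).fetchall()
print(rows)
```

Duplicating the source fields on every row keeps lookups from a term back to its source trivial. A normalized design would store messages once and reference them by id.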
  • S208 Query, in the identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level.
  • the identification thread is another thread that runs while the terminal is working. It is a worker thread that runs asynchronously to the main thread. While the identification thread checks the target word segmentation data for terms whose risk level is greater than the preset level, the main thread can continue to receive message data sent by the server and split it according to the word segmentation logic into target word segmentation data.
  • target word segmentation data whose risk level is greater than the preset level is high-risk target word segmentation data. Such data is suspicious target word segmentation data, and the message data from which it was split is suspicious message data.
  • the terminal pre-stores data whose risk level is greater than the preset level and matches the stored target word segmentation data against it. In this way it queries whether the target word segmentation data contains any term whose risk level is greater than the preset level.
  • when it does, that term is suspicious target word segmentation data, the message data it came from is suspicious message data, and the source data corresponding to the suspicious message data is high-risk source data.
  • for example, suppose "Ping An Bank" is pre-stored as data whose risk level is greater than the preset level. In the main thread, the terminal stores the obtained target word segmentation data "Ping An Bank", "today", and "activity" in association with their source data. When the terminal finds that the created identification thread has no pending identification task, it queries in the identification thread whether any of the stored terms has a risk level greater than the preset level. "Ping An Bank" is matched, so it is the target word segmentation data whose risk level is greater than the preset level, and the message data from which it was split is suspicious message data.
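The query in step S208 amounts to looking each stored term up in the pre-stored high-risk data. The sketch below makes two assumptions, since the application does not say how levels are encoded. Risk levels are integers held in a dictionary, and the preset level is a plain threshold. Both are hypothetical representations.

```python
def find_high_risk(stored, risk_levels, preset_level):
    """Return the (term, source) pairs whose term's risk level exceeds preset_level.

    stored:       iterable of (term, source) pairs saved in step S206
    risk_levels:  hypothetical pre-stored map term -> integer risk level
    preset_level: hypothetical integer threshold
    """
    return [
        (term, source)
        for term, source in stored
        if risk_levels.get(term, 0) > preset_level  # unknown terms count as level 0
    ]


stored = [("Ping An Bank", "msg-1"), ("today", "msg-1"), ("activity", "msg-1")]
hits = find_high_risk(stored, {"Ping An Bank": 3, "activity": 1}, preset_level=2)
print(hits)  # [('Ping An Bank', 'msg-1')]
```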
  • S210 Obtain the source data corresponding to the target word segmentation data whose risk level is greater than the preset level, and add a risk tag to the obtained source data.
  • the risk tag refers to corresponding risk prompt information.
  • the risk tag may also be a corresponding risk identifier.
  • source data to which a risk tag has been added is high-risk source data that needs further monitoring, so the message data corresponding to that source data can be monitored further. Specifically, the source data corresponding to the target word segmentation data whose risk level is greater than the preset level is obtained, and a risk tag is added to it.
  • for example, from the split target word segmentation data "Ping An Bank", "today", and "activity", the terminal identifies "Ping An Bank" as the target word segmentation data whose risk level is greater than the preset level. It then obtains the source data corresponding to "Ping An Bank", "today", and "activity".
  • if the source is a group chat, basic group information such as the group name or group number is obtained and a risk tag is added to it. The tagged group is then monitored further, so that other message data from the group identified by the source data is also monitored.
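A sketch of step S210 follows. The application only states that a risk tag is added to the source data and that tagged groups are monitored further. The `Source` class, the tag string `"RISK"`, and the idea of returning group numbers for further monitoring are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Source:
    """Hypothetical mutable view of source data that can carry risk tags."""
    sender_account: str
    group_number: Optional[str] = None
    tags: Set[str] = field(default_factory=set)


def tag_sources(hits, risk_tag="RISK"):
    """Add risk_tag to the source of every hit; return group numbers to monitor."""
    groups_to_monitor = set()
    for _term, source in hits:
        source.tags.add(risk_tag)
        if source.group_number is not None:  # group chat: monitor the whole group
            groups_to_monitor.add(source.group_number)
    return groups_to_monitor


src = Source("user_001", group_number="G42")
print(tag_sources([("Ping An Bank", src)]), src.tags)  # {'G42'} {'RISK'}
```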
  • in this method, different tasks are executed in different threads. The message data sent by the server is received in the main thread and split according to the word segmentation logic into target word segmentation data, which is stored together with its corresponding source data. The target word segmentation data whose risk level is greater than the preset level is then queried in the identification thread. Because large volumes of message data can be processed across threads without manual monitoring and analysis, processing efficiency is improved.
  • a risk tag is then added to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level. The tagged source data can be monitored further, which improves accuracy.
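The division of work between the main thread and the identification thread can be illustrated with Python's standard `threading` and `queue` modules. This is a minimal sketch, not the application's implementation. The high-risk set is hypothetical. So is the `None` sentinel used to stop the worker. The pre-delimited messages stand in for real word segmentation.

```python
import queue
import threading

HIGH_RISK = {"Ping An Bank"}  # hypothetical pre-stored high-risk terms


def identification_worker(work, flagged):
    """Identification thread: checks split terms asynchronously to the main thread."""
    while True:
        item = work.get()
        if item is None:  # sentinel: the main thread has no more messages
            break
        terms, source = item
        if any(term in HIGH_RISK for term in terms):
            flagged.append(source)


def run(messages, split):
    work = queue.Queue()
    flagged = []  # written only by the worker; read by the caller after join()
    worker = threading.Thread(target=identification_worker, args=(work, flagged))
    worker.start()
    # Main thread: receive and split each message, then hand it off without waiting.
    for text, source in messages:
        work.put((split(text), source))
    work.put(None)
    worker.join()
    return flagged


messages = [("Ping An Bank | today | activity", "msg-1"), ("hello | world", "msg-2")]
print(run(messages, split=lambda text: text.split(" | ")))  # ['msg-1']
```

The queue decouples the two threads. The main thread never blocks on identification, which is the efficiency benefit claimed above.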
  • the step of splitting the message data according to the preset word segmentation logic to obtain the target word segmentation data may include: obtaining a plurality of preset word segmentation logics and splitting the message data according to each of them to obtain word segmentation sequences;
  • calculating the splitting accuracy corresponding to each word segmentation sequence; and taking the word segmentation sequence with the highest splitting accuracy as the target word segmentation data.
  • a word segmentation sequence is one set of candidate fields (standard terms) obtained by splitting the message data. For example, the message data "Ping An Bank today activity" can yield different word segmentation sequences. The first is "Ping An", "Bank", "today", "activity", and the second is "Ping An Bank", "today", "activity".
  • the splitting accuracy measures how correct the candidate fields (standard terms) obtained by splitting the message data are. The higher the splitting accuracy, the more likely the candidate fields are correct.
  • the different word segmentation sequences of each piece of message data are obtained and the splitting accuracy of each sequence is calculated. The terms of the sequence with the highest splitting accuracy are taken as the target word segmentation data.
  • the splitting accuracy may be calculated as follows. Once the different word segmentation sequences are obtained, the preset accuracy of each term in each sequence is looked up. The product of these accuracies is taken as the splitting accuracy of that sequence.
  • for example, the terminal receives the message data "Ping An Bank today activity" from the server and obtains different word segmentation logics. It splits the message data according to them into different word segmentation sequences. The first is "Ping An", "Bank", "today", "activity", and the second is "Ping An Bank", "today", "activity". The accuracies of the terms in the first sequence are obtained, e.g. 0.8 for "Ping An", 0.8 for "Bank", 1 for "today", and 1 for "activity". The accuracies of the terms in the second sequence are also obtained, e.g. 1 for "Ping An Bank", 1 for "today", and 1 for "activity".
  • the product for the first sequence is 0.8 × 0.8 × 1 × 1 = 0.64, so its splitting accuracy is 0.64. The product for the second sequence is 1 × 1 × 1 = 1, so its splitting accuracy is 1. The second sequence has the highest splitting accuracy, so its terms "Ping An Bank", "today", and "activity" are taken as the target word segmentation data. Note that the accuracy of each term is a preset probability, which can be set to different values.
  • the terminal can thus split the message data into multiple word segmentation sequences using multiple word segmentation logics and select the sequence with the highest splitting accuracy as the target word segmentation data. This makes the target word segmentation data more accurate and in turn improves identification accuracy.
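The splitting-accuracy comparison in the example above can be reproduced in a few lines. The per-term accuracies are the example's preset values. How the candidate sequences are produced is left out. `math.prod` (Python 3.8+) is simply one convenient way to take the product.

```python
import math


def split_accuracy(sequence, term_accuracy):
    """Splitting accuracy of one sequence: the product of its terms' preset accuracies."""
    return math.prod(term_accuracy[term] for term in sequence)


def best_segmentation(candidates, term_accuracy):
    """Pick the candidate sequence with the highest splitting accuracy."""
    return max(candidates, key=lambda seq: split_accuracy(seq, term_accuracy))


term_accuracy = {"Ping An": 0.8, "Bank": 0.8, "Ping An Bank": 1.0,
                 "today": 1.0, "activity": 1.0}
candidates = [
    ["Ping An", "Bank", "today", "activity"],  # 0.8 * 0.8 * 1 * 1 = 0.64
    ["Ping An Bank", "today", "activity"],     # 1 * 1 * 1 = 1
]
print(best_segmentation(candidates, term_accuracy))
# ['Ping An Bank', 'today', 'activity']
```

A term missing from `term_accuracy` raises `KeyError` here. A practical version would need a default accuracy for unknown terms.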
  • the step of splitting the message data according to the word segmentation logic to obtain the target word segmentation data may include: splitting the message data according to the word segmentation logic to obtain the initial word segmentation data;
  • the initial word segmentation data refers to terms obtained by splitting according to the word segmentation logic whose independent semantics still need to be verified.
  • the basic filter library is a database storing simple characters, i.e. single words or short phrases, such as "Ping An", "Bank", and "week". The simple characters stored in the basic filter library may be the products of incorrect splitting. Initial word segmentation data that matches them therefore needs a further check of its correctness;
  • the filter data refers to the simple characters stored in the basic filter library, such as individual words or phrases. The time data refers to the receiving time of the message data, stored in the source data.
  • the time data may be a specific year, month, day, and time.
  • for example, the time data may be 13:30 on January 1, 2018.
  • the message data is split according to the word segmentation logic to obtain the initial word segmentation data, which is then matched one by one against the filter data stored in the basic filter library. At least two items of the initial word segmentation data may successfully match filter data in the basic filter library. In that case the initial word segmentation data may be an incorrect split of the message data and needs further confirmation, so the time data in the source data corresponding to each successfully matched item is extracted.
  • when the time data shows that the matched items come from the same message data, the initial word segmentation data is incorrect and cannot be used as the target word segmentation data. The message data corresponding to the time data is obtained, and corresponding prompt information is displayed on the display interface. The user makes a selection based on the prompt information. When the user chooses to add new word segmentation logic, a corresponding adding instruction is generated from the selection.
  • word segmentation logic for the message data is added according to the adding instruction. The message data corresponding to the initial word segmentation data is then split again with the new word segmentation logic to obtain the target word segmentation data.
  • for example, the terminal splits the message data "Ping An Bank today activity" according to the word segmentation logic into the initial word segmentation data "Ping An", "Bank", "today", and "activity". It then matches these one by one against the filter data stored in the basic filter library.
  • when the initial word segmentation data "Ping An" and "Bank" both match filter data stored in the basic filter library, the message data may have been split incorrectly. That is, it should have been split into a higher-level (longer) phrase. The terminal therefore extracts the time data from the source data corresponding to each matched item, the source data being that of the corresponding message data.
  • if, for example, the extracted time data of both matched items is 14:00 on January 1, 2018, the matched initial word segmentation data comes from the same message data. In that case the initial word segmentation data "Ping An", "Bank", "today", "activity" produced by the currently pre-stored word segmentation logic is incorrect.
  • the message data therefore needs to be re-split to obtain the corresponding higher-level phrase as the target word segmentation data. The terminal obtains the message data "Ping An Bank today activity" corresponding to 14:00 on January 1, 2018 and displays prompt information on its display interface according to that message data.
  • the prompt information may be "Add new word segmentation logic?". When the user selects "Yes", a corresponding adding instruction is generated. When the terminal receives the adding instruction, it adds word segmentation logic for the message data accordingly. "Ping An Bank today activity" is then split with the newly added word segmentation logic, and the resulting target word segmentation data may be "Ping An Bank", "today", and "activity".
  • in this embodiment, the initial word segmentation data is matched against the data stored in the basic filter library. The time data in the source data corresponding to the successfully matched items is then obtained to determine whether the split is inaccurate.
  • if it is, new word segmentation logic is added to re-split the message data. This further verifies the splitting of the message data, avoids inaccurate analysis of the message data, and improves applicability.
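The filter-library check can be sketched as counting, per message, how many initial terms hit the basic filter library. Following the example, the receive time stored in the source data is assumed to identify the message. The library contents and the threshold of two hits come from the text. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def messages_needing_resplit(initial_segments, filter_library, min_hits=2):
    """Return receive times of messages whose initial split hit the filter library
    at least min_hits times, i.e. candidates for new word segmentation logic.

    initial_segments: list of (term, receive_time) pairs from the initial split
    filter_library:   set of simple words or short phrases (the basic filter library)
    """
    hits_per_message = defaultdict(int)
    for term, receive_time in initial_segments:
        if term in filter_library:
            hits_per_message[receive_time] += 1
    return sorted(t for t, n in hits_per_message.items() if n >= min_hits)


library = {"Ping An", "Bank", "week"}
segments = [("Ping An", "2018-01-01 14:00"), ("Bank", "2018-01-01 14:00"),
            ("today", "2018-01-01 14:00"), ("activity", "2018-01-01 14:00"),
            ("Bank", "2018-01-01 14:02"), ("holiday", "2018-01-01 14:02")]
print(messages_needing_resplit(segments, library))  # ['2018-01-01 14:00']
```

Using the receive time as a message key only works if no two messages share a timestamp. A message id would be a safer key in practice.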
  • the step of querying, in the identification thread, the target word segmentation data for target word segmentation data whose risk level is greater than the preset level may include: matching the target word segmentation data against pre-stored data whose risk level is greater than the preset level;
  • when at least one item of target word segmentation data is successfully matched, obtaining the storage category of the matched data; obtaining the time data in the source data corresponding to the successfully matched target word segmentation data, and extracting, according to the time data, the unmatched target word segmentation data within a preset message data acquisition period; matching the unmatched target word segmentation data against the data stored under that storage category whose risk level is greater than the preset level; and when unmatched target word segmentation data matches the pre-stored data under that storage category, taking it as target word segmentation data whose risk level is greater than the preset level.
  • the storage category refers to a preset category under which corresponding data is stored.
  • in the identification thread, once the obtained message data has been split into target word segmentation data using the word segmentation logic, the target word segmentation data is matched one by one against the pre-stored data whose risk level is greater than the preset level. When at least one item matches successfully, the storage category of the matched data is obtained. Then the time data in the source data corresponding to the successfully matched target word segmentation data is obtained, together with the preset message data acquisition period.
  • the terminal uses the time data of the successfully matched target word segmentation data and the message data acquisition period to extract the unmatched target word segmentation data whose time data falls within that period.
  • the unmatched target word segmentation data within the message data acquisition period is then matched against the data stored under that storage category whose risk level is greater than the preset level.
  • if it matches, it is target word segmentation data whose risk level is greater than the preset level.
  • if it does not match any data stored under that storage category, it is then matched against the other storage categories.
  • for example, the terminal matches the obtained target word segmentation data "Ping An Bank", "today", and "activity" against the data whose risk level is greater than the preset level. "Ping An Bank" matches successfully, so the storage category of the matched data is obtained, namely the bank category.
  • the time data in the source data corresponding to the matched "Ping An Bank" is then obtained, for example 14:00 on January 1, 2018. The unmatched target word segmentation data within a preset message data acquisition period of 5 minutes from that time is then extracted, that is, the data from 14:00 to 14:05 on January 1, 2018.
  • the preset message data acquisition period may also be 3 minutes, 7 minutes, 10 minutes, 20 minutes, and the like.
  • the preset message data acquisition period is a preset time period. Different message data within that period are likely to contain data belonging to the same storage category as the successfully matched target word segmentation data. The unmatched target word segmentation data within the period is therefore first matched against the pre-stored data under the matched storage category whose risk level is greater than the preset level. If it matches, it is target word segmentation data whose risk level is greater than the preset level. This saves query time and thereby improves processing efficiency.
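The storage-category shortcut can be sketched as a lookup-ordering rule. The sketch below reads "unmatched" as "not yet checked". Terms are scanned in time order. Within the acquisition window after a hit, the hit's storage category is tried before the others. This reading, the category contents, and the lookup counter are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def scan_with_category_priority(segments, high_risk_by_category, window):
    """Find high-risk terms, trying the most recent hit's category first.

    segments:              (term, time) pairs sorted by time
    high_risk_by_category: hypothetical map storage category -> set of high-risk terms
    window:                message data acquisition period after a hit
    Returns (found terms, number of category lookups performed).
    """
    found, lookups = set(), 0
    hot_category, hot_until = None, None
    for term, t in segments:
        order = list(high_risk_by_category)
        if hot_category is not None and t <= hot_until:
            order.remove(hot_category)
            order.insert(0, hot_category)  # same category first inside the window
        for category in order:
            lookups += 1
            if term in high_risk_by_category[category]:
                found.add(term)
                hot_category, hot_until = category, t + window
                break
    return found, lookups


categories = {"securities": {"ABC Securities"}, "bank": {"Ping An Bank", "XYZ Bank"}}
t0 = datetime(2018, 1, 1, 14, 0)
segments = [("Ping An Bank", t0), ("today", t0 + timedelta(minutes=1)),
            ("XYZ Bank", t0 + timedelta(minutes=3))]
print(scan_with_category_priority(segments, categories, timedelta(minutes=5)))
```

On this input the scan performs 5 lookups, against 6 if categories were always tried in their default order. The saving grows with the number of categories and with how strongly nearby messages cluster by category.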
  • the method includes: acquiring a related phrase of the target segmentation phrase whose risk level is greater than the preset level; and when the associated phrase is If the risk level is greater than the preset level, the query is related to the target word segment data corresponding to the source data of the risk tag not added; if there is a related phrase in the target word segment data corresponding to the source data of the risk tag not added, then Add source tags with no risk tags added to the risk tag.
  • the related phrase refers to a phrase that is similar or identical to the target segmentation phrase whose risk level is greater than the preset level.
  • the corresponding related phrase can be “Ping An Financial Institution”.
  • after a risk tag is added to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level, the associated phrase of that target word segmentation data is acquired and matched against the preset data whose risk level is greater than the preset level.
  • if the match succeeds, the associated phrase is also data whose risk level is greater than the preset level, and the target word segmentation data corresponding to source data without a risk tag is then matched against the associated phrase.
  • if this second match succeeds, i.e., the associated phrase is present in the target word segmentation data corresponding to source data without a risk tag, the unsplit message data from which that target word segmentation data was obtained is suspicious.
  • the message data corresponding to that untagged source data therefore needs further monitoring, and a risk tag is added to that source data.
  • for example, the terminal acquires the associated phrase of the target word segmentation data whose risk level is greater than the preset level, such as the associated phrase “Ping An Financial Institution” of “Ping An Bank”, and matches “Ping An Financial Institution” against the preset data whose risk level is greater than the preset level.
  • if the match succeeds, the associated phrase “Ping An Financial Institution” is also data whose risk level is greater than the preset level, and the target word segmentation data corresponding to source data without a risk tag is matched against “Ping An Financial Institution”.
  • if the associated phrase is present in that target word segmentation data, the unsplit message data from which it was obtained is suspicious message data; that is, the message data corresponding to the untagged source data also needs further monitoring, so a risk tag is added to that source data.
  • matching the target word segmentation data corresponding to untagged source data against the associated phrase reveals whether it is target word segmentation data whose risk level is greater than the preset level, so that the message data corresponding to its source data can also be monitored further.
  • this prevents high-risk target word segmentation data from being missed when phrases in the message data are replaced with substitutes, thereby improving the accuracy of message data queries.
  • the method further includes: extracting the identity identifier corresponding to the source data to which a risk tag has been added; counting, within the preset time period, how many times the same target word segmentation data whose risk level is greater than the preset level occurs for that identity identifier; and when the count exceeds a preset value, monitoring the message data corresponding to the identity identifier.
  • the identity identifier is the identity information of the sender of the message data. It may be a text identifier, a picture identifier, or a numeric identifier, for example the account, user name, or avatar of the sending user of the message data.
  • a preset value for the count of target word segmentation data is set. The identity identifier corresponding to the source data to which a risk tag has been added is extracted, and the number of occurrences of the same target word segmentation data whose risk level is greater than the preset level for that identity identifier within the preset time period is counted. When the count exceeds the preset value, the user corresponding to the identity identifier may be exchanging high-risk information in order to obtain large interactive rewards at low cost, so the message data corresponding to the identity identifier is monitored further.
  • for example, the preset value is set to 10. The identity identifier corresponding to the source data to which a risk tag has been added is extracted, such as the account of the sending user of the message data, and the number of times that user sent the same target word segmentation data whose risk level is greater than the preset level within the preset time period is counted, such as the count for “Ping An Bank”. When the count for “Ping An Bank” exceeds 10, the user may be exchanging information about Ping An Bank to obtain corresponding rewards, so the other message data sent by that user needs further monitoring: that message data can be obtained and queried to determine whether it is message data whose risk level is greater than the preset level.
  • counting, for each identity identifier, the target word segmentation data whose risk level is greater than the preset level within the preset time period, and monitoring that identity identifier's message data when the count exceeds the preset value, surfaces other suspicious message data. Associated message data is thus found through the identity identifier linked to high-risk target word segmentation data, which improves processing efficiency and broadens applicability.
  • the method further includes: obtaining the network address corresponding to the identity identifier according to the identity identifier; querying the number of identity identifiers registered from that network address within a preset registration period; and when the number of registered identity identifiers exceeds a preset value, marking the network address as a network address whose risk level is greater than the preset level.
  • a network address is an address by which a computer device on the network can be uniquely identified.
  • the network address can serve as a communication identifier.
  • the network address may be an IP (Internet Protocol) address.
  • a preset value for the number of identity registrations is set. While the message data corresponding to the identity identifier is monitored, the network address corresponding to the identity identifier is obtained from a network address repository, and the number of identity identifiers registered from that network address within the preset registration period is queried. When that number exceeds the preset value, the network address is marked as a network address whose risk level is greater than the preset level.
  • the terminal at such a network address may itself be a high-risk, suspicious terminal, so the network address is monitored to further avoid risk. Note that the network address repository stores identity identifiers to be matched together with their associated network addresses. The identity identifier in the source data is matched against the identity identifiers in the repository; if the match succeeds, the network address associated with the matched identity identifier is taken as the network address of the identity identifier in the source data.
  • for example, the preset value for the number of identity registrations is set to 100. The network address corresponding to the identity identifier is obtained from the network address repository, and the number of other identity identifiers registered from that network address within a preset time period, for example 5 minutes, is queried, e.g., the number of other user accounts registered.
  • when that number exceeds the preset value, the network address is a suspicious network address and its terminal is a suspicious terminal that may be maliciously collecting rewards, so the network address is monitored.
  • the preset time period may also be, for example, 3 minutes or 10 minutes.
  • the preset value for the number of account registrations may also be, for example, 200 or 500.
  • obtaining the network address according to the identity identifier makes it possible to query whether the network address is suspicious.
  • marking the network address as a network address whose risk level is greater than the preset level allows message data to be associated with that high-risk network address as well, further avoiding risk, improving security, and broadening applicability.
  • although the steps in the flowchart of FIG. 2 are displayed in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different times; these sub-steps or stages are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some sub-steps or stages of other steps.
  • a message data processing apparatus 300 is provided, including: a receiving module 310, a splitting module 320, an association storage module 330, a query module 340, and an obtaining module 350, where:
  • the receiving module 310 is configured to receive, in the main thread, message data sent by the server, where the message data carries the source data.
  • the splitting module 320 is configured to split the message data according to the word segmentation logic to obtain the target word segmentation data.
  • the association storage module 330 is configured to store the target word segmentation data in association with the source data corresponding to the target word segmentation data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data.
  • the query module 340 is configured to query, in the identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level.
  • the obtaining module 350 is configured to obtain the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and add a risk tag to the obtained source data.
  • the splitting module 320 may include:
  • the first splitting unit is configured to obtain a plurality of preset word segmentation logics and split the message data according to each of them to obtain word segmentation sequences.
  • the obtaining unit is configured to take the word segmentation sequence with the highest splitting correctness rate as the target word segmentation data.
  • the splitting module 320 may further include:
  • the second splitting unit is configured to split the message data according to the word segmentation logic to obtain initial word segmentation data.
  • the first matching unit is configured to match the initial word segmentation data against the filter data in the base filter library.
  • the first extracting unit is configured to, when the initial word segmentation data successfully matches the filter data, extract the time data in the source data corresponding to the successfully matched initial word segmentation data.
  • the message data obtaining unit is configured to acquire the message data corresponding to the time data when the time data in the source data corresponding to the initial word segmentation data is the same.
  • the adding unit is configured to receive an add instruction for word segmentation logic for the message data and add new word segmentation logic according to the add instruction.
  • the third splitting unit is configured to split the message data using the new word segmentation logic to obtain the target word segmentation data.
  • the query module 340 can include:
  • the second matching unit is configured to match the target word segmentation data against the pre-stored data whose risk level is greater than a preset level.
  • the storage category obtaining unit is configured to, when at least one target word segmentation data item is successfully matched, obtain the storage category of the matched data whose risk level is greater than the preset level.
  • the second extracting unit is configured to obtain the time data in the source data corresponding to the successfully matched target word segmentation data and, according to the time data, extract the unmatched target word segmentation data within the preset message data acquisition period.
  • the third matching unit is configured to match the unmatched target word segmentation data against the data pre-stored under the storage category whose risk level is greater than the preset level.
  • the target word segmentation data obtaining unit is configured to, when the unmatched target word segmentation data successfully matches the data pre-stored under the storage category whose risk level is greater than the preset level, determine that target word segmentation data to be target word segmentation data whose risk level is greater than the preset level.
  • the message data processing apparatus 300 may include:
  • the associated phrase obtaining module is configured to obtain the associated phrase of the target word segmentation data whose risk level is greater than the preset level.
  • the associated phrase querying module is configured to, when the risk level of the associated phrase is greater than the preset level, query whether the associated phrase is present in the target word segmentation data corresponding to source data to which no risk tag has been added.
  • the risk tag adding module is configured to add a risk tag to that untagged source data when the associated phrase is present in its corresponding target word segmentation data.
  • the message data processing apparatus 300 may further include:
  • the identity extraction module is configured to extract the identity identifier corresponding to the source data to which a risk tag has been added.
  • the quantity statistics module is configured to count, within the preset time period, the occurrences of the same target word segmentation data whose risk level is greater than the preset level for that identity identifier.
  • the monitoring module is configured to monitor the message data corresponding to the identity when the quantity exceeds the preset value.
  • the message data processing apparatus 300 may further include:
  • the network address obtaining module is configured to obtain a network address corresponding to the identity identifier according to the identity identifier.
  • the identity quantity query module is configured to query the number of identity identifiers registered from the network address within the preset registration period.
  • the marking module is configured to mark the network address as a network address whose risk level is greater than a preset level when the number of registered identity identifiers exceeds a preset value.
  • the various modules in the above message data processing apparatus may be implemented in whole or in part by software, hardware, and combinations thereof.
  • Each of the above modules may be embedded in or independent of the processor in the computer device, or may be stored in a memory in the computer device in a software form, so that the processor invokes the operations corresponding to the above modules.
  • a computer device is provided, which may be a terminal; its internal structure may be as shown in FIG. 4.
  • the computer device includes a processor, memory, network interface, display screen, and input device connected by a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores operating systems and computer readable instructions.
  • the internal memory provides an environment for operation of an operating system and computer readable instructions in a non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection.
  • the computer readable instructions are executed by a processor to implement a message data processing method.
  • the display screen of the computer device may be a liquid crystal display or an electronic ink display.
  • the input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad on the housing of the computer device, or an external keyboard, touchpad, or mouse.
  • the structure shown in FIG. 4 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied.
  • a specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • a computer device is provided, comprising a memory and one or more processors. The memory stores computer readable instructions that, when executed by the processors, cause the one or more processors to perform the following steps: receiving, in the main thread, message data sent by the server, the message data carrying source data.
  • splitting the message data according to the word segmentation logic to obtain target word segmentation data.
  • storing the target word segmentation data in association with the source data corresponding to the target word segmentation data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data.
  • querying, in the identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and adding a risk tag to the obtained source data.
  • the step of splitting the message data according to the preset word segmentation logic to obtain the target word segmentation data may include: obtaining a plurality of preset word segmentation logics and splitting the message data according to each of them to obtain word segmentation sequences; calculating the splitting correctness rate of each word segmentation sequence; and taking the word segmentation sequence with the highest splitting correctness rate as the target word segmentation data.
  • the step of splitting the message data according to the word segmentation logic to obtain the target word segmentation data may further include: splitting the message data according to the word segmentation logic to obtain initial word segmentation data.
  • matching the initial word segmentation data against the filter data in the base filter library.
  • when the match succeeds, extracting the time data in the source data corresponding to the successfully matched initial word segmentation data.
  • acquiring the message data corresponding to the time data.
  • receiving an add instruction for word segmentation logic for the message data, adding new word segmentation logic according to the add instruction, and splitting the message data using the new word segmentation logic to obtain the target word segmentation data.
  • the step of querying, in the identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than the preset level may include: matching the target word segmentation data against the pre-stored data whose risk level is greater than the preset level.
  • obtaining the storage category of the matched data whose risk level is greater than the preset level.
  • obtaining the time data in the source data corresponding to the successfully matched target word segmentation data, and extracting, according to the time data, the unmatched target word segmentation data within the preset message data acquisition period.
  • matching the unmatched target word segmentation data against the data pre-stored under the storage category whose risk level is greater than the preset level.
  • when the unmatched target word segmentation data successfully matches that data, determining it to be target word segmentation data whose risk level is greater than the preset level.
  • when executing the computer readable instructions, the processor may further perform: acquiring the associated phrase of the target word segmentation data whose risk level is greater than the preset level.
  • when the risk level of the associated phrase is greater than the preset level, querying whether the associated phrase is present in the target word segmentation data corresponding to source data to which no risk tag has been added; and when it is present, adding a risk tag to that source data.
  • when the processor executes the computer readable instructions, the steps further include: extracting the identity identifier corresponding to the source data to which a risk tag has been added.
  • counting, within the preset time period, the occurrences of the same target word segmentation data whose risk level is greater than the preset level for that identity identifier.
  • when the count exceeds the preset value, monitoring the message data corresponding to the identity identifier.
  • after the step of monitoring the message data corresponding to the identity identifier, the steps further include: obtaining the network address corresponding to the identity identifier according to the identity identifier; querying the number of identity identifiers registered from the network address within the preset registration period; and when the number of registered identity identifiers exceeds a preset value, marking the network address as a network address whose risk level is greater than the preset level.
  • one or more non-volatile computer readable storage media storing computer readable instructions are provided; when executed by one or more processors, the instructions cause the one or more processors to perform the following steps: receiving, in the main thread, message data sent by the server, the message data carrying source data.
  • splitting the message data according to the word segmentation logic to obtain target word segmentation data.
  • storing the target word segmentation data in association with the source data corresponding to the target word segmentation data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data.
  • querying, in the identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and adding a risk tag to the obtained source data.
  • the step of splitting the message data according to the preset word segmentation logic to obtain the target word segmentation data may include: obtaining a plurality of preset word segmentation logics and splitting the message data according to each of them to obtain word segmentation sequences; calculating the splitting correctness rate of each word segmentation sequence; and taking the word segmentation sequence with the highest splitting correctness rate as the target word segmentation data.
  • the step of splitting the message data according to the word segmentation logic to obtain the target word segmentation data may further include: splitting the message data according to the word segmentation logic to obtain initial word segmentation data.
  • matching the initial word segmentation data against the filter data in the base filter library.
  • when the match succeeds, extracting the time data in the source data corresponding to the successfully matched initial word segmentation data.
  • acquiring the message data corresponding to the time data.
  • receiving an add instruction for word segmentation logic for the message data, adding new word segmentation logic according to the add instruction, and splitting the message data using the new word segmentation logic to obtain the target word segmentation data.
  • the step of querying, in the identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than the preset level may include: matching the target word segmentation data against the pre-stored data whose risk level is greater than the preset level.
  • obtaining the storage category of the matched data whose risk level is greater than the preset level.
  • obtaining the time data in the source data corresponding to the successfully matched target word segmentation data, and extracting, according to the time data, the unmatched target word segmentation data within the preset message data acquisition period.
  • matching the unmatched target word segmentation data against the data pre-stored under the storage category whose risk level is greater than the preset level.
  • when the unmatched target word segmentation data successfully matches that data, determining it to be target word segmentation data whose risk level is greater than the preset level.
  • the method may further include: acquiring the associated phrase of the target word segmentation data whose risk level is greater than the preset level.
  • when the risk level of the associated phrase is greater than the preset level, querying whether the associated phrase is present in the target word segmentation data corresponding to source data to which no risk tag has been added; and when it is present, adding a risk tag to that source data.
  • the method further includes: extracting the identity identifier corresponding to the source data to which a risk tag has been added.
  • counting, within the preset time period, the occurrences of the same target word segmentation data whose risk level is greater than the preset level for that identity identifier.
  • after the step of monitoring the message data corresponding to the identity identifier, the method further includes: obtaining the network address corresponding to the identity identifier according to the identity identifier; querying the number of identity identifiers registered from the network address within the preset registration period; and when the number of registered identity identifiers exceeds a preset value, marking the network address as a network address whose risk level is greater than the preset level.
  • Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
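One reading of the storage-category shortcut described in the bullets above is a lookup order: once a token matches pre-stored high-risk data in some storage category, later tokens within the acquisition period are checked against that category first. The sketch below assumes the store is a dictionary from category name to a set of phrases and that send times are seconds; `RiskStore`, `find_category`, `detect_high_risk` and the `window` parameter are hypothetical names, not the patent's implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical layout: storage category -> pre-stored data above the preset level.
RiskStore = Dict[str, Set[str]]


def find_category(token: str, store: RiskStore, order: List[str]) -> Optional[str]:
    """Return the first category, in the given order, whose data contains the token."""
    for category in order:
        if token in store[category]:
            return category
    return None


def detect_high_risk(
    tokens: List[Tuple[str, float]],  # (target word segmentation data, send time in seconds)
    store: RiskStore,
    window: float,  # assumed length of the preset acquisition period, in seconds
) -> Set[str]:
    """Flag tokens above the preset level, trying recently matched categories first."""
    high_risk: Set[str] = set()
    last_hit: Dict[str, float] = {}  # category -> send time of its latest match
    for token, sent_at in sorted(tokens, key=lambda item: item[1]):
        # Categories that matched within the window go to the front of the lookup.
        recent = [c for c, t in last_hit.items() if sent_at - t <= window]
        order = recent + [c for c in store if c not in recent]
        category = find_category(token, store, order)
        if category is not None:
            high_risk.add(token)
            last_hit[category] = sent_at
    return high_risk
```

Because every category is still consulted eventually, the result equals a full scan; only the lookup order changes, which is where any time saving would come from.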
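The associated-phrase step in the bullets above (look up phrases associated with a high-risk token, keep those that are themselves above the preset level, then tag any untagged source whose tokens contain them) can be sketched as follows. The associated-phrase mapping, the `(token, source id)` record shape and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def propagate_risk_tags(
    high_risk_tokens: Set[str],
    associated: Dict[str, List[str]],  # hypothetical: phrase -> associated phrases
    risk_data: Set[str],               # pre-stored data above the preset level
    records: List[Tuple[str, str]],    # (target word segmentation data, source id)
    tagged: Set[str],                  # source ids that already carry a risk tag
) -> Set[str]:
    """Return the source ids tagged after associated-phrase matching."""
    tagged = set(tagged)  # do not mutate the caller's set
    # Keep only associated phrases that are themselves above the preset level.
    risky_phrases = {
        phrase
        for token in high_risk_tokens
        for phrase in associated.get(token, [])
        if phrase in risk_data
    }
    for token, source_id in records:
        if source_id not in tagged and token in risky_phrases:
            tagged.add(source_id)
    return tagged
```

With “平安银行” (Ping An Bank) already flagged on source `m1`, a record containing “平安金融机构” (Ping An Financial Institution) from source `m2` would also be tagged.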
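The two counting rules in the bullets above, monitoring an account that sends the same high-risk token more than a preset number of times and marking a network address with more than a preset number of registrations in a period, reduce to windowed counts. This sketch assumes timestamps in seconds and uses the text's example thresholds (10 and 100) as defaults; the tuple layouts, the `address_of` repository mapping and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def accounts_to_monitor(
    hits: List[Tuple[str, str, float]],  # (account, high-risk token, send time)
    start: float,
    end: float,
    threshold: int = 10,  # example preset value from the text
) -> Set[str]:
    """Accounts that sent the same high-risk token more than `threshold` times."""
    counts = Counter((acct, token) for acct, token, t in hits if start <= t <= end)
    return {acct for (acct, _token), n in counts.items() if n > threshold}


def risky_addresses(
    monitored: Set[str],
    address_of: Dict[str, str],              # hypothetical repository: account -> address
    registrations: List[Tuple[str, float]],  # (network address, registration time)
    start: float,
    end: float,
    threshold: int = 100,  # example preset value from the text
) -> Set[str]:
    """Addresses of monitored accounts with more than `threshold` registrations in the period."""
    candidates = {address_of[a] for a in monitored if a in address_of}
    counts = Counter(ip for ip, t in registrations if ip in candidates and start <= t <= end)
    return {ip for ip, n in counts.items() if n > threshold}
```

"Exceeds" is read as strictly greater than, so exactly 10 hits does not trigger monitoring.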

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

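A message data processing method, including: receiving, in a main thread, message data sent by a server, the message data carrying source data; splitting the message data according to word segmentation logic to obtain target word segmentation data; storing the target word segmentation data in association with the source data corresponding to the target word segmentation data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data; querying, in an identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and adding a risk tag to the obtained source data.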

Description

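Message data processing method, apparatus, computer device and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 2018101245470, entitled "Message data processing method, apparatus, computer device and storage medium", filed with the China Patent Office on February 7, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to a message data processing method, apparatus, computer device and storage medium.
Background
With the development of Internet technology, users can obtain more and more information from the network, and some high-risk information spreads through group chats and similar channels. For example, around activities held by a website, some criminals communicate through group chats to obtain large activity rewards at low cost. Monitoring group chat messages is therefore very important.
Traditionally, group chat messages in chat software have had to be monitored manually in real time, and the received messages analyzed by hand for chat messages whose risk level is greater than a preset level. When there is a large volume of group chat messages, manual analysis is inefficient and may be inaccurate.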
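Summary
According to various embodiments disclosed in the present application, a message data processing method, apparatus, computer device and storage medium are provided.
A message data processing method includes:
receiving, in a main thread, message data sent by a server, the message data carrying source data;
splitting the message data according to word segmentation logic to obtain target word segmentation data;
storing the target word segmentation data in association with the source data corresponding to the target word segmentation data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data;
querying, in an identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and
obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and adding a risk tag to the obtained source data.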
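A message data processing apparatus includes:
a receiving module, configured to receive, in a main thread, message data sent by a server, the message data carrying source data;
a splitting module, configured to split the message data according to word segmentation logic to obtain target word segmentation data;
an association storage module, configured to store the target word segmentation data in association with the source data corresponding to the target word segmentation data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data;
a query module, configured to query, in an identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and
an obtaining module, configured to obtain the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and add a risk tag to the obtained source data.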
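A computer device includes a memory and one or more processors, the memory storing computer readable instructions that, when executed by the processors, cause the one or more processors to perform the following steps:
receiving, in a main thread, message data sent by a server, the message data carrying source data;
splitting the message data according to word segmentation logic to obtain target word segmentation data;
storing the target word segmentation data in association with the source data corresponding to the target word segmentation data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data;
querying, in an identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and
obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and adding a risk tag to the obtained source data.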
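One or more non-volatile computer readable storage media store computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
receiving, in a main thread, message data sent by a server, the message data carrying source data;
splitting the message data according to word segmentation logic to obtain target word segmentation data;
storing the target word segmentation data in association with the source data corresponding to the target word segmentation data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data;
querying, in an identification thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and
obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and adding a risk tag to the obtained source data.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.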
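Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings required for the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of a message data processing method according to one or more embodiments.
FIG. 2 is a schematic flowchart of a message data processing method according to one or more embodiments.
FIG. 3 is a block diagram of a message data processing apparatus according to one or more embodiments.
FIG. 4 is a block diagram of a computer device according to one or more embodiments.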
具体实施方式
为了使本申请的技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请提供的消息数据处理方法,可以应用于如图1所示的应用环境中。终端102通过网络与服务器104通过网络进行通信。终端102在主线程中接收服务器104发送的消息数据,该消息数据上携带有来源数据,进而终端102将接收到的消息数据按照分词逻辑进行拆分得到目标分词数据,终端102将拆分得到的目标分词数据与目标分词数据对应的来源数据进行关联存储,进而终端102启动识别线程,在识别线程中查询目标分词数据中是否存在风险等级大于预设等级的目标分词数据,进而当查询到存在有风险等级大于预设等级的目标分词数据时,则获取该目标分词数据对应的来源数据,并将该目标分词数据对应的来源数据添加风险标签。,终端102可以但不限于是各种个人计算机、笔记本电脑、智能手机、平板电脑,服务器104可以用独立的服务器104或者是多个服务器104组成的服务器104集群来实现。
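In one embodiment, as shown in FIG. 2, a message data processing method is provided. The method is described using its application to the terminal in FIG. 1 as an example and includes the following steps:
S202: Receive, in a main thread, message data sent by a server, the message data carrying source data.
Specifically, the main thread is a thread created while the terminal is running that can execute various tasks: in the main thread the terminal can execute waiting tasks, receiving tasks, and tasks that process message data, for example receiving message data or splitting message data. Message data refers to chat data exchanged between different identities and stored on the server; it may be text data, picture data, numeric data, and so on, for example the chat records between different accounts stored on the server. Source data is an identifier of where the corresponding message data came from; it may be text data, picture data, and so on, for example the account information of the sender of the message data, the time at which the message data was sent, and basic information about the group from which the message data originated. Further, the current main thread is started, and message data sent by the server is received in the main thread, the message data carrying its corresponding source data. For example, a request for chat messages may be sent to the chat sending interface of the server; an identity verification request sent by the server is then received, and identity verification information is sent to the server in response; once the server successfully verifies that information, data can be transmitted with the server, so that the message data sent by the server is received. The message data may be the chat data corresponding to the identity verification information and carries source data. For example, the terminal sends a request for chat information to the chat sending interface of the server, receives the identity verification request that the server sends in response, and sends the corresponding identity verification information, such as a user name and login password. When the server successfully verifies the user name and password, the terminal can transmit data with the server and receives the chat data sent by the server. The chat data may carry the account of the sender of each chat message and the time at which it was sent; for a group chat, it also carries basic group information such as the group name or group number.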
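As a rough illustration of how a received message and the source data it carries might be modeled on the terminal side, the following Python sketch parses a server payload into two small records. The payload keys (`sender`, `time`, `text`, `group_name`, `group_number`) and the names `SourceData`, `MessageData`, and `parse_message` are hypothetical; the application does not define a wire format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SourceData:
    """Where a message came from (hypothetical field set)."""
    sender_account: str
    send_time: datetime
    group_name: Optional[str] = None    # only present for group chats
    group_number: Optional[str] = None


@dataclass(frozen=True)
class MessageData:
    """A received chat message together with the source data it carries."""
    text: str
    source: SourceData


def parse_message(payload: dict) -> MessageData:
    """Build a MessageData from a hypothetical server payload (a plain dict)."""
    source = SourceData(
        sender_account=payload["sender"],
        send_time=datetime.fromisoformat(payload["time"]),
        group_name=payload.get("group_name"),
        group_number=payload.get("group_number"),
    )
    return MessageData(text=payload["text"], source=source)


if __name__ == "__main__":
    msg = parse_message({
        "sender": "user_001",
        "time": "2018-01-01T14:00:00",
        "text": "平安银行今天活动",
        "group_number": "123456",
    })
    print(msg.source.sender_account, msg.source.group_number)  # prints: user_001 123456
```

Making the records immutable (`frozen=True`) keeps the source data identical for every token later split from the same message, which is what S206 relies on.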
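S204: Split the message data according to word segmentation logic to obtain target word segmentation data.
Specifically, the word segmentation logic splits the message data into several standard terms. A standard term is a term with independent semantics: it is not affected by the characters before or after it, and its complete concept can be determined from the characters of the term alone. For example, splitting the message data "平安银行活动" ("Ping An Bank event") so that each piece has independent semantics and each piece is as short as possible yields two pieces: "平安银行" (Ping An Bank) and "活动" (event). Target word segmentation data refers to the terms with independent semantics obtained after splitting. Further, when message data sent by the server is received, the corresponding word segmentation logic is obtained and used to split the message data into target word segmentation data. Specifically, when the message data is text data and the terminal receives it from the server, the terminal obtains the corresponding word segmentation logic and matches the characters of the message data one by one against the word segmentation logic, taking the successfully matched characters as target word segmentation data. For example, when the message data received by the terminal from the server is "平安银行今天活动" ("Ping An Bank event today"), the terminal obtains the word segmentation logic and splits the message data into three pieces of target word segmentation data: "平安银行", "今天" (today), and "活动".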
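The application leaves the word segmentation logic itself open. One common dictionary-based technique that fits the description (match characters against known terms and keep the matched terms) is forward maximum matching, sketched below as an assumed stand-in rather than the applicant's method. The dictionary contents and the `max_len` limit are illustrative.

```python
from typing import Iterable, List


def forward_max_match(text: str, dictionary: Iterable[str], max_len: int = 8) -> List[str]:
    """Split text greedily into the longest dictionary terms (forward maximum matching).

    Characters not covered by any dictionary term are emitted one at a time.
    """
    terms = set(dictionary)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a term matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in terms:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    vocab = ["平安", "银行", "平安银行", "今天", "活动"]
    print(forward_max_match("平安银行今天活动", vocab))  # prints: ['平安银行', '今天', '活动']
```

Because the longest match wins, "平安银行" is preferred over "平安" + "银行" whenever the longer term is in the dictionary; a greedy choice like this can still go wrong on ambiguous strings, which is the kind of case the multi-sequence scoring embodiment addresses.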
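S206: Store the target word segmentation data in association with its corresponding source data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data from which it was obtained.
Specifically, the target word segmentation data obtained by splitting is stored together with its corresponding source data, namely the source data carried on the message data that was split to obtain it. For example, the terminal may store the target word segmentation data and the corresponding source data in a database at the same time; the source data may be the sender's account information carried on the message data, the time at which the message data was sent, basic information about the group the message data came from, and so on. For example, the terminal splits the message data "平安银行今天活动" into the target word segmentation data "平安银行", "今天", and "活动" and stores all three in the database. The source data corresponding to these target word segmentation data is the account of the sender of that chat message and its send time and, for a group chat, basic group information such as the group name or group number.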
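A minimal sketch of the association storage, assuming a relational table in which every token row repeats the source data of the message it came from. It uses Python's built-in `sqlite3` module with an in-memory database; the table name `token_source` and its columns are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple


def create_store() -> sqlite3.Connection:
    """Create an in-memory table linking each token to its message's source data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE token_source ("
        "token TEXT NOT NULL, sender TEXT NOT NULL, "
        "send_time TEXT NOT NULL, group_id TEXT)"
    )
    return conn


def store_tokens(
    conn: sqlite3.Connection,
    tokens: Iterable[str],
    sender: str,
    send_time: str,
    group_id: Optional[str] = None,
) -> None:
    """Store every token of one message together with that message's source data."""
    rows = [(token, sender, send_time, group_id) for token in tokens]
    conn.executemany(
        "INSERT INTO token_source (token, sender, send_time, group_id) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def sources_for(conn: sqlite3.Connection, token: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return the (sender, send_time, group_id) rows associated with a token."""
    cur = conn.execute(
        "SELECT sender, send_time, group_id FROM token_source WHERE token = ?",
        (token,),
    )
    return cur.fetchall()
```

Storing one row per token makes the later lookup "which sources used this high-risk term" a single indexed query, at the cost of repeating the source columns.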
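S208: Query, in a recognition thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level.
Specifically, the recognition thread is another thread, created while the terminal is running, that can execute recognition tasks. The recognition thread is asynchronous with the main thread, that is, it is a worker thread that runs asynchronously to the main thread. For example, the recognition thread can check whether the target word segmentation data contains target word segmentation data whose risk level is greater than the preset level while the main thread continues to receive message data from the server and split it into target word segmentation data according to the word segmentation logic. Target word segmentation data whose risk level is greater than the preset level is high-risk target word segmentation data; such data is suspicious target word segmentation data, and the message data from which it was split is suspicious message data. Specifically, after the main thread has stored the target word segmentation data in association with its source data, and the created recognition thread is found to have no pending recognition task, the terminal, which is preset with data whose risk level is greater than the preset level, matches the stored target word segmentation data against that preset data in the recognition thread, thereby querying whether the target word segmentation data contains target word segmentation data whose risk level is greater than the preset level. If it does, that target word segmentation data is suspicious target word segmentation data, the corresponding message data is suspicious message data, and the source data of that suspicious message data is high-risk source data. For example, suppose "平安银行" is preset as data whose risk level is greater than the preset level. The terminal obtains the target word segmentation data "平安银行", "今天", and "活动" in the main thread and stores them in association with their source data. When the created recognition thread has no pending recognition task, the terminal queries in the recognition thread whether the stored "平安银行", "今天", and "活动" contain target word segmentation data whose risk level is greater than the preset level. Because "平安银行" is found, the three pieces of target word segmentation data contain such data, namely "平安银行", and the message data from which it was obtained is suspicious message data.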
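The split between the main thread and an asynchronous recognition thread can be sketched with Python's standard `threading` and `queue` modules: the main thread keeps enqueueing (source, tokens) pairs while a worker thread checks them against a preset set of high-risk terms. The term set, the sentinel-based shutdown, and the function names are assumptions for illustration; a real client would read tokens from the association store of S206 rather than receive them directly.

```python
import queue
import threading
from typing import List, Set, Tuple

# Hypothetical preset data whose risk level is greater than the preset level.
HIGH_RISK_TERMS: Set[str] = {"平安银行"}

_STOP = object()  # sentinel telling the recognition thread to exit


def recognition_worker(tasks: "queue.Queue", hits: List[Tuple[str, str]]) -> None:
    """Recognition thread: check each batch of tokens against the high-risk terms."""
    while True:
        item = tasks.get()
        if item is _STOP:
            break
        source_id, tokens = item
        for token in tokens:
            if token in HIGH_RISK_TERMS:
                hits.append((source_id, token))


def run(messages: List[Tuple[str, List[str]]]) -> List[Tuple[str, str]]:
    """Main thread hands (source_id, tokens) pairs to the recognition thread."""
    tasks: "queue.Queue" = queue.Queue()
    hits: List[Tuple[str, str]] = []
    worker = threading.Thread(target=recognition_worker, args=(tasks, hits))
    worker.start()
    for source_id, tokens in messages:
        tasks.put((source_id, tokens))  # main thread does not wait for recognition
    tasks.put(_STOP)
    worker.join()  # only needed here so the sketch can return the results
    return hits
```

Only the worker appends to `hits`, and the main thread reads it after `join()`, so no extra lock is needed in this sketch.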
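S210: Obtain the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and add a risk tag to the obtained source data.
Specifically, a risk tag is the corresponding risk prompt information, for example a risk identifier. When a risk tag has been added to source data, that source data is source data with a high risk level, and the tagged source data, as well as the message data corresponding to it, need further monitoring. Further, the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level is obtained, and a risk tag is added to it. For example, the terminal finds that among the target word segmentation data "平安银行", "今天", and "活动", "平安银行" has a risk level greater than the preset level, and obtains the source data corresponding to "平安银行". For a group chat, it obtains basic group information such as the group name or group number and adds a risk tag to it; the tagged group, and the other message data in that group, are then monitored further.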
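Tagging can be sketched as a lookup from each high-risk term to the sources whose stored tokens contain it, followed by adding a tag to those source records. `SourceRecord`, the `"RISK"` tag value, and the token-to-group index are hypothetical structures, not part of the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class SourceRecord:
    """Group-chat source data plus the risk tags attached to it (hypothetical layout)."""
    group_number: str
    group_name: str
    risk_tags: Set[str] = field(default_factory=set)


def tag_sources(
    token_to_groups: Dict[str, Set[str]],
    high_risk_terms: Iterable[str],
    sources: Dict[str, SourceRecord],
    tag: str = "RISK",
) -> Set[str]:
    """Add `tag` to every source whose stored tokens include a high-risk term.

    Returns the group numbers that were tagged, so they can be monitored further.
    """
    tagged: Set[str] = set()
    for term in high_risk_terms:
        for group_number in token_to_groups.get(term, set()):
            sources[group_number].risk_tags.add(tag)
            tagged.add(group_number)
    return tagged
```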
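In this embodiment, different tasks are executed in different threads: the main thread receives the message data sent by the server, splits it according to the word segmentation logic into target word segmentation data, and stores that data with its source data, while the recognition thread queries the target word segmentation data for data whose risk level is greater than the preset level. Large volumes of message data can thus be processed in different threads without manual monitoring and analysis, which improves processing efficiency. Adding a risk tag to the source data corresponding to target word segmentation data whose risk level is greater than the preset level enables further monitoring and improves accuracy.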
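In one embodiment, the step of splitting the message data according to preset word segmentation logic to obtain target word segmentation data may include: obtaining multiple preset word segmentation logics and splitting the message data according to them to obtain word segmentation sequences; calculating the split accuracy corresponding to each word segmentation sequence; and taking the word segmentation sequence with the largest split accuracy as the target word segmentation data.
Specifically, a word segmentation sequence is one of the different sets of candidate fields, i.e., different standard terms, obtained by splitting the message data. For example, for the message data "平安银行今天活动", a first word segmentation sequence may be "平安" "银行" "今天" "活动" and a second word segmentation sequence may be "平安银行" "今天" "活动". The split accuracy measures how correct the candidate fields (standard terms) obtained by splitting the message data are: the higher the split accuracy, the more correct the candidate fields.
Further, multiple pre-stored word segmentation logics are obtained, and each piece of message data is split according to them, so that each piece of message data yields several different word segmentation sequences. The split accuracy of each word segmentation sequence of each message is then calculated, and the phrases in the sequence with the largest split accuracy are taken as the target word segmentation data. The split accuracy may be calculated by obtaining the preset accuracy of each phrase in a word segmentation sequence and multiplying these accuracies together; the product is the split accuracy of that sequence.
For example, the terminal receives the message data "平安银行今天活动" from the server, obtains different word segmentation logics, and splits the message data according to them into different word segmentation sequences: a first sequence "平安" "银行" "今天" "活动" and a second sequence "平安银行" "今天" "活动". It then obtains the accuracy of each phrase in the first sequence, e.g., 0.8 for "平安", 0.8 for "银行", 1 for "今天", and 1 for "活动", and the accuracy of each phrase in the second sequence, e.g., 1 for "平安银行", 1 for "今天", and 1 for "活动". The product for the first sequence is 0.8 × 0.8 × 1 × 1 = 0.64, so its split accuracy is 0.64; the product for the second sequence is 1, so its split accuracy is 1. The second sequence therefore has the largest split accuracy, and its phrases "平安银行", "今天", and "活动" are taken as the target word segmentation data. It should be noted that the accuracies of the different phrases are preset and may be set to different values as required.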
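The scoring rule in the example, where a sequence's split accuracy is the product of its phrases' preset accuracies and the highest-scoring sequence wins, can be reproduced in a few lines. The accuracy table mirrors the example values; the `default` accuracy for phrases missing from the table is an assumption the application does not address.

```python
from math import prod
from typing import Dict, List, Sequence


def split_accuracy(sequence: Sequence[str], accuracy: Dict[str, float], default: float = 0.5) -> float:
    """Split accuracy of one candidate sequence: product of its phrases' preset accuracies."""
    return prod(accuracy.get(phrase, default) for phrase in sequence)


def best_sequence(candidates: List[List[str]], accuracy: Dict[str, float]) -> List[str]:
    """Return the candidate sequence with the largest split accuracy."""
    return max(candidates, key=lambda seq: split_accuracy(seq, accuracy))


if __name__ == "__main__":
    preset = {"平安": 0.8, "银行": 0.8, "今天": 1.0, "活动": 1.0, "平安银行": 1.0}
    first = ["平安", "银行", "今天", "活动"]
    second = ["平安银行", "今天", "活动"]
    print(round(split_accuracy(first, preset), 2), split_accuracy(second, preset))  # prints: 0.64 1.0
    print(best_sequence([first, second], preset))  # prints: ['平安银行', '今天', '活动']
```

Multiplying accuracies favors sequences made of fewer, more confident phrases; for very long messages, summing logarithms of the accuracies gives the same ranking while avoiding floating-point underflow.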
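In this embodiment, the terminal splits the message data into multiple word segmentation sequences using multiple word segmentation logics and selects the sequence with the largest split accuracy as the target word segmentation data. This helps ensure accurate target word segmentation data and thus improves recognition accuracy.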
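In one embodiment, the step of splitting the message data according to word segmentation logic to obtain target word segmentation data may include: splitting the message data according to word segmentation logic to obtain initial word segmentation data; matching the initial word segmentation data against the filter data in a basic filter library; when the initial word segmentation data matches the filter data, extracting the time data from the source data corresponding to the matched initial word segmentation data; when the time data in the source data corresponding to the initial word segmentation data is the same, obtaining the message data corresponding to that time data; receiving an instruction to add word segmentation logic for the message data and adding new word segmentation logic according to the instruction; and splitting the message data with the new word segmentation logic to obtain target word segmentation data.
Specifically, initial word segmentation data refers to terms with independent semantics, obtained by splitting according to the word segmentation logic, that are still to be verified; initial word segmentation data verified as correctly split becomes the final target word segmentation data. The basic filter library is a database storing simple characters, such as single words or phrases like "平安" (Ping An), "银行" (bank), and "星期" (week). The simple characters stored in the basic filter library are simple phrases that may result from an incorrect split, so the correctness of the obtained initial word segmentation data needs to be further verified. Filter data refers to the simple characters stored in the basic filter library, such as single words or phrases. Time data is the time, stored in the source data, at which the message data was received; it may be a specific date and time, such as 13:30 on January 1, 2018.
Specifically, the message data is split according to the word segmentation logic to obtain initial word segmentation data, which is matched one by one against the filter data stored in the basic filter library. When at least two pieces of the initial word segmentation data match filter data in the basic filter library, the initial word segmentation data may be an incorrect split of the message data, and the split result needs to be confirmed further. The time data in the source data corresponding to the matched initial word segmentation data is therefore extracted. When the time data of the matched initial word segmentation data is the same, the matched pieces come from the same message, which means the initial word segmentation data obtained by splitting that message with the currently pre-stored word segmentation logic is incorrect and cannot be used as target word segmentation data. The message data corresponding to the time data is then obtained, and corresponding prompt information is displayed on the display interface. The user makes a choice based on the prompt; when the user chooses to add new word segmentation logic, a corresponding add instruction is generated. When the add instruction is received, word segmentation logic for the message data is added accordingly, and the message data corresponding to the initial word segmentation data is re-split with the new word segmentation logic to obtain target word segmentation data.
For example, the terminal splits the message data "平安银行今天活动" according to the word segmentation logic into the initial word segmentation data "平安", "银行", "今天", and "活动" and matches them one by one against the filter data stored in the basic filter library. When "平安" and "银行" both match filter data in the basic filter library, the message may have been split incorrectly, i.e., it needs to be split into higher-level phrases. The terminal then extracts the time data from the source data corresponding to each matched piece of initial word segmentation data (this source data is the source data of the message). If the extracted time data of the matched pieces is, for example, 14:00 on January 1, 2018 in each case, the matched pieces come from the same message, so the initial word segmentation data "平安", "银行", "今天", and "活动" obtained with the currently pre-stored word segmentation logic is incorrect; that is, the message needs to be re-split into the corresponding higher-level phrases to serve as target word segmentation data. The terminal then obtains the message data "平安银行今天活动" corresponding to 14:00 on January 1, 2018 and displays a prompt on its display interface, for example "Add new word segmentation logic?". When the user selects "Yes", a corresponding add instruction is generated; on receiving it, the terminal adds the word segmentation logic for this message and re-splits "平安银行今天活动" with the newly added logic to obtain target word segmentation data, which may be "平安银行", "今天", and "活动".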
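The detection half of this embodiment (everything before the user is prompted) can be sketched as counting basic-filter-library hits per time stamp, treating tokens with identical time data as coming from one message, as the description does. The `min_matches=2` default follows the "at least two" condition; the tuple layout and function name are hypothetical, and the prompting and rule-adding steps are not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def suspect_messages(
    tokens_with_time: Iterable[Tuple[str, str]],
    filter_library: Set[str],
    min_matches: int = 2,
) -> List[str]:
    """Return the time data of messages whose split produced >= min_matches filter-library hits.

    Each item is (initial_token, time_data). Tokens that share the same time data are
    treated as coming from the same message, as in the description.
    """
    hits_per_time: Dict[str, int] = defaultdict(int)
    for token, time_data in tokens_with_time:
        if token in filter_library:
            hits_per_time[time_data] += 1
    return [t for t, n in hits_per_time.items() if n >= min_matches]
```

Using time data as a message key only works if no two messages share a time stamp; a message ID in the source data, where available, would be a safer key.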
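In this embodiment, the initial word segmentation data is matched against the data stored in the basic filter library, and the time data in the source data corresponding to the matched initial word segmentation data is obtained to determine whether a split is inaccurate. When it is, new word segmentation logic is added and the message data is re-split. This further verifies the splitting of message data, avoids inaccuracies when analyzing message data, and improves applicability.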
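In one embodiment, the step of querying, in the recognition thread, the target word segmentation data for target word segmentation data whose risk level is greater than the preset level may include: matching the target word segmentation data against pre-stored data whose risk level is greater than the preset level; when at least one piece of target word segmentation data matches, obtaining the storage category of the matched data whose risk level is greater than the preset level; obtaining the time data in the source data corresponding to the matched target word segmentation data and, according to that time data, extracting the not-yet-matched target word segmentation data within a preset message data acquisition period; matching the not-yet-matched target word segmentation data against the data whose risk level is greater than the preset level pre-stored under that storage category; and, when they match, determining that the not-yet-matched target word segmentation data is target word segmentation data whose risk level is greater than the preset level.
Specifically, a storage category is a preset kind of storage, and each storage category stores the corresponding data. In the recognition thread, once the obtained message data has been split into target word segmentation data with the word segmentation logic, the target word segmentation data is matched one by one against the pre-stored data whose risk level is greater than the preset level. When at least one piece of word segmentation data matches, the storage category of the matched data is obtained, along with the time data in the source data corresponding to the matched target word segmentation data and the preset message data acquisition period. Based on that time data and the acquisition period, the not-yet-matched target word segmentation data whose time data falls between the time of the matched target word segmentation data and the end of the acquisition period is extracted and matched against the data whose risk level is greater than the preset level stored under that storage category. If they match, the not-yet-matched target word segmentation data is target word segmentation data whose risk level is greater than the preset level; if they do not, it is then matched against the other storage categories.
For example, the terminal matches the target word segmentation data "平安银行", "今天", and "有活动" ("there is an event") against the data whose risk level is greater than the preset level. If "平安银行" matches, the storage category of the matched data is obtained, e.g., the bank category. The time data in the source data corresponding to "平安银行", e.g., 14:00 on January 1, 2018, is obtained, and the not-yet-matched target word segmentation data within a preset message data acquisition period of 5 minutes from that time, i.e., from 14:00 to 14:05 on January 1, 2018, is extracted. This data is then matched against the data whose risk level is greater than the preset level pre-stored under the bank category. If it matches, the not-yet-matched target word segmentation data is target word segmentation data whose risk level is greater than the preset level; if not, it is matched against the data pre-stored under the other, non-bank storage categories. It should be noted that the preset message data acquisition period may also be 3, 7, 10, or 20 minutes, and so on.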
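A sketch of the category-first lookup, assuming the high-risk data is held in memory as a mapping from storage category to term set and that each pending token carries its send time. The categories, the terms ("银行理财" for bank wealth-management products, "中奖" for winning a prize), and the 5-minute default are illustrative, not taken from the application.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical high-risk data, grouped by storage category.
RISK_CATEGORIES: Dict[str, Set[str]] = {
    "bank": {"平安银行", "银行理财"},
    "lottery": {"中奖"},
}


def category_of(token: str) -> Optional[str]:
    """Return the storage category whose high-risk data contains the token, if any."""
    for category, terms in RISK_CATEGORIES.items():
        if token in terms:
            return category
    return None


def match_in_window(
    hit_time: datetime,
    hit_category: str,
    pending: List[Tuple[str, datetime]],
    window: timedelta = timedelta(minutes=5),
) -> List[str]:
    """Check not-yet-matched tokens whose time lies in [hit_time, hit_time + window].

    The matched token's category is tried first; other categories only as a fallback.
    """
    end = hit_time + window
    risky: List[str] = []
    for token, sent_at in pending:
        if not hit_time <= sent_at <= end:
            continue  # outside the message data acquisition period
        if token in RISK_CATEGORIES[hit_category]:
            risky.append(token)  # hit in the prioritized category
        elif any(
            token in terms
            for name, terms in RISK_CATEGORIES.items()
            if name != hit_category
        ):
            risky.append(token)  # fallback: hit in another category
    return risky
```

The prioritized and fallback passes find the same hits; the ordering only pays off when each category is a large or remote store, so that most in-topic tokens are resolved by the first, cheaper lookup.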
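In this embodiment, when the terminal matches the target word segmentation data against the pre-stored data whose risk level is greater than the preset level and at least one piece of target word segmentation data matches, the same topic is likely to be discussed throughout the preset message data acquisition period, so other message data in that period may contain data from the same storage category as the matched data. The not-yet-matched word segmentation data in that period is therefore first matched against the data pre-stored under the matched storage category, and when it matches, it is determined to be target word segmentation data whose risk level is greater than the preset level. This saves query time and thus improves processing efficiency.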
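In one embodiment, after the step of adding a risk tag to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level, the method includes: obtaining associated phrases of the target word segmentation phrase whose risk level is greater than the preset level; when the risk level of an associated phrase is greater than the preset level, querying whether the target word segmentation data corresponding to source data without a risk tag contains the associated phrase; and, when it does, adding a risk tag to the source data without a risk tag.
Specifically, an associated phrase is a phrase with a meaning similar or identical to that of the target word segmentation phrase whose risk level is greater than the preset level. For example, when "平安银行" is such a phrase, its associated phrase may be "平安金融机构" (Ping An financial institution). Specifically, after a risk tag has been added to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level, the associated phrases of that target word segmentation phrase are obtained and matched against the preset data whose risk level is greater than the preset level. If an associated phrase matches, it is also data whose risk level is greater than the preset level, and the target word segmentation data corresponding to source data without a risk tag is matched against it. If they match, i.e., the target word segmentation data corresponding to the untagged source data contains the associated phrase, the unsplit message data corresponding to that target word segmentation data is suspicious message data, the message data of the untagged source data also needs further monitoring, and a risk tag is added to that source data.
For example, after a risk tag has been added to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level, the terminal obtains the associated phrase of that target word segmentation phrase, e.g., "平安金融机构" for "平安银行", and matches "平安金融机构" against the preset data whose risk level is greater than the preset level. If it matches, the associated phrase "平安金融机构" is also data whose risk level is greater than the preset level, and the target word segmentation data corresponding to source data without a risk tag is matched against "平安金融机构". If they match, i.e., the target word segmentation data corresponding to the untagged source data contains the associated phrase, the unsplit message data corresponding to that target word segmentation data is suspicious message data, so the message data of the untagged source data also needs further monitoring, and a risk tag is added to that source data.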
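The propagation step can be sketched with plain sets: keep only the related phrases that are themselves high-risk, then tag every untagged source whose stored tokens contain one of them. The related-phrase map and the source-to-tokens map are hypothetical inputs; how related phrases are obtained (a synonym dictionary, embeddings, and so on) is not specified by the application.

```python
from typing import Dict, Set


def propagate_tags(
    risky_term: str,
    related: Dict[str, Set[str]],
    high_risk_data: Set[str],
    untagged_tokens: Dict[str, Set[str]],
    tagged: Set[str],
) -> Set[str]:
    """Tag untagged sources whose tokens contain a high-risk related phrase of risky_term.

    related maps a term to phrases with the same or a similar meaning; untagged_tokens
    maps a source id to its stored target tokens. `tagged` is updated in place and the
    newly tagged source ids are returned.
    """
    # Keep only related phrases whose own risk level is above the preset level.
    risky_related = {p for p in related.get(risky_term, set()) if p in high_risk_data}
    newly_tagged = {
        source
        for source, tokens in untagged_tokens.items()
        if source not in tagged and tokens & risky_related
    }
    tagged |= newly_tagged
    return newly_tagged
```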
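In this embodiment, the target word segmentation data corresponding to source data without a risk tag is matched against the associated phrases to determine whether it contains target word segmentation data whose risk level is greater than the preset level, and hence whether the message data of the corresponding source data also needs further monitoring. This prevents high-risk target word segmentation data from being missed when the message data uses substitute phrases, and improves the accuracy of message data queries.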
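In one embodiment, after the step of adding a risk tag to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level, the method may further include: extracting the identity identifier corresponding to the source data with the risk tag; counting, within a preset time period, the number of occurrences of the same target word segmentation data whose risk level is greater than the preset level for that identity identifier; and, when the number exceeds a preset value, monitoring the message data corresponding to the identity identifier.
Specifically, an identity identifier is the identity information of the sending source of the message data. It may be a text identifier, a picture identifier, a numeric identifier, and so on, for example the account, user name, or avatar of the user who sent the message data. Specifically, a preset value is set for the number of occurrences of target word segmentation data. The identity identifier corresponding to the risk-tagged source data is extracted, and the number of occurrences of the same target word segmentation data whose risk level is greater than the preset level for that identity identifier within the preset time period is counted. When the number exceeds the preset value, the user corresponding to the identity identifier may be exchanging high-risk information, for example to obtain large interaction rewards at little cost, so the message data corresponding to the identity identifier is monitored further.
For example, the preset value for occurrences of target word segmentation data is set to 10. The identity identifier corresponding to the risk-tagged source data, such as the account of the user who sent the message data, is extracted, and the number of occurrences of target word segmentation data whose risk level is greater than the preset level, such as "平安银行", sent by that user within the preset time period, e.g., five minutes, is counted. When "平安银行" occurs more than 10 times, the user may be exchanging information about Ping An Bank to obtain rewards, so the other message data sent by that user needs further monitoring; that other message data can then be obtained and checked for risk levels greater than the preset level.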
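Counting repeated high-risk terms per account can be sketched with `collections.Counter`. The (account, token, send_time) layout, the half-open window, and the strict "exceeds" comparison (`> threshold`) are assumptions; the defaults mirror the example's five minutes and 10 occurrences.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple


def accounts_to_monitor(
    hits: Iterable[Tuple[str, str, datetime]],
    window_start: datetime,
    window: timedelta = timedelta(minutes=5),
    threshold: int = 10,
) -> Set[str]:
    """Return accounts that sent the same high-risk token more than `threshold` times.

    hits holds (account, high_risk_token, send_time) tuples taken from risk-tagged
    sources; only hits in [window_start, window_start + window) are counted.
    """
    window_end = window_start + window
    counts = Counter(
        (account, token)
        for account, token, sent_at in hits
        if window_start <= sent_at < window_end
    )
    return {account for (account, _token), n in counts.items() if n > threshold}
```

Counting (account, token) pairs rather than accounts alone matches the requirement that it is the *same* high-risk term that repeats.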
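In this embodiment, the corresponding identity identifier is obtained, and the number of occurrences of the same target word segmentation data whose risk level is greater than the preset level for that identity identifier within the preset time period is counted. When the number exceeds the preset value, the message data corresponding to the identity identifier is monitored, so that other suspicious message data can be found. Querying associated message data via the identity identifier corresponding to high-risk target word segmentation data improves processing efficiency and applicability.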
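In one embodiment, after the step of monitoring the message data corresponding to the identity identifier when the number exceeds the preset value, the method may further include: obtaining the network address corresponding to the identity identifier; querying the number of identity identifiers registered from the network address within a preset registration period; and, when the number of registered identity identifiers exceeds a preset value, marking the network address as a network address whose risk level is greater than the preset level.
Specifically, a network address uniquely identifies a computer device on a network and can serve as its communication identifier when it communicates with other computers; for example, it may be an IP (Internet Protocol) address. Specifically, a preset value is set for the number of registered identity identifiers. When the message data corresponding to an identity identifier is monitored, the network address corresponding to that identity identifier is obtained from a network address repository, and the number of identity identifiers registered from that network address within the preset registration period is queried. When that number exceeds the preset value, the network address is marked as a network address whose risk level is greater than the preset level; the terminal corresponding to such a network address may be a high-risk, suspicious terminal, so the network address is monitored to further avoid risk. It should be noted that the network address repository stores identity identifiers to be matched and the network addresses associated with them. The identity identifier in the source data is matched against the stored identity identifiers, and when they match, the network address corresponding to the matched identity identifier is taken as the network address corresponding to the identity identifier in the source data.
For example, the preset value for the number of registered identities is set to 100. The network address corresponding to the identity identifier is obtained from the network address repository, and the number of other identity identifiers, e.g., other user accounts, registered from that network address within a preset period such as 5 minutes is queried. When it exceeds 100, the network address is a suspicious network address and the terminal is a suspicious terminal that may, for example, be maliciously claiming rewards, so the network address is monitored. It should be noted that the preset period may also be 3 minutes, 10 minutes, and so on, and the preset number of account registrations may also be 200, 500, and so on.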
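A sketch of the registration check, assuming the network address repository can be read as (account, IP address, registration time) records. It flags an address when more than `threshold` accounts were registered from it inside any sliding window of the given length; the application does not say whether a sliding or a fixed window is meant, so this is one interpretation. The defaults mirror the example's 5 minutes and 100 registrations.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple


def risky_addresses(
    registrations: Iterable[Tuple[str, str, datetime]],
    window: timedelta = timedelta(minutes=5),
    threshold: int = 100,
) -> Set[str]:
    """Flag network addresses that registered more than `threshold` accounts within any window.

    registrations holds (account, ip_address, registered_at) tuples (an assumed layout).
    """
    by_ip: Dict[str, List[datetime]] = defaultdict(list)
    for _account, ip, registered_at in registrations:
        by_ip[ip].append(registered_at)
    flagged: Set[str] = set()
    for ip, times in by_ip.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Advance the window start until the window spans at most `window`.
            while t - times[start] > window:
                start += 1
            if end - start + 1 > threshold:
                flagged.add(ip)
                break
    return flagged
```

Sorting each address's timestamps once and sliding two indices keeps the check linear in the number of registrations per address after sorting.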
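In this embodiment, the corresponding network address is obtained according to the identity identifier to determine whether it is a suspicious network address; if it is, it is marked as a network address whose risk level is greater than the preset level. Network addresses whose risk level is greater than the preset level can thus also be found through the message data, which further avoids risk, improves security, and improves applicability.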
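It should be understood that although the steps in the flowchart of FIG. 2 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict order restriction on their execution, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and not necessarily in sequence; they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.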
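In one embodiment, as shown in FIG. 3, a message data processing apparatus 300 is provided, including a receiving module 310, a splitting module 320, an association storage module 330, a query module 340, and an obtaining module 350, where:
The receiving module 310 is configured to receive, in a main thread, message data sent by a server, the message data carrying source data.
The splitting module 320 is configured to split the message data according to word segmentation logic to obtain target word segmentation data.
The association storage module 330 is configured to store the target word segmentation data in association with its corresponding source data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data from which it was obtained.
The query module 340 is configured to query, in a recognition thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level.
The obtaining module 350 is configured to obtain the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and to add a risk tag to the obtained source data.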
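In one embodiment, the splitting module 320 may include:
A first splitting unit, configured to obtain multiple preset word segmentation logics and split the message data according to them to obtain word segmentation sequences.
A calculation unit, configured to calculate the split accuracy corresponding to each word segmentation sequence.
An obtaining unit, configured to take the word segmentation sequence with the largest split accuracy as the target word segmentation data.
In one embodiment, the splitting module 320 may further include:
A second splitting unit, configured to split the message data according to word segmentation logic to obtain initial word segmentation data.
A first matching unit, configured to match the initial word segmentation data against the filter data in the basic filter library.
A first extraction unit, configured to extract, when the initial word segmentation data matches the filter data, the time data from the source data corresponding to the matched initial word segmentation data.
A message data obtaining unit, configured to obtain, when the time data in the source data corresponding to the initial word segmentation data is the same, the message data corresponding to the time data.
An adding unit, configured to receive an instruction to add word segmentation logic for the message data and to add new word segmentation logic according to the instruction.
A third splitting unit, configured to split the message data with the new word segmentation logic to obtain target word segmentation data.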
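In one embodiment, the query module 340 may include:
A second matching unit, configured to match the target word segmentation data against the pre-stored data whose risk level is greater than the preset level.
A storage category obtaining unit, configured to obtain, when at least one piece of target word segmentation data matches, the storage category of the matched data whose risk level is greater than the preset level.
A second extraction unit, configured to obtain the time data in the source data corresponding to the matched target word segmentation data and to extract, according to the time data, the not-yet-matched target word segmentation data within the preset message data acquisition period.
A third matching unit, configured to match the not-yet-matched target word segmentation data against the data whose risk level is greater than the preset level pre-stored under the storage category.
A target word segmentation data obtaining unit, configured to determine, when the not-yet-matched target word segmentation data matches the data whose risk level is greater than the preset level pre-stored under the storage category, that the not-yet-matched target word segmentation data is target word segmentation data whose risk level is greater than the preset level.
In one embodiment, the message data processing apparatus 300 may include:
An associated phrase obtaining module, configured to obtain associated phrases of the target word segmentation data whose risk level is greater than the preset level.
An associated phrase query module, configured to query, when the risk level of an associated phrase is greater than the preset level, whether the target word segmentation data corresponding to source data without a risk tag contains the associated phrase.
A risk tag adding module, configured to add, when the target word segmentation data corresponding to source data without a risk tag contains the associated phrase, a risk tag to that source data.
In one embodiment, the message data processing apparatus 300 may further include:
An identity identifier extraction module, configured to extract the identity identifier corresponding to the source data with the risk tag.
A quantity counting module, configured to count, within a preset time period, the number of occurrences of the same target word segmentation data whose risk level is greater than the preset level for the identity identifier.
A monitoring module, configured to monitor, when the number exceeds a preset value, the message data corresponding to the identity identifier.
In one embodiment, the message data processing apparatus 300 may further include:
A network address obtaining module, configured to obtain the network address corresponding to the identity identifier.
An identity identifier quantity query module, configured to query the number of identity identifiers registered from the network address within a preset registration period.
A marking module, configured to mark, when the number of registered identity identifiers exceeds a preset value, the network address as a network address whose risk level is greater than the preset level.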
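For specific limitations of the message data processing apparatus, refer to the limitations of the message data processing method above; details are not repeated here. Each module in the above apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them to perform the corresponding operations.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input apparatus connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer-readable instructions, and the internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The network interface communicates with external terminals over a network connection. The computer-readable instructions, when executed by the processor, implement a message data processing method. The display screen may be a liquid crystal display or an electronic ink display, and the input apparatus may be a touch layer covering the display screen, a key, trackball, or touchpad on the housing of the computer device, or an external keyboard, touchpad, or mouse.
A person skilled in the art will understand that the structure shown in FIG. 4 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine some components, or have a different arrangement of components.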
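A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the processors, cause the one or more processors to perform the following steps: receiving, in a main thread, message data sent by a server, the message data carrying source data; splitting the message data according to word segmentation logic to obtain target word segmentation data; storing the target word segmentation data in association with its corresponding source data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data from which it was obtained; querying, in a recognition thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level and adding a risk tag to the obtained source data.
In one embodiment, when the processor executes the computer-readable instructions, the step of splitting the message data according to preset word segmentation logic to obtain target word segmentation data may include: obtaining multiple preset word segmentation logics and splitting the message data according to them to obtain word segmentation sequences; calculating the split accuracy corresponding to each word segmentation sequence; and taking the word segmentation sequence with the largest split accuracy as the target word segmentation data.
In one embodiment, when the processor executes the computer-readable instructions, the step of splitting the message data according to word segmentation logic to obtain target word segmentation data may further include: splitting the message data according to word segmentation logic to obtain initial word segmentation data; matching the initial word segmentation data against the filter data in the basic filter library; when the initial word segmentation data matches the filter data, extracting the time data from the source data corresponding to the matched initial word segmentation data; when the time data in the source data corresponding to the initial word segmentation data is the same, obtaining the message data corresponding to that time data; receiving an instruction to add word segmentation logic for the message data and adding new word segmentation logic according to the instruction; and splitting the message data with the new word segmentation logic to obtain target word segmentation data.
In one embodiment, when the processor executes the computer-readable instructions, the step of querying, in the recognition thread, the target word segmentation data for target word segmentation data whose risk level is greater than the preset level may include: matching the target word segmentation data against pre-stored data whose risk level is greater than the preset level; when at least one piece of target word segmentation data matches, obtaining the storage category of the matched data whose risk level is greater than the preset level; obtaining the time data in the source data corresponding to the matched target word segmentation data and extracting, according to that time data, the not-yet-matched target word segmentation data within the preset message data acquisition period; matching the not-yet-matched target word segmentation data against the data whose risk level is greater than the preset level pre-stored under the storage category; and, when they match, determining that the not-yet-matched target word segmentation data is target word segmentation data whose risk level is greater than the preset level.
In one embodiment, when the processor executes the computer-readable instructions, the steps following the step of adding a risk tag to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level may include: obtaining associated phrases of the target word segmentation data whose risk level is greater than the preset level; when the risk level of an associated phrase is greater than the preset level, querying whether the target word segmentation data corresponding to source data without a risk tag contains the associated phrase; and, when it does, adding a risk tag to that source data.
In one embodiment, when the processor executes the computer-readable instructions, the steps following the step of adding a risk tag to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level further include: extracting the identity identifier corresponding to the source data with the risk tag; counting, within a preset time period, the number of occurrences of the same target word segmentation data whose risk level is greater than the preset level for the identity identifier; and, when the number exceeds a preset value, monitoring the message data corresponding to the identity identifier.
In one embodiment, when the processor executes the computer-readable instructions, the steps following the step of monitoring the message data corresponding to the identity identifier when the number exceeds the preset value further include: obtaining the network address corresponding to the identity identifier; querying the number of identity identifiers registered from the network address within a preset registration period; and, when the number of registered identity identifiers exceeds a preset value, marking the network address as a network address whose risk level is greater than the preset level.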
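One or more non-volatile computer-readable storage media storing computer-readable instructions, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps: receiving, in a main thread, message data sent by a server, the message data carrying source data; splitting the message data according to word segmentation logic to obtain target word segmentation data; storing the target word segmentation data in association with its corresponding source data, where the source data corresponding to the target word segmentation data is the same as the source data carried on the message data from which it was obtained; querying, in a recognition thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level and adding a risk tag to the obtained source data.
In one embodiment, when the computer-readable instructions are executed by the processor, the step of splitting the message data according to preset word segmentation logic to obtain target word segmentation data may include: obtaining multiple preset word segmentation logics and splitting the message data according to them to obtain word segmentation sequences; calculating the split accuracy corresponding to each word segmentation sequence; and taking the word segmentation sequence with the largest split accuracy as the target word segmentation data.
In one embodiment, when the computer-readable instructions are executed by the processor, the step of splitting the message data according to word segmentation logic to obtain target word segmentation data may further include: splitting the message data according to word segmentation logic to obtain initial word segmentation data; matching the initial word segmentation data against the filter data in the basic filter library; when the initial word segmentation data matches the filter data, extracting the time data from the source data corresponding to the matched initial word segmentation data; when the time data in the source data corresponding to the initial word segmentation data is the same, obtaining the message data corresponding to that time data; receiving an instruction to add word segmentation logic for the message data and adding new word segmentation logic according to the instruction; and splitting the message data with the new word segmentation logic to obtain target word segmentation data.
In one embodiment, when the computer-readable instructions are executed by the processor, the step of querying, in the recognition thread, the target word segmentation data for target word segmentation data whose risk level is greater than the preset level may include: matching the target word segmentation data against pre-stored data whose risk level is greater than the preset level; when at least one piece of target word segmentation data matches, obtaining the storage category of the matched data whose risk level is greater than the preset level; obtaining the time data in the source data corresponding to the matched target word segmentation data and extracting, according to that time data, the not-yet-matched target word segmentation data within the preset message data acquisition period; matching the not-yet-matched target word segmentation data against the data whose risk level is greater than the preset level pre-stored under the storage category; and, when they match, determining that the not-yet-matched target word segmentation data is target word segmentation data whose risk level is greater than the preset level.
In one embodiment, when the computer-readable instructions are executed by the processor, the steps following the step of adding a risk tag to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level may include: obtaining associated phrases of the target word segmentation data whose risk level is greater than the preset level; when the risk level of an associated phrase is greater than the preset level, querying whether the target word segmentation data corresponding to source data without a risk tag contains the associated phrase; and, when it does, adding a risk tag to that source data.
In one embodiment, when the computer-readable instructions are executed by the processor, the steps following the step of adding a risk tag to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level further include: extracting the identity identifier corresponding to the source data with the risk tag; counting, within a preset time period, the number of occurrences of the same target word segmentation data whose risk level is greater than the preset level for the identity identifier; and, when the number exceeds a preset value, monitoring the message data corresponding to the identity identifier.
In one embodiment, when the computer-readable instructions are executed by the processor, the steps following the step of monitoring the message data corresponding to the identity identifier when the number exceeds the preset value further include: obtaining the network address corresponding to the identity identifier; querying the number of identity identifiers registered from the network address within a preset registration period; and, when the number of registered identity identifiers exceeds a preset value, marking the network address as a network address whose risk level is greater than the preset level.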
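A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to a memory, storage, database, or other medium used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and although they are described specifically and in detail, they should not be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, and these all fall within its protection scope. Therefore, the protection scope of this patent application shall be subject to the appended claims.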

Claims (20)

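  1. A message data processing method, comprising:
    receiving, in a main thread, message data sent by a server, the message data carrying source data;
    splitting the message data according to word segmentation logic to obtain target word segmentation data;
    storing the target word segmentation data in association with source data corresponding to the target word segmentation data, wherein the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data;
    querying, in a recognition thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and
    obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and adding a risk tag to the obtained source data.
  2. The method according to claim 1, wherein splitting the message data according to preset word segmentation logic to obtain target word segmentation data comprises:
    obtaining multiple preset word segmentation logics, and splitting the message data according to the multiple preset word segmentation logics to obtain word segmentation sequences;
    calculating a split accuracy corresponding to each of the word segmentation sequences; and
    taking the word segmentation sequence corresponding to the largest split accuracy as the target word segmentation data.
  3. The method according to claim 1, wherein splitting the message data according to word segmentation logic to obtain target word segmentation data further comprises:
    splitting the message data according to word segmentation logic to obtain initial word segmentation data;
    matching the initial word segmentation data against filter data in a basic filter library;
    when the initial word segmentation data matches the filter data, extracting time data from the source data corresponding to the matched initial word segmentation data;
    when the time data in the source data corresponding to the initial word segmentation data is the same, obtaining the message data corresponding to the time data; receiving an instruction to add word segmentation logic for the message data, and adding new word segmentation logic according to the instruction; and
    splitting the message data with the new word segmentation logic to obtain target word segmentation data.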
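  4. The method according to claim 1, wherein querying, in the recognition thread, the target word segmentation data for target word segmentation data whose risk level is greater than the preset level comprises:
    matching the target word segmentation data against pre-stored data whose risk level is greater than the preset level;
    when at least one piece of the target word segmentation data matches, obtaining a storage category of the matched data whose risk level is greater than the preset level;
    obtaining time data in the source data corresponding to the matched target word segmentation data, and extracting, according to the time data, not-yet-matched target word segmentation data within a preset message data acquisition period;
    matching the not-yet-matched target word segmentation data against data whose risk level is greater than the preset level pre-stored under the storage category; and
    when the not-yet-matched target word segmentation data matches the data whose risk level is greater than the preset level pre-stored under the storage category, determining that the not-yet-matched target word segmentation data is target word segmentation data whose risk level is greater than the preset level.
  5. The method according to claim 1, after adding the risk tag to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level, comprising:
    obtaining an associated phrase of the target word segmentation data whose risk level is greater than the preset level;
    when a risk level of the associated phrase is greater than the preset level, querying whether target word segmentation data corresponding to source data without a risk tag contains the associated phrase; and
    when the target word segmentation data corresponding to the source data without a risk tag contains the associated phrase, adding the risk tag to the source data without a risk tag.
  6. The method according to claim 1, after adding the risk tag to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level, further comprising:
    extracting an identity identifier corresponding to the source data to which the risk tag is added;
    counting, within a preset time period, the number of occurrences of the same target word segmentation data whose risk level is greater than the preset level for the identity identifier; and
    when the number exceeds a preset value, monitoring message data corresponding to the identity identifier.
  7. The method according to claim 6, after monitoring the message data corresponding to the identity identifier when the number exceeds the preset value, further comprising:
    obtaining a network address corresponding to the identity identifier;
    querying the number of identity identifiers registered from the network address within a preset registration period; and
    when the number of registered identity identifiers exceeds a preset value, marking the network address as a network address whose risk level is greater than the preset level.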
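  8. A message data processing apparatus, comprising:
    a receiving module, configured to receive, in a main thread, message data sent by a server, the message data carrying source data;
    a splitting module, configured to split the message data according to word segmentation logic to obtain target word segmentation data;
    an association storage module, configured to store the target word segmentation data in association with source data corresponding to the target word segmentation data, wherein the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data;
    a query module, configured to query, in a recognition thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and
    an obtaining module, configured to obtain the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and to add a risk tag to the obtained source data.
  9. The apparatus according to claim 8, wherein the splitting module comprises:
    a first splitting unit, configured to obtain multiple preset word segmentation logics and split the message data according to the multiple preset word segmentation logics to obtain word segmentation sequences;
    a calculation unit, configured to calculate a split accuracy corresponding to each of the word segmentation sequences; and
    an obtaining unit, configured to take the word segmentation sequence corresponding to the largest split accuracy as the target word segmentation data.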
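  10. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    receiving, in a main thread, message data sent by a server, the message data carrying source data;
    splitting the message data according to word segmentation logic to obtain target word segmentation data;
    storing the target word segmentation data in association with source data corresponding to the target word segmentation data, wherein the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data;
    querying, in a recognition thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and
    obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and adding a risk tag to the obtained source data.
  11. The computer device according to claim 10, wherein, when the processor executes the computer-readable instructions, splitting the message data according to preset word segmentation logic to obtain target word segmentation data comprises:
    obtaining multiple preset word segmentation logics, and splitting the message data according to the multiple preset word segmentation logics to obtain word segmentation sequences;
    calculating a split accuracy corresponding to each of the word segmentation sequences; and
    taking the word segmentation sequence corresponding to the largest split accuracy as the target word segmentation data.
  12. The computer device according to claim 10, wherein, when the processor executes the computer-readable instructions, splitting the message data according to word segmentation logic to obtain target word segmentation data further comprises:
    splitting the message data according to word segmentation logic to obtain initial word segmentation data;
    matching the initial word segmentation data against filter data in a basic filter library;
    when the initial word segmentation data matches the filter data, extracting time data from the source data corresponding to the matched initial word segmentation data;
    when the time data in the source data corresponding to the initial word segmentation data is the same, obtaining the message data corresponding to the time data; receiving an instruction to add word segmentation logic for the message data, and adding new word segmentation logic according to the instruction; and
    splitting the message data with the new word segmentation logic to obtain target word segmentation data.
  13. The computer device according to claim 10, wherein, when the processor executes the computer-readable instructions, querying, in the recognition thread, the target word segmentation data for target word segmentation data whose risk level is greater than the preset level comprises:
    matching the target word segmentation data against pre-stored data whose risk level is greater than the preset level;
    when at least one piece of the target word segmentation data matches, obtaining a storage category of the matched data whose risk level is greater than the preset level;
    obtaining time data in the source data corresponding to the matched target word segmentation data, and extracting, according to the time data, not-yet-matched target word segmentation data within a preset message data acquisition period;
    matching the not-yet-matched target word segmentation data against data whose risk level is greater than the preset level pre-stored under the storage category; and
    when the not-yet-matched target word segmentation data matches the data whose risk level is greater than the preset level pre-stored under the storage category, determining that the not-yet-matched target word segmentation data is target word segmentation data whose risk level is greater than the preset level.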
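  14. The computer device according to claim 10, wherein, when the processor executes the computer-readable instructions, the steps after adding the risk tag to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level comprise:
    obtaining an associated phrase of the target word segmentation data whose risk level is greater than the preset level;
    when a risk level of the associated phrase is greater than the preset level, querying whether target word segmentation data corresponding to source data without a risk tag contains the associated phrase; and
    when the target word segmentation data corresponding to the source data without a risk tag contains the associated phrase, adding the risk tag to the source data without a risk tag.
  15. The computer device according to claim 10, wherein, when the processor executes the computer-readable instructions, the steps after adding the risk tag to the source data corresponding to the target word segmentation data whose risk level is greater than the preset level further comprise:
    extracting an identity identifier corresponding to the source data to which the risk tag is added;
    counting, within a preset time period, the number of occurrences of the same target word segmentation data whose risk level is greater than the preset level for the identity identifier; and
    when the number exceeds a preset value, monitoring message data corresponding to the identity identifier.
  16. The computer device according to claim 15, wherein, when the processor executes the computer-readable instructions, the steps after monitoring the message data corresponding to the identity identifier when the number exceeds the preset value further comprise:
    obtaining a network address corresponding to the identity identifier;
    querying the number of identity identifiers registered from the network address within a preset registration period; and
    when the number of registered identity identifiers exceeds a preset value, marking the network address as a network address whose risk level is greater than the preset level.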
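  17. One or more non-volatile computer-readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    receiving, in a main thread, message data sent by a server, the message data carrying source data;
    splitting the message data according to word segmentation logic to obtain target word segmentation data;
    storing the target word segmentation data in association with source data corresponding to the target word segmentation data, wherein the source data corresponding to the target word segmentation data is the same as the source data carried on the message data corresponding to the target word segmentation data;
    querying, in a recognition thread, whether the target word segmentation data contains target word segmentation data whose risk level is greater than a preset level; and
    obtaining the source data corresponding to the queried target word segmentation data whose risk level is greater than the preset level, and adding a risk tag to the obtained source data.
  18. The storage media according to claim 17, wherein, when the computer-readable instructions are executed by the processor, splitting the message data according to preset word segmentation logic to obtain target word segmentation data comprises:
    obtaining multiple preset word segmentation logics, and splitting the message data according to the multiple preset word segmentation logics to obtain word segmentation sequences;
    calculating a split accuracy corresponding to each of the word segmentation sequences; and
    taking the word segmentation sequence corresponding to the largest split accuracy as the target word segmentation data.
  19. The storage media according to claim 17, wherein, when the computer-readable instructions are executed by the processor, splitting the message data according to word segmentation logic to obtain target word segmentation data further comprises:
    splitting the message data according to word segmentation logic to obtain initial word segmentation data;
    matching the initial word segmentation data against filter data in a basic filter library;
    when the initial word segmentation data matches the filter data, extracting time data from the source data corresponding to the matched initial word segmentation data;
    when the time data in the source data corresponding to the initial word segmentation data is the same, obtaining the message data corresponding to the time data; receiving an instruction to add word segmentation logic for the message data, and adding new word segmentation logic according to the instruction; and
    splitting the message data with the new word segmentation logic to obtain target word segmentation data.
  20. The storage media according to claim 17, wherein, when the computer-readable instructions are executed by the processor, querying, in the recognition thread, the target word segmentation data for target word segmentation data whose risk level is greater than the preset level comprises:
    matching the target word segmentation data against pre-stored data whose risk level is greater than the preset level;
    when at least one piece of the target word segmentation data matches, obtaining a storage category of the matched data whose risk level is greater than the preset level;
    obtaining time data in the source data corresponding to the matched target word segmentation data, and extracting, according to the time data, not-yet-matched target word segmentation data within a preset message data acquisition period;
    matching the not-yet-matched target word segmentation data against data whose risk level is greater than the preset level pre-stored under the storage category; and
    when the not-yet-matched target word segmentation data matches the data whose risk level is greater than the preset level pre-stored under the storage category, determining that the not-yet-matched target word segmentation data is target word segmentation data whose risk level is greater than the preset level.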
PCT/CN2018/089068 2018-02-07 2018-05-30 Message data processing method, apparatus, computer device and storage medium WO2019153589A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810124547.0 2018-02-07
CN201810124547.0A CN108287823B (zh) 2018-02-07 2018-02-07 Message data processing method, apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2019153589A1 true WO2019153589A1 (zh) 2019-08-15

Family

ID=62832600

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/089068 WO2019153589A1 (zh) 2018-02-07 2018-05-30 Message data processing method, apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN108287823B (zh)
WO (1) WO2019153589A1 (zh)

Also Published As

Publication number Publication date
CN108287823A (zh) 2018-07-17
CN108287823B (zh) 2021-06-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18905540

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 29/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18905540

Country of ref document: EP

Kind code of ref document: A1