CN112131377A - Multi-strategy-based group chat topic detection method, device, equipment and storage medium - Google Patents

Info

Publication number
CN112131377A
Authority
CN
China
Prior art keywords
topic
group chat
message
topics
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010808203.9A
Other languages
Chinese (zh)
Inventor
吴旭
吴京宸
颉夏青
陈春旭
方滨兴
张勇东
邱莉榕
杨金翠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202010808203.9A
Publication of CN112131377A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/04 Real-time or near real-time messaging, e.g. instant messaging [IM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a multi-strategy-based group chat topic detection method, device, equipment and storage medium. The method comprises the following steps: obtaining a topic sequence according to attribute information of the topics, wherein the topic sequence comprises current topics and expired topics, and the current topics comprise common topics and hot topics; matching the text feature information and auxiliary information of a group chat message against the current topics, and adding the group chat message to the corresponding topic; and calculating a first similarity between topics, and merging topics when the first similarity is greater than or equal to a preset first threshold. In the disclosed method, constructing the topic sequence addresses the problem of interleaved (cross-parallel) topics, the auxiliary information reduces the influence of short-text feature sparsity on the clustering result, and the efficiency and accuracy of group chat topic detection are greatly improved.

Description

Multi-strategy-based group chat topic detection method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of natural language processing, in particular to a multi-strategy-based group chat topic detection method, device, equipment and storage medium.
Background
At present, as the Internet and its supporting devices become ever more widespread, the user base of instant messaging systems keeps growing. Domestic Internet users have become increasingly dependent on instant messaging tools such as QQ and WeChat, and group chat, a main function of these applications, appears more and more frequently in people's daily lives. For an ordinary user, a large volume of complex group chat messages is difficult to digest quickly, and for the same reason supervising group chat messages requires a great deal of manpower to achieve a reasonable result. To address these two problems, a computer is needed to perform public opinion analysis on group chats, and topic detection is an important research direction in group chat text analysis.
Because group chat text is characterized by feature sparsity, singularity, dynamics, interleaving and the like, topic detection for group chat still has considerable room for improvement even after years of development. In addition, existing research often ignores non-text messages and messages containing only stop words or punctuation marks. In practical application scenarios, a large number of messages are therefore omitted, so that conversation segments cannot be divided accurately and the user groups participating in a topic cannot be located; user profiling based on chat records, user gender analysis and the like cannot be performed well; key non-text content is easily overlooked; and the accuracy and reliability of the public opinion analysis results are insufficient.
Disclosure of Invention
The embodiment of the disclosure provides a multi-strategy-based group chat topic detection method, device, equipment and storage medium. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present disclosure provides a group chat topic detection method based on multiple policies, including:
obtaining a topic sequence according to attribute information of the topic, wherein the topic sequence comprises a current topic and an expired topic;
matching the text characteristic information and the auxiliary information of the group chat message with the current topic, and adding the group chat message into the corresponding topic;
calculating first similarity among the topics, and merging the topics when the first similarity is larger than or equal to a preset first threshold.
Optionally, the attribute information of the topic includes a duration of the topic, a popularity of the topic, and a message frequency within a preset time period.
Optionally, the current topics include general topics and hot topics.
Optionally, matching the current topic according to the text feature information and the auxiliary information of the group chat message, including:
calculating a second similarity between the text characteristic information of the group chat message and the current topic;
when the second similarity is larger than or equal to a preset second threshold value, determining that the group chat message is matched with the corresponding topic;
and when the second similarity is smaller than a preset second threshold value, matching with the current topic according to the auxiliary information.
Optionally, matching with the current topic according to the auxiliary information includes:
and matching with the current topic according to the type, time and user attribute of the group chat message.
Optionally, when the group chat message does not match each topic in the current topics, further comprising:
judging whether the group chat message is an initial message;
when the group chat message is an initial message, starting a new topic;
and when the group chat message is not the initial message, adding the group chat message to the topic of the last message.
Optionally, the group chat message comprises one or more of text, a picture, a video, a link.
In a second aspect, an embodiment of the present disclosure provides a group chat topic detection apparatus based on multiple policies, including:
the topic sequence module is used for obtaining a topic sequence according to the attribute information of the topic, wherein the topic sequence comprises a current topic and an expired topic;
the matching module is used for matching the current topic according to the text characteristic information and the auxiliary information of the group chat message and adding the group chat message into the corresponding topic;
and the similarity calculation module is used for calculating first similarity among the topics, and merging the topics when the first similarity is greater than or equal to a preset first threshold value.
In a third aspect, the disclosed embodiments provide a multi-policy-based group chat topic detection device, including a processor and a memory storing program instructions, where the processor is configured to execute the multi-policy-based group chat topic detection method provided in the above embodiments when executing the program instructions.
In a fourth aspect, the present disclosure provides a computer-readable medium, on which computer-readable instructions are stored, where the computer-readable instructions are executable by a processor to implement a multi-policy-based group chat topic detection method provided by the foregoing embodiments.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
According to the multi-strategy-based group chat topic detection method provided by the embodiments of the disclosure, a topic sequence is obtained according to the attribute information of the topics. Constructing the topic sequence classifies the topics into current topics and expired topics, and a group chat message is matched only against the current topics in the topic sequence, which can improve matching accuracy and reduce the amount of calculation. Using auxiliary information such as the user, time and type of the group chat messages during clustering reduces the influence of short-text feature sparsity on the clustering result; the method can handle various types of messages and greatly improves the efficiency and accuracy of group chat topic detection.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flowchart illustrating a multi-policy based group chat topic detection method according to an exemplary embodiment;
FIG. 2 is a flowchart illustrating a multi-policy based group chat topic detection method according to an exemplary embodiment;
FIG. 3 is a schematic diagram illustrating a topic sequence in accordance with an illustrative embodiment;
FIG. 4 is a schematic structural diagram illustrating a multi-policy based group chat topic detection apparatus according to an exemplary embodiment;
FIG. 5 is a schematic block diagram illustrating a multi-policy based group chat topic detection apparatus in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram illustrating a computer storage medium in accordance with an exemplary embodiment.
Detailed Description
So that the manner in which the features and elements of the disclosed embodiments can be understood in detail, a more particular description of the disclosed embodiments, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. In the following description of the technology, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may be practiced without these details. In other instances, well-known structures and devices may be shown in simplified form in order to simplify the drawing.
At present, a large number of group chat messages exist on the Internet. To perform public opinion analysis on group chat data, messages that discuss related content need to be aggregated, that is, topic detection must be performed on the messages, and the user groups participating in the discussion must be identified. Research shows that group chat topics are prone to running in parallel and interleaving, and that group chat messages have sparse text features, so it is difficult to improve topic detection performance by relying on semantic information alone. Preserving the continuity of group chat records and the completeness of the groups participating in a topic is also of great importance to group chat public opinion analysis. To solve these problems, the application provides a multi-strategy-based group chat topic detection method: on the one hand, a topic sequence is constructed to disentangle cross-parallel topics to a certain extent; on the other hand, auxiliary information such as the user, time and type of the group chat messages is used during clustering to compensate for the weakness of relying only on short-text features.
The multi-policy based group chat topic detection method provided by the embodiments of the present application is described in detail below with reference to FIG. 1 to FIG. 3. The method may be implemented by a computer program that runs on a data transmission device based on the von Neumann architecture. The computer program may be integrated into an application or may run as a standalone tool application.
Referring to FIG. 1, which is a schematic flowchart of a multi-policy based group chat topic detection method provided in an embodiment of the present application, the method includes the following steps:
S101: obtaining a topic sequence according to attribute information of the topic, wherein the topic sequence comprises a current topic and an expired topic.
In an actual chat scenario, the root cause of the difference between group chat and private chat is the number of participants. In a point-to-point chat, the start of a new topic is inevitably accompanied by the end of the previous one. A group has many members, and if the participants of two topics do not completely overlap, the topics may run in parallel and interleave; that is, within a stretch of consecutive messages, messages belonging to different topics may appear alternately. Group chat topic detection must therefore determine accurately whether a new message belongs to an existing topic and, if so, to which one.
According to the group chat topic detection method provided by the embodiments of the disclosure, the topic sequence is obtained from the attribute information of the topics. Constructing the topic sequence classifies the topics into current topics and expired topics, and a group chat message is matched only against the current topics in the topic sequence. This improves matching accuracy and reduces the amount of calculation, so that a new message is matched to the topic it belongs to with higher probability, which addresses the problem of cross-parallel topics.
Specifically, the attribute information of the topic includes a duration of the topic, a degree of heat of the topic, and a message frequency within a preset time period.
The topic duration is calculated as follows:
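$$t.\mathrm{duration} = m_{\mathrm{recent}}.\mathrm{time} - \min_{m_i \in t}\left(m_i.\mathrm{time}\right)$$
where $t.\mathrm{duration}$ denotes the duration of topic $t$, $m_{\mathrm{recent}}.\mathrm{time}$ denotes the time of the latest group chat message, and $\min_{m_i \in t}(m_i.\mathrm{time})$ denotes the time of the earliest message in topic $t$.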
The message frequency within the preset time period is calculated as follows:
[Equation image BDA0002629940360000051: definition of t.frequency]
$$m_{\mathrm{recent}}.\mathrm{time} - m_j.\mathrm{time} < H_t$$
where $t.\mathrm{frequency}$ denotes the message frequency of topic $t$ within a preset time period; $H_t$ denotes the preset time period, which can be set by a person skilled in the art and is not specifically limited by the embodiments of the present disclosure; and the term shown in equation image BDA0002629940360000052 denotes the number of messages belonging to topic $t$ within the most recent $H_t$.
The topic heat is calculated as follows:
[Equation image BDA0002629940360000053: definition of t.temporal]
where $t.\mathrm{temporal}$ denotes the heat of topic $t$, which is inversely proportional to the topic duration and directly proportional to the message frequency of the topic in the recent period of time.
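The three topic attributes can be illustrated with a minimal Python sketch. Only the duration formula is given explicitly in the text; the frequency and heat formulas are images that did not survive in this copy, so the forms below are assumptions chosen to agree with the prose: the frequency is taken to be the count of messages of the topic sent within the last `h_t` seconds, and the heat is taken to be frequency divided by duration. The `Message` and `Topic` classes and the one-second floor on the duration are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    time: float  # send timestamp, in seconds
    user: str = ""
    text: str = ""


@dataclass
class Topic:
    messages: List[Message] = field(default_factory=list)


def duration(topic: Topic, recent_time: float) -> float:
    """t.duration: time of the latest group chat message minus the earliest message in t."""
    return recent_time - min(m.time for m in topic.messages)


def frequency(topic: Topic, recent_time: float, h_t: float) -> int:
    """ASSUMED form of t.frequency: number of messages in t with recent_time - time < h_t."""
    return sum(1 for m in topic.messages if recent_time - m.time < h_t)


def heat(topic: Topic, recent_time: float, h_t: float) -> float:
    """ASSUMED form of the heat: frequency / duration.

    The duration is floored at one second (an assumption) to avoid dividing by zero
    for a topic that has only just started.
    """
    return frequency(topic, recent_time, h_t) / max(duration(topic, recent_time), 1.0)
```

Under these assumptions, a topic that started long ago but has few recent messages gets a low heat, while a young topic with many recent messages gets a high one.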
A topic sequence is obtained according to the duration of the topic, the heat of the topic and the message frequency within the preset time period. FIG. 3 is a schematic diagram of a topic sequence. As shown in FIG. 3, the topic sequence contains two types of topics: current topics and expired topics. Expired topics are topics that have been eliminated from the current topic sequence and are no longer updated; new messages cannot be added to them, and they are stored as historical data. The current topic sequence is divided into a common topic sequence and a hot topic sequence; a new message is either added to a topic in one of these two sequences or starts a new topic in them.
Constructing the topic sequence classifies the topics into current topics and expired topics. Current topics are topics that have high heat and are still in progress. Matching a group chat message only against the current topics in the topic sequence improves matching accuracy and reduces the amount of calculation.
The update mechanism of the topic sequence is as follows:
Input: S = [t1, t2, …]: the current topic sequence; Hf: the hot-topic determination frequency; seqSize: the size of the common topic sequence.
Output: hTopics: the hot topic sequence; cTopics: the common topic sequence; eTopics: the expired topic sequence.
[Algorithm listing image BDA0002629940360000061]
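The algorithm listing itself is an image that is not available in this copy, so the following Python sketch is only one plausible reading of the stated inputs and outputs, not the patented procedure. It assumes that a topic whose message frequency reaches `h_f` is hot, that the remaining topics are ranked by heat, that the top `seq_size` of them stay in the common sequence, and that the rest expire. The dictionary keys `frequency` and `heat` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

TopicInfo = Dict[str, Any]  # hypothetical record with "id", "frequency" and "heat" keys


def update_topic_sequence(
    topics: List[TopicInfo], h_f: float, seq_size: int
) -> Tuple[List[TopicInfo], List[TopicInfo], List[TopicInfo]]:
    """Split the current topics into (hTopics, cTopics, eTopics) under the stated assumptions."""
    # ASSUMPTION: a topic is hot when its recent message frequency reaches h_f.
    hot = [t for t in topics if t["frequency"] >= h_f]
    rest = [t for t in topics if t["frequency"] < h_f]
    # ASSUMPTION: the common sequence keeps the seq_size hottest remaining topics.
    rest.sort(key=lambda t: t["heat"], reverse=True)
    common, expired = rest[:seq_size], rest[seq_size:]
    return hot, common, expired
```

Bounding the common sequence keeps the number of candidate topics per incoming message small, which is consistent with the reduction in calculation described above.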
S102: matching the text feature information and the auxiliary information of the group chat message with the current topic, and adding the group chat message to the corresponding topic.
In some exemplary scenarios, group chat messages are short and their features are sparse, so it is difficult to improve topic detection performance by relying on semantic information alone. Group chats also contain a large amount of text consisting only of symbols or stop words, as well as non-text content such as pictures, videos and links. Such messages are important for preserving the continuity of the group chat record and the completeness of the groups participating in a topic, and are therefore also important for group chat topic detection.
Further, each group chat message carries a sending timestamp and a sender user ID. It has been found that messages sent by one user within a certain period of time are likely to belong to the same topic, so the time information and user information of a message are also important for group chat topic detection.
In a possible implementation, the type information, time information and user information of the message together serve as the auxiliary information; the group chat message is matched against the current topics in the topic sequence according to its text feature information and auxiliary information, and is added to the corresponding topic.
Specifically, a Single-Pass clustering algorithm, which can process streaming data efficiently, is combined with the topic sequence and the auxiliary information to match group chat messages with the current topics. During clustering, semantic similarity is calculated mainly between messages and topics and between topics. The latter addresses the tendency of the Single-Pass clustering algorithm to form small clusters, that is, it prevents the topic segmentation granularity from becoming too fine because of sparse short-text features.
Specifically, a second similarity between the text feature information of the group chat message and each current topic is calculated first. When the second similarity is greater than or equal to a preset second threshold, the group chat message is determined to match the corresponding topic and is added to it. When the second similarity is smaller than the preset second threshold, the message is matched against the current topics in the topic sequence according to the auxiliary information. The preset second threshold may be set by a person skilled in the art and is not specifically limited by the embodiments of the present disclosure.
The second similarity is calculated as follows:
[Equation image BDA0002629940360000071: definition of sim(x2, tl)]
where $sim(x_2, t_l)$ denotes the semantic similarity between object $x_2$ and topic $t_l$; in this step, $x_2$ denotes the group chat message and $t_l$ denotes a topic containing the messages $m_1, m_2, m_3, \ldots, m_{v_2}$; $m.\mathrm{vector}$ denotes the text feature vector of message $m$; and cossim() is a function that calculates the cosine similarity between two vectors, in the following way:
[Equation image BDA0002629940360000072: definition of cossim()]
From this step, the second similarity between the group chat message and the topic can be calculated.
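As a rough illustration of the message-to-topic comparison, the Python sketch below implements the standard cosine similarity and one possible aggregation over a topic. The aggregation in the patent is given only as an image, so averaging the cosine similarity over the messages of the topic is an assumption, as are the function names `message_topic_similarity` and `best_match`.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cossim(a: Vector, b: Vector) -> float:
    """Standard cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def message_topic_similarity(vec: Vector, topic_vecs: List[Vector]) -> float:
    """ASSUMED aggregation: mean cosine similarity between the message and each message in the topic."""
    if not topic_vecs:
        return 0.0
    return sum(cossim(vec, v) for v in topic_vecs) / len(topic_vecs)


def best_match(vec: Vector, topics: List[List[Vector]], threshold: float) -> Optional[int]:
    """Index of the most similar current topic if it reaches the second threshold, else None."""
    best_index, best_score = None, -1.0
    for index, topic_vecs in enumerate(topics):
        score = message_topic_similarity(vec, topic_vecs)
        if score > best_score:
            best_index, best_score = index, score
    return best_index if best_score >= threshold else None
```

A `None` result corresponds to the case in which the second similarity is below the preset second threshold for every current topic, so matching falls back to the auxiliary information.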
In a possible implementation, if the similarity between the group chat message and every topic in the topic sequence is smaller than the preset second threshold, matching is performed according to the auxiliary information of the message. Based on the principle that messages sent by one user within a certain time most probably belong to the same topic, the topic heat detection time Ht is used as the window: when a new message is a meaningless message, or its text content is not sufficiently similar to any topic, and the same user has sent a message within the preceding Ht, the new message is added to the topic to which that user's last message belongs. A meaningless message is a text message consisting only of symbols or stop words, or a non-text message such as a picture, video or link.
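The fallback on user and time information can be sketched as a lookup over the already clustered messages. This is a minimal sketch under assumptions: the history is a hypothetical list of `(time, user, topic_id)` tuples ordered by time, and "within Ht" is read as a strict inequality.

```python
from typing import List, Optional, Tuple

# Hypothetical record of already clustered messages, oldest first: (time, user, topic_id).
History = List[Tuple[float, str, int]]


def match_by_user_and_time(history: History, user: str, now: float, h_t: float) -> Optional[int]:
    """Topic of the same user's most recent message sent less than h_t ago, or None."""
    for time, sender, topic_id in reversed(history):
        if now - time >= h_t:
            break  # history is time-ordered, so every older entry is also outside the window
        if sender == user:
            return topic_id
    return None
```

For example, a picture sent by a user twenty seconds after that user's last text message would be attached to the topic of that text message, provided Ht is longer than twenty seconds.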
If no suitable topic can be matched according to the auxiliary information, it is determined whether the group chat message is an initial message; if it is, a new topic is started. A non-text group chat message may well start a new topic. A study of a large number of group chat record samples shows that, besides text, a topic may be started by a picture, a video or a link, especially when the user who sends the message has not participated in the chat recently. The disclosed embodiment therefore adds an attribute indicating whether the message is an initial message to each message, calculated as follows:
[Equation image BDA0002629940360000081: definition of m.start]
where m.start is a flag indicating whether message m is an initial message, and m.type denotes the type of message m.
Optionally, when the group chat message is a text message, the embodiments of the present disclosure draw on Chinese grammar conventions to summarize several text features that identify a non-initial message: the sentence begins with a conjunction or adverb; the sentence ends with a specific modal particle such as "吧" (ba); the sentence contains personal or demonstrative pronouns such as "you", "he" or "this"; or the sentence does not contain pronouns and nouns. In the clustering process, when the text content has one of the above features, the message is not an initial message.
In some exemplary scenarios, the group chat message is "Where is this store you mentioned?"; the sentence contains the pronouns "you" and "this", so it is determined not to be an initial message.
In some exemplary scenarios, the group chat message is "His home is near the supermarket, right?", where the closing "right?" renders the sentence-final particle "吧"; the sentence contains the pronoun "he" and the modal particle "吧", so it is determined not to be an initial message.
Further, when the group chat message is determined not to be an initial message, the message is added to the topic of the previous message.
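The text features for a non-initial message lend themselves to simple string checks. The Python sketch below covers three of the listed features (leading conjunction or adverb, sentence-final particle, personal or demonstrative pronoun). The word lists are short hypothetical examples rather than the lists used in the patent, and the Chinese sentences in the usage note are illustrative renderings of the two examples above, not text taken from the source.

```python
# Hypothetical, deliberately tiny word lists; a real system would need fuller lexicons.
LEADING_WORDS = ("但是", "而且", "所以", "然后", "也", "还")  # conjunctions / adverbs
FINAL_PARTICLES = ("吧",)                                    # sentence-final modal particles
PRONOUNS = ("你", "他", "这")                                # "you", "he", "this"
TRAILING_PUNCT = "。！？?!.~ "


def looks_non_initial(text: str) -> bool:
    """True if the text shows at least one of the three non-initial features checked here."""
    sentence = text.strip()
    core = sentence.rstrip(TRAILING_PUNCT)  # ignore closing punctuation when testing the ending
    return (
        sentence.startswith(LEADING_WORDS)
        or core.endswith(FINAL_PARTICLES)
        or any(p in sentence for p in PRONOUNS)
    )
```

With these lists, `looks_non_initial("你说的这家店在哪")` and `looks_non_initial("他家住在超市附近吧")` both return `True`, while a self-contained opener such as `"今天天气不错"` returns `False`. Substring matching on single characters is crude; a tokenizer with part-of-speech tags would be a more careful choice.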
In this step, using auxiliary information such as the user information, time information and type information of the group chat messages during clustering compensates for the weakness of matching on short-text features alone and greatly improves the accuracy of topic detection.
S103: calculating a first similarity between the topics, and merging the topics when the first similarity is greater than or equal to a preset first threshold.
Specifically, the topic detection method of the embodiments of the disclosure further includes calculating the similarity between topics. Merging topics with high similarity addresses the tendency of the Single-Pass clustering algorithm to form small clusters, that is, it prevents the topic segmentation granularity from becoming too fine because of sparse short-text features.
The first similarity is calculated as follows:
[Equation image BDA0002629940360000091: definition of sim(x1, tl)]
where $sim(x_1, t_l)$ denotes the semantic similarity between object $x_1$ and topic $t_l$; in this step, $x_1$ denotes a topic containing the messages shown in equation image BDA0002629940360000093, and $t_l$ denotes a topic containing the messages $m_1, m_2, m_3, \ldots, m_{v_2}$; $m.\mathrm{vector}$ denotes the text feature vector of message $m$; and cossim() is a function that calculates the cosine similarity between two vectors, in the following way:
[Equation image BDA0002629940360000092: definition of cossim()]
From this step, the similarity between two topics can be calculated, and two topics whose similarity is greater than or equal to the preset first threshold are merged. The preset first threshold can be set by a person skilled in the art and is not specifically limited by the embodiments of the present disclosure.
Further, after topics are merged or a new topic is added, the method further includes updating the topic sequence to obtain the current hot topics, common topics and expired topics; the specific update method is as described in step S101 and is not repeated here.
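The merge step can be sketched as a greedy pass over the topics. The topic-to-topic similarity in the patent is given only as an image, so comparing the mean feature vectors (centroids) of two topics with cosine similarity is an assumption made here, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cossim(a: Vector, b: Vector) -> float:
    """Standard cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of the message vectors of one (non-empty) topic."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def merge_similar_topics(topics: List[List[Vector]], threshold: float) -> List[List[Vector]]:
    """Greedily merge each topic into the first kept topic whose similarity reaches the threshold.

    ASSUMPTION: topic-to-topic similarity is the cosine similarity of the two centroids.
    """
    merged: List[List[Vector]] = []
    for topic in topics:
        for kept in merged:
            if cossim(centroid(kept), centroid(topic)) >= threshold:
                kept.extend(topic)  # first similarity >= first threshold: merge
                break
        else:
            merged.append(list(topic))  # copy so the caller's lists are not modified
    return merged
```

The greedy order means the result can depend on the order of the topics; that is a property of this sketch, not a statement about the patented method.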
To facilitate understanding of the multi-policy based group chat topic detection method provided in the embodiments of the present application, the following description is made with reference to FIG. 2. As shown in FIG. 2, the method includes:
S201: obtain a topic sequence according to the attribute information of the topics.
S202: calculate a second similarity between the text feature information of the group chat message and each current topic in the topic sequence.
S203: determine whether the second similarity is greater than or equal to the preset second threshold. If so, execute step S204; if the second similarity is smaller than the preset second threshold, execute step S205.
S204: determine that the group chat message matches the corresponding topic, and add the group chat message to that topic.
S205: determine, according to the auxiliary information, whether the group chat message matches a topic. If it does, execute step S204; if it does not, execute step S206.
S206: determine whether the group chat message is an initial message. If so, execute step S208; if not, execute step S207.
S207: add the group chat message to the topic of the previous message.
S208: start a new topic.
S209: calculate a first similarity between the topics.
S210: determine whether the first similarity is greater than or equal to the preset first threshold. If so, execute step S211; if not, execute step S212.
S211: merge the topics.
S212: update the topic sequence to obtain the current expired topics, common topics and hot topics.
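The decision chain of steps S203 to S208 can be condensed into a single function. In this minimal Python sketch the results of the similarity match, the auxiliary match and the initial-message check are passed in as arguments, so the function shows only the order in which the strategies are tried. The parameter names are hypothetical, and starting a new topic when there is no previous message is an assumption for the very first message of a chat.

```python
from typing import Optional


def assign_message(
    best_topic: Optional[int],  # S202-S203: topic whose second similarity reaches the threshold
    aux_topic: Optional[int],   # S205: topic found through user, time and type information
    is_initial: bool,           # S206: result of the initial-message check
    last_topic: Optional[int],  # topic of the previous message, if any
    next_topic_id: int,         # identifier to use if a new topic is started
) -> int:
    """Return the topic identifier the message is assigned to."""
    if best_topic is not None:
        return best_topic      # S204: matched by text features
    if aux_topic is not None:
        return aux_topic       # S205 -> S204: matched by auxiliary information
    if is_initial or last_topic is None:
        return next_topic_id   # S208: start a new topic (ASSUMED also when no previous message exists)
    return last_topic          # S207: join the topic of the previous message
```

After each assignment, steps S209 to S212 (topic merging and topic sequence update) would run before the next message is processed.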
According to the group chat topic detection method provided by the embodiments of the present disclosure, constructing the topic sequence disentangles interleaved and parallel topics to a certain extent and improves the accuracy of topic matching, and using auxiliary information such as the user, time, and type of the group chat messages during clustering reduces the impact of short-text feature sparsity on the clustering result.
In a second aspect, an embodiment of the present disclosure provides a multi-strategy-based group chat topic detection apparatus. Fig. 4 is a schematic structural diagram of the apparatus proposed in the present application. As shown in Fig. 4, the apparatus includes:
a topic sorting module 401, configured to obtain a topic sequence according to attribute information of the topics, where the topic sequence includes current topics and expired topics;
a matching module 402, configured to match the group chat message against the current topics in the topic sequence according to the text feature information and the auxiliary information of the group chat message, and to add the group chat message to the corresponding topic; and
a similarity calculation module 403, configured to calculate a first similarity between topics, and to merge the topics when the first similarity is greater than or equal to a preset first threshold.
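The merging behaviour of the similarity calculation module can be illustrated with a short sketch. It assumes each topic is represented by a word-count vector and that the first similarity is a cosine similarity; the single greedy pass and the function names are hypothetical, because this section does not state how topic pairs are selected for comparison.

```python
from collections import Counter
from math import sqrt
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def merge_topics(topics: List[Counter], first_threshold: float) -> List[Counter]:
    """Greedily fold each topic into the first kept topic it is similar to."""
    merged: List[Counter] = []
    for vec in topics:
        for kept in merged:
            if cosine(vec, kept) >= first_threshold:
                kept.update(vec)  # merge: add the word counts together
                break
        else:
            merged.append(Counter(vec))  # no similar topic found: keep as its own topic
    return merged
```

For example, with a threshold of 0.6, topics built from "pizza lunch" and "lunch pizza today" are merged, while a topic built from "football match" stays separate.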
Optionally, the attribute information of the topic includes a duration of the topic, a popularity of the topic, and a message frequency within a preset time period.
Optionally, the current topics include general topics and hot topics.
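One way to picture how the attribute information could yield a topic sequence is the sketch below. The thresholds, the field names, the expiry rule, and the ordering by popularity are all assumptions made for illustration; the text above only states which attributes are used and which categories result.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TopicStats:
    name: str
    duration_s: float      # how long the topic has lasted, in seconds
    heat: float            # popularity score; its formula is not specified here
    recent_messages: int   # messages within the preset time period


def build_topic_sequence(
    topics: List[TopicStats],
    hot_heat: float = 10.0,               # hypothetical popularity threshold
    max_idle_duration_s: float = 3600.0,  # hypothetical expiry threshold
) -> Dict[str, List[TopicStats]]:
    """Split topics into hot, general, and expired groups."""
    sequence: Dict[str, List[TopicStats]] = {"hot": [], "general": [], "expired": []}
    for t in topics:
        if t.recent_messages == 0 and t.duration_s >= max_idle_duration_s:
            sequence["expired"].append(t)  # old and silent in the recent window
        elif t.heat >= hot_heat:
            sequence["hot"].append(t)
        else:
            sequence["general"].append(t)
    for key in ("hot", "general"):
        sequence[key].sort(key=lambda t: t.heat, reverse=True)
    return sequence


def current_topics(sequence: Dict[str, List[TopicStats]]) -> List[TopicStats]:
    """Current topics are the hot topics followed by the general topics."""
    return sequence["hot"] + sequence["general"]
```

Placing hot topics first means a new message is compared against the most active topics before the quieter ones, which is one plausible reading of why the topics are ordered at all.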
Optionally, the matching module 402 in the embodiment of the present disclosure may further include:
a calculating unit, configured to calculate a second similarity between the text feature information of the group chat message and the current topic;
a first matching unit, configured to determine that the group chat message matches the corresponding topic when the second similarity is greater than or equal to a preset second threshold; and
a second matching unit, configured to match the group chat message against the topics in the topic sequence according to the auxiliary information when the second similarity is smaller than the preset second threshold.
Optionally, the second matching unit is specifically configured to match the group chat message against the topics in the topic sequence according to the type, time, and user attributes of the group chat message.
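A hedged sketch of such auxiliary matching is given below. The concrete rules (a 60-second window, same-user continuation, and treating non-text messages as follow-ups to a recent topic) are hypothetical examples of how type, time, and user attributes might be combined; the application does not give the rules in this section.

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    user: str
    timestamp: float  # seconds since some reference point
    kind: str         # "text", "picture", "video", or "link"


def aux_match(message: ChatMessage, topic_last: ChatMessage, max_gap_s: float = 60.0) -> bool:
    """Decide whether `message` continues the topic whose latest message is `topic_last`."""
    gap = message.timestamp - topic_last.timestamp
    if gap < 0 or gap > max_gap_s:
        return False  # time: too far from the topic's latest message
    if message.user == topic_last.user:
        return True   # user: the same sender is continuing within the window
    return message.kind != "text"  # type: a picture, video, or link carries few text features
```

A short text from a different user is deliberately not matched here, so it falls through to the initial-message check instead of being attached on weak evidence.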
Optionally, the matching module 402 in the embodiment of the present disclosure may further include a new topic opening unit, configured to determine whether the group chat message is an initial message, to open a new topic when the group chat message is an initial message, and to add the group chat message to the topic of the previous message when it is not.
Optionally, the group chat message includes one or more of text, a picture, a video, and a link.
With the group chat topic detection apparatus provided by the embodiments of the present disclosure, constructing the topic sequence disentangles interleaved and parallel topics to a certain extent, and using auxiliary information such as the user, time, and type of the group chat messages during clustering reduces the impact of short-text feature sparsity on the clustering result.
It should be noted that, when the multi-strategy-based group chat topic detection apparatus provided in the foregoing embodiment executes the multi-strategy-based group chat topic detection method, the division into the above functional modules is only an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiment and the method embodiment belong to the same concept; for details of the implementation process, see the method embodiment, which are not repeated here.
In a third aspect, an embodiment of the present disclosure further provides an electronic device corresponding to the multi-strategy-based group chat topic detection method provided in the foregoing embodiments, for executing that method.
Refer to Fig. 5, which shows a schematic diagram of an electronic device according to some embodiments of the present application. As shown in Fig. 5, the electronic device includes a processor 500, a memory 501, a bus 502, and a communication interface 503, where the processor 500, the communication interface 503, and the memory 501 are connected through the bus 502. The memory 501 stores a computer program that can run on the processor 500, and the processor 500, when running the computer program, executes the multi-strategy-based group chat topic detection method provided by any of the foregoing embodiments of the present application.
The memory 501 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. A communication connection between the network element of this system and at least one other network element is established through at least one communication interface 503 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, and the like.
The bus 502 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The memory 501 is configured to store a program, and the processor 500 executes the program after receiving an execution instruction. The multi-strategy-based group chat topic detection method disclosed in any of the embodiments of the present application may be applied to, or implemented by, the processor 500.
The processor 500 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 500 or by instructions in the form of software. The processor 500 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or a register. The storage medium is located in the memory 501; the processor 500 reads the information in the memory 501 and completes the steps of the above method in combination with its hardware.
The electronic device provided by the embodiments of the present application is based on the same inventive concept as the multi-strategy-based group chat topic detection method provided by the embodiments of the present application, and has the same beneficial effects as the method it adopts, runs, or implements.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium corresponding to the multi-strategy-based group chat topic detection method provided in the foregoing embodiments. Refer to Fig. 6, which shows the computer-readable storage medium as an optical disc 600 on which a computer program (i.e., a program product) is stored. When run by a processor, the computer program executes the multi-strategy-based group chat topic detection method provided in any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above embodiments of the present application is based on the same inventive concept as the multi-strategy-based group chat topic detection method provided by the embodiments of the present application, and has the same beneficial effects as the method adopted, run, or implemented by the application program stored on it.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A group chat topic detection method based on multiple strategies is characterized by comprising the following steps:
obtaining a topic sequence according to attribute information of the topic, wherein the topic sequence comprises a current topic and an expired topic;
matching the current topic according to the text characteristic information and the auxiliary information of the group chat message, and adding the group chat message into the corresponding topic;
calculating first similarity among the topics, and merging the topics when the first similarity is larger than or equal to a preset first threshold.
2. The method of claim 1, wherein the attribute information of the topic comprises a duration of the topic, a heat of the topic, and a message frequency within a preset time period.
3. The method of claim 1, wherein the current topics include general topics and hot topics.
4. The method of claim 1, wherein matching the current topic according to the text characteristic information and the auxiliary information of the group chat message comprises:
calculating a second similarity of the text characteristic information of the group chat message and the current topic;
when the second similarity is larger than or equal to a preset second threshold value, determining that the group chat message is matched with the corresponding topic;
and when the second similarity is smaller than a preset second threshold value, matching with the current topic according to the auxiliary information.
5. The method of claim 4, wherein matching with the current topic according to the auxiliary information comprises:
and matching with the current topic according to the type, time and user attribute of the group chat message.
6. The method of claim 1, wherein when the group chat message matches none of the current topics, the method further comprises:
judging whether the group chat message is an initial message;
when the group chat message is an initial message, starting a new topic;
and when the group chat message is not the initial message, adding the group chat message to the topic of the last message.
7. The method of any one of claims 1-6, wherein the group chat message comprises one or more of text, a picture, a video, and a link.
8. A multi-strategy-based group chat topic detection device is characterized by comprising:
the topic sequence module is used for obtaining a topic sequence according to the attribute information of the topic, wherein the topic sequence comprises a current topic and an expired topic;
the matching module is used for matching the current topic according to the text characteristic information and the auxiliary information of the group chat message and adding the group chat message into the corresponding topic;
and the similarity calculation module is used for calculating first similarity among the topics, and merging the topics when the first similarity is greater than or equal to a preset first threshold value.
9. A multi-strategy-based group chat topic detection apparatus comprising a processor and a memory having stored thereon program instructions, wherein the processor is configured to perform the multi-strategy-based group chat topic detection method of any one of claims 1 to 7 when executing the program instructions.
10. A computer-readable medium having computer-readable instructions stored thereon which are executable by a processor to implement the multi-strategy-based group chat topic detection method of any one of claims 1 to 7.
CN202010808203.9A 2020-08-12 2020-08-12 Multi-strategy-based group chat topic detection method, device, equipment and storage medium Pending CN112131377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010808203.9A CN112131377A (en) 2020-08-12 2020-08-12 Multi-strategy-based group chat topic detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010808203.9A CN112131377A (en) 2020-08-12 2020-08-12 Multi-strategy-based group chat topic detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112131377A true CN112131377A (en) 2020-12-25

Family

ID=73851810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010808203.9A Pending CN112131377A (en) 2020-08-12 2020-08-12 Multi-strategy-based group chat topic detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112131377A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113595886A (en) * 2021-07-29 2021-11-02 北京达佳互联信息技术有限公司 Instant messaging message processing method and device, electronic equipment and storage medium
US20240089227A1 (en) * 2022-09-08 2024-03-14 Integral Ad Science, Inc. Methods, systems, and media for providing automated review of incoming messages in a group messaging service

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105407037A (en) * 2015-10-30 2016-03-16 努比亚技术有限公司 Group chat device and method
CN110149265A (en) * 2018-03-14 2019-08-20 腾讯科技(深圳)有限公司 Message shows method, apparatus and computer equipment
CN110413770A (en) * 2019-06-12 2019-11-05 阿里巴巴集团控股有限公司 Group's message is referred to the method and device of group topic
CN111026835A (en) * 2019-12-26 2020-04-17 厦门市美亚柏科信息股份有限公司 Chat subject detection method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张馨雨: "群聊话题检测技术研究", 《中国硕士学位论文全文数据库》 *

Similar Documents

Publication Publication Date Title
US20180322115A1 (en) Method and device for determining comment, server and storage medium
US9218101B2 (en) Displaying estimated social interest in time-based media
CN111460153B (en) Hot topic extraction method, device, terminal equipment and storage medium
EP3401802A1 (en) Webpage training method and device, and search intention identification method and device
US11514063B2 (en) Method and apparatus of recommending information based on fused relationship network, and device and medium
WO2008037207A1 (en) Method and device for filtering junk information based on network
US11010687B2 (en) Detecting abusive language using character N-gram features
CN111683274B (en) Bullet screen advertisement display method, device and equipment and computer readable storage medium
KR101652358B1 (en) Evaluation information generation method and system, and computer storage medium
CN112131377A (en) Multi-strategy-based group chat topic detection method, device, equipment and storage medium
CN103324745A (en) Text garbage identifying method and system based on Bayesian model
CN112434510B (en) Information processing method, device, electronic equipment and storage medium
CN107924398B (en) System and method for providing a review-centric news reader
US10853417B2 (en) Generating a platform-based representative image for a digital video
US9268861B2 (en) Method and system for recommending relevant web content to second screen application users
CN108170845B (en) Multimedia data processing method, device and storage medium
US8874666B2 (en) Publisher-assisted, broker-based caching in a publish-subscription environment
CN109062905B (en) Barrage text value evaluation method, device, equipment and medium
CN110929683A (en) Video public opinion monitoring method and system based on artificial intelligence
CN110413770B (en) Method and device for classifying group messages into group topics
CN114943208A (en) Event information processing method and system, equipment and storage medium
CN110597980B (en) Data processing method and device and computer readable storage medium
CN114048742A (en) Knowledge entity and relation extraction method of text information and text quality evaluation method
CN115526176A (en) Text recognition method and device, electronic equipment and storage medium
CN110717011B (en) Session message processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201225)