CN102315952A - Method and device for detecting junk posts in community network

Info

Publication number: CN102315952A
Application number: CN2010102141862A
Authority: CN
Inventors: 舒迅; 帅帅; 尹佳; 袁聃; 方勇
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2010-06-29
Filing date: 2010-06-29
Publication date: 2012-01-11

Abstract

The invention provides a method and device for detecting junk posts in a community network, which judge whether posts sent by a posting user are junk posts according to the posting behavioral characteristics of the posting user. In one preferred embodiment, the method comprises the following steps: first acquiring the posting behavioral characteristics of the posting user of a post; and then judging whether the post is a junk post based on a preset rule according to those posting behavioral characteristics. In another preferred embodiment, the method comprises the following steps: first acquiring the content key information in a post; then acquiring the posting behavioral characteristics of the posting user related to the post; and judging whether the post is a junk post according to those characteristics. Compared with the prior art, the method provided by the invention detects the posting behavioral characteristics of the posting user in the community network and preferably combines them with the external characteristics and semantic analysis of the posts, so that junk posts can be judged more accurately.

Description

Method and equipment for detecting junk posts in community network

Technical Field

The invention relates to the technical field of computer networks, and in particular to a method and equipment for detecting junk posts in a community network based on a computer network.

Background

The community network service is an online community established by a network service provider, and is generally based on the internet, and provides various network-based interactive services for a group of users with the same interests and activities, including but not limited to e-mail, instant messaging chat, video and audio, dynamic information sharing of files, blogs, microblogs, posts, discussion groups, and the like.

Users may interact in a community network, for example, a user posts in a particular block related to a certain subject to other users accessing the particular block, and other users may browse the posts while accessing the particular block of the community network and post opinions or comments under the posts.

Because users of many kinds access the community network, posts sent by users may contain illegal or non-compliant content, and posts used for promotion and advertising interfere with other users browsing normal posts. In the prior art, the community network website generally verifies the content of user posts manually or by machine, mainly by checking whether the post content contains specific words, such as illegal or non-compliant words and obvious advertising content.

However, this kind of verification is hardly effective for posts that do not obviously contain such specific words, referred to below as "hidden junk posts". For example, a post may contain no obvious advertising content and merely introduce a certain product in the manner of soft-sell copy; or a post may contain no illegal or non-compliant words but be repeated in large numbers across different blocks of the community network, which disrupts normal browsing and even occupies the processing capacity of the website server.

Therefore, there is a need to provide a technical solution to identify such hidden junk posts.

Disclosure of Invention

The present invention is directed to overcoming the above-mentioned drawbacks of the prior art by providing a method and apparatus for detecting junk posts in a community network.

According to a first aspect of the present invention, there is provided a method for detecting spam posts in a community network, comprising:

a. judging whether the posts sent by the posting user are junk posts according to the posting behavior characteristics of the posting user.

In a preferred embodiment, the step a includes:

a1. acquiring the posting behavior characteristics of the posting user of the post;

a2. judging whether the post is a junk post according to the posting behavior characteristics of the posting user based on a first preset rule.

In another preferred embodiment, the step a includes:

a1'. acquiring content key information in the post;

a2'. acquiring the posting behavior characteristics of the posting user related to the post according to the content key information;

a3'. judging whether the post is a junk post according to the posting behavior characteristics of the posting user related to the post.

According to a second aspect of the present invention, there is provided an apparatus for detecting spam posts in a community network, comprising:

a post detection device for judging whether the posts sent by the posting user are junk posts according to the posting behavior characteristics of the posting user.

In a preferred embodiment, the post detection apparatus includes:

first obtaining means for obtaining a posting behavior feature of a posting user of the post;

judging means for judging whether the post is a junk post according to the posting behavior characteristics of the posting user based on a first preset rule.

In another preferred embodiment, the post detection apparatus includes:

extracting means for acquiring content key information in the post;

first obtaining means for obtaining a posting behavioral characteristic of the posting user related to the post according to the content key information;

judging means for judging whether the post is a junk post according to the posting behavior characteristics of the posting user related to the post.

Compared with the prior art, the method and the device can judge junk posts more accurately by detecting the posting behavior characteristics of posting users in the community network, preferably in combination with the external characteristics of the posts and semantic analysis.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:

FIG. 1 is a network topology diagram of a community network according to the present invention;

FIG. 2 is a flow diagram of a method for detecting junk posts in a community network, in accordance with an aspect of the present invention;

FIG. 3 is a flowchart of a method for detecting junk posts in a community network in accordance with a preferred embodiment of the present invention;

FIG. 4 is a block diagram of an apparatus for detecting junk posts in a community network, in accordance with an aspect of the present invention;

FIG. 5 is a block diagram of an apparatus for detecting junk posts in a community network, in accordance with a preferred embodiment of the present invention.

The same or similar reference numbers in the drawings identify the same or similar elements.

Detailed Description

The present invention is described in further detail below with reference to the attached drawing figures.

Fig. 1 shows a topology diagram of a community network according to the present invention, which includes a network device and a plurality of users a-f. Each user accesses a community network service (SNS) website via a network through a respective user terminal. The website includes one or more network devices for providing the community network service, including, but not limited to, a network server, a network host, or other user devices in a cloud computing mode, etc. The user terminal includes, but is not limited to, any device with an internet browsing function, such as a computer, a smart phone, a PDA, a game machine, or an IPTV. The device for detecting junk posts according to the present invention may be a stand-alone device communicatively connected to the network device via a network, including but not limited to a general-purpose computer, a server, a host computer, etc.; or it may be integrated with the network device. For simplicity, both are referred to collectively below as the network device.

Furthermore, the communication between the user terminal and the network device may be based on packet data transmission such as TCP/IP protocol, UDP protocol, etc. The communication between the network device 2 and the device for detecting posts may be packet data transmission based on the above-mentioned TCP/IP protocol, UDP protocol, or the like, or may be signal transmission based on various computer bus protocols inside the network device. It will be appreciated by those skilled in the art that the present invention is not limited to the above-described communications transmission protocols and that any external communications protocols or internal computer bus protocols that are or may later become known are suitable for use with the present invention and are hereby incorporated by reference.

When one of the users, for example, user a, accesses the community network, an interaction request is sent through its user terminal 1, for example, posting is made in a specific block of the community network, and after the post sent by the user a is approved by the network device 2, the post is stored and provided to the user accessing the specific block of the community network for display.

It will be appreciated by those skilled in the art that the community network of the present invention is not limited to the above-described form and may include other forms of interaction based on direct connections between user terminals such as the form of P2P.

The technical solution for identifying spam posts according to the present invention is described in detail below with reference to fig. 2-5.

FIG. 2 is a flow diagram of a method for detecting spam posts in a social network in accordance with an aspect of the present invention. For simplicity, only one candidate user and its user terminal are shown in fig. 2.

As shown in fig. 2, in step S1, when user a accesses the community network website via the user terminal 1 and enters a specific block (hereinafter referred to as a "post bar"), for example, a "military forum" post bar, user a uses the user terminal 1 to send a post to the network device by means of human-computer interaction. Although the present invention is described by taking a "network device" as an example, those skilled in the art should understand that the present invention is also applicable to a P2P mode in which user terminals interconnect directly, or to a cloud-computing-based mode, in which each user terminal or certain specific user terminals can function as the network device to detect posts made by users; such modes are also included in the scope of the present invention.

Specifically, user a may access a web page of the community network through a browser such as IE or Firefox, or may enter the "military forum" post bar of the community network through client software installed in the user terminal 1, such as QQ. In the former case, user a may input the post content in the post input field on the "military forum" post bar web page of the community network and then click a specific function button on the web page, so that the user terminal 1 transmits the post; in the latter case, user a may input the post content in the user interface of the client software and cause the user terminal 1 to transmit the post by clicking a specific function button in that interface. It will be appreciated by those skilled in the art that the present invention should not be limited to the above-described manners, and any manner of accessing a community network and posting that is applicable to the present invention should be within the scope of the present invention and is incorporated herein by reference.

Subsequently, in step S2, the network device 2 detects a post posted by the posting user based on the posting behavior characteristics of the posting user, and determines whether the post is a junk post.

In particular, the present invention recognizes that many hidden junk posts do not contain content that is clearly illegal, non-compliant, or advertising in character, yet their posters, in pursuit of improper posting purposes, may post in large numbers in one or more blocks of a community network (e.g., multiple post bars, hereinafter referred to uniformly as "post bars"), even by means of machine posting. Therefore, hidden junk posts can be accurately detected by using posting behavior characteristics such as the poster's posting frequency or the number of post bars posted in.

Specifically, in step S21, after receiving the post sent by the posting user (hereinafter referred to as "poster"), the network device 2 extracts identification information of the poster, such as ID of login community network or IP address thereof, and then queries the posting behavioral characteristics of the posting user related to the post based on the identification information of the poster, including but not limited to posting frequency of the poster and number of posts in a post bar.

In step S21, the network device 2 may acquire the posting behavior characteristics of the poster in various ways, including but not limited to the following: 1) the network device 2 sends a request message for cookie information to the user terminal of the poster, and obtains a recent posting history of the poster from the cookie information that the user terminal provides in response to the request message, so as to acquire the posting behavior characteristics of the poster; 2) the network device 2 queries the posting behavior characteristics of the poster in the whole community network, or in the local community network and other community networks, according to the identification information of the poster; 3) more preferably, the network device may establish and manage a posting behavioral characteristic library containing the posting behavioral characteristics of a large number of posters, query the posting behavioral characteristics of the poster from that library, and establish or update the poster's posting behavioral characteristics in the posting behavioral characteristic library according to the poster's posting behavior, wherein the posting behavioral characteristic library includes various types of databases, which may be included in the network device in hardware or may be independent of the network device and communicate with it through a network link. It should be understood by those skilled in the art that the present invention is not limited to the above-mentioned manners of obtaining the posting behavior characteristics, and any other manner of obtaining the posting behavior characteristics applicable to the present invention is also included in the scope of the present invention and is incorporated herein by reference.
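By way of non-limiting illustration, manner 3) above might be sketched in Python as a small in-memory store keyed by the poster's identification information. The class name `PostingBehaviorLibrary`, its methods, and the 60-second window are hypothetical and are not taken from the invention; a practical library would be backed by a database as described above.

```python
import time
from collections import defaultdict, deque


class PostingBehaviorLibrary:
    """Hypothetical in-memory stand-in for the posting behavioral characteristic library."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._timestamps = defaultdict(deque)  # poster id -> recent post times
        self._bars = defaultdict(set)          # poster id -> post bars posted in

    def record_post(self, poster_id, bar, timestamp=None):
        """Establish or update the poster's features for one new post."""
        now = time.time() if timestamp is None else timestamp
        stamps = self._timestamps[poster_id]
        stamps.append(now)
        # Discard posts that fall outside the sliding window ending at `now`.
        while stamps and now - stamps[0] > self.window_seconds:
            stamps.popleft()
        self._bars[poster_id].add(bar)

    def features(self, poster_id):
        """Query the stored features; unknown posters yield zero counts."""
        return {
            "posts_in_window": len(self._timestamps.get(poster_id, ())),
            "bar_count": len(self._bars.get(poster_id, ())),
        }
```

The window is pruned only when a post is recorded, so `posts_in_window` reflects the state as of the poster's most recent post. Counting distinct post bars is one possible reading of the second behavioral characteristic.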

Subsequently, in step S22, the network device 2 judges the obtained posting behavior characteristics of the poster based on a first predetermined rule, which includes but is not limited to: 1) comparing the posting frequency of the poster with a first predetermined threshold, and judging the posts sent by the poster to be junk posts when the posting frequency is above the first predetermined threshold; and/or 2) comparing the number of posts of the poster with a second predetermined threshold, and judging the posts sent by the poster to be junk posts when the number of posts exceeds the second predetermined threshold. In fact, if the posting frequency of the poster is significantly higher than the frequency of manual posting, e.g., 15-20 times/minute, the posting may be determined to be machine posting, and the posts by the poster may be determined to be junk posts. It should be understood by those skilled in the art that the present invention is not limited to the above-mentioned posting behavior characteristics, and other posting behavior characteristics applicable to the present invention for determining abnormal posting behavior of the poster are also included in the scope of the present invention.
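By way of non-limiting illustration, the first predetermined rule might be sketched as follows, using the "and/or" combination in its "or" form. The names `PostingFeatures` and `first_rule_is_junk` and both default thresholds are hypothetical; the value 15.0 merely echoes the 15-20 times/minute figure given as an example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostingFeatures:
    posts_per_minute: float  # posting frequency of the poster
    post_count: int          # number of posts of the poster


def first_rule_is_junk(features, freq_threshold=15.0, count_threshold=100):
    """Return True when either feature exceeds its (assumed) threshold."""
    too_frequent = features.posts_per_minute > freq_threshold
    too_many = features.post_count > count_threshold
    return too_frequent or too_many
```

For example, `first_rule_is_junk(PostingFeatures(20.0, 3))` is true on frequency alone, whereas a poster at 2 posts per minute with 3 posts is passed.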

Finally, in step S3, the network device 2 processes the post according to the determination result in step S22. Specifically, when the post is judged not to be a junk post, the post can be released directly for display in the corresponding post bar; when the post is judged to be a junk post or a suspected junk post, the processing methods include, but are not limited to: 1) notifying website managers to perform manual review and manual processing of the suspected junk post; 2) setting a higher threshold for the posting behavior characteristics corresponding to junk posts, and directly deleting the post if the posting behavior characteristics of the poster exceed that threshold; 3) more preferably, setting a plurality of junk levels for junk posts and a multi-level threshold corresponding to each junk level. Taking three junk levels as an example, when the posting behavior characteristics of the poster are judged to exceed a first-level threshold, a warning notification message is sent to the poster; when they exceed a second-level threshold, the post is deleted directly; and when they exceed a third-level threshold, not only is the post deleted but the ID or IP address of the posting user is also checked. It should be understood by those skilled in the art that the present invention is not limited to the above-mentioned post processing methods, and any other post processing method applicable to the present invention is also included in the scope of the present invention and is incorporated herein by reference.
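By way of non-limiting illustration, the three-level processing described above might be sketched as a simple mapping from a posting behavior characteristic to an action. The enumeration `Action`, the function `choose_action`, and the threshold values are hypothetical.

```python
from enum import Enum


class Action(Enum):
    PUBLISH = "publish post"
    WARN = "send warning to poster"
    DELETE = "delete post"
    DELETE_AND_CHECK = "delete post and check poster ID or IP address"


def choose_action(feature_value, level_thresholds=(15.0, 30.0, 60.0)):
    """Map a behavior feature to an action using three ascending thresholds."""
    first, second, third = level_thresholds
    if not first <= second <= third:
        raise ValueError("thresholds must be in ascending order")
    # Test the highest level first so the most severe action wins.
    if feature_value > third:
        return Action.DELETE_AND_CHECK
    if feature_value > second:
        return Action.DELETE
    if feature_value > first:
        return Action.WARN
    return Action.PUBLISH
```

With the assumed thresholds, a value of 20.0 yields a warning, 45.0 yields deletion, and 100.0 yields deletion plus a check of the poster's ID or IP address.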

Preferably, in step S4 (not shown), the network device 2 further establishes or updates its posting behavioral characteristics in the posting behavioral characteristics library according to the posting behavior of the poster this time.

Preferably, in step S5 (not shown), the network device 2 further adjusts the threshold according to the determination result in step S2 and the feedback of the community website administrator, including but not limited to: 1) when the number of cases in which a post was judged to be a junk post but was confirmed not to be one after review by the community website administrator exceeds a preset number, increasing the corresponding threshold by a preset step; 2) when the number of cases in which a post was judged not to be a junk post but was confirmed to be one after review by the community website administrator exceeds a preset number, decreasing the corresponding threshold by a preset step.
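By way of non-limiting illustration, the feedback adjustment of step S5 might be sketched as follows. The function name, the parameter names, and the `floor` safeguard (which keeps the threshold from dropping to zero or below) are assumptions added for the example and are not described in the invention.

```python
def adjust_threshold(threshold, false_positives, false_negatives,
                     max_cases=5, step=1.0, floor=1.0):
    """Raise or lower a threshold based on administrator review counts.

    false_positives: posts judged junk but confirmed normal on review.
    false_negatives: posts judged normal but confirmed junk on review.
    """
    if false_positives > max_cases:
        return threshold + step              # rule was too strict
    if false_negatives > max_cases:
        return max(floor, threshold - step)  # rule was too lenient
    return threshold
```

For instance, with a threshold of 15.0, six confirmed false positives raise it to 16.0, while six confirmed false negatives lower it to 14.0.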

It should be noted that fig. 2 only schematically illustrates a situation where one user posts to the community network through the user terminal, and actually, there may be a situation where multiple users post to the community network simultaneously in the community network, and the detection and processing principles of the network device for each user post are the same.

FIG. 3 is a flowchart illustrating a method for detecting junk posts in a community network according to a preferred embodiment of the present invention, wherein step S1' is the same as step S1 in FIG. 2; that description is incorporated herein by reference for brevity and is not repeated.

As shown in fig. 3, in step S21', the network device 2 extracts the content key information of the post and the identification information of the poster, such as the ID used to log in to the community network or the poster's IP address, and then queries the posting behavioral characteristics of the poster related to the post according to the content key information. That is, using the extracted content key information, it searches the posts issued by the poster for other posts having the same or similar key information as the post (hereinafter referred to as "similar posts"), and then obtains the posting behavioral characteristics of the poster in posting similar posts, so as to determine more accurately whether the post is a junk post, wherein the posting behavioral characteristics include, but are not limited to, the frequency of posting similar posts and the number of similar posts posted.
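By way of non-limiting illustration, the query for similar posts in step S21' might be sketched as follows. The invention does not specify how content key information is extracted or compared; here a set of lowercase word tokens and the Jaccard similarity are swapped in as simple stand-ins, and all function names and the 0.8 similarity cut-off are hypothetical.

```python
import re


def content_key_info(text):
    """Stand-in for key information extraction: the set of lowercase word tokens."""
    return frozenset(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_post_features(new_post, history, min_similarity=0.8):
    """Count the poster's earlier posts that resemble `new_post`.

    history: iterable of (post_bar, text) pairs previously issued by the same poster.
    """
    key = content_key_info(new_post)
    bars = [bar for bar, text in history
            if jaccard(key, content_key_info(text)) >= min_similarity]
    return {"similar_count": len(bars), "similar_bars": len(set(bars))}
```

The resulting counts can then be judged against thresholds in the same way as the poster-wide characteristics.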

Specifically, the network device 2 may obtain the posting behavioral characteristics of the poster in various ways, including but not limited to the following: 1) the network device 2 sends a request message for cookie information to the user terminal of the poster, and obtains a recent posting history of the poster from the cookie information that the user terminal provides in response to the request message, so as to acquire the posting behavior characteristics of the poster in issuing similar posts; 2) the network device 2 queries the posting behavior characteristics of the poster related to the post in the whole community network, or in the local community network and other community networks, according to the content key information of the post and the identification information of the poster; 3) more preferably, the network device may establish and manage a posting behavioral characteristic library containing the posting behavioral characteristics of a large number of posters, query the posting behavioral characteristics of the poster related to the post from that library, and establish or update the posting behavioral characteristics related to the post in the posting behavioral characteristic library according to the poster's posting behavior, wherein the posting behavioral characteristic library includes various types of databases, which may be included in the network device in hardware or may be independent of the network device and communicate with it through a network link. It should be understood by those skilled in the art that the present invention is not limited to the above-mentioned manners of obtaining the posting behavior characteristics, and any other manner of obtaining the posting behavior characteristics applicable to the present invention is also included in the scope of the present invention and is incorporated herein by reference.

Subsequently, in step S22', the network device 2 judges the acquired posting behavior characteristics of the poster related to the post according to a first predetermined rule. Specifically, the first predetermined rule includes, but is not limited to: 1) comparing the posting frequency of the poster related to the post with a first predetermined threshold, and judging the post to be a junk post when the frequency is above the first predetermined threshold; and/or 2) comparing the number of posts issued by the poster with a second predetermined threshold, and judging the posts issued by the poster to be junk posts when the number exceeds the second predetermined threshold. In fact, if the frequency at which the poster issues similar posts is significantly higher than the manual posting frequency, e.g., 15-20 times/minute, machine posting may be determined and the posts may be determined to be junk posts. It should be understood by those skilled in the art that the first predetermined rule of the present invention is not limited to the above-mentioned modes, and other determination rules applicable to the present invention for determining abnormal posting behavior of the poster should also be included in the scope of the present invention.

Finally, in step S3', the network device 2 processes the post according to the determination result in step S22'. Specifically, when the post is judged not to be a junk post, the post can be released directly for display in the corresponding post bar; when the post is judged to be a junk post or a suspected junk post, the processing methods include, but are not limited to: 1) notifying website managers to perform manual review and manual processing of the suspected junk post; 2) setting a higher threshold for the posting behavior characteristics corresponding to junk posts, and directly deleting the post if the posting behavior characteristics of the poster related to the post exceed the higher threshold; 3) preferably, setting a plurality of junk levels for junk posts and a multi-level threshold corresponding to each junk level. Taking three junk levels as an example, when the posting behavior characteristics of the poster related to the post are judged to exceed a first-level threshold, a warning notification message is sent to the poster; when they exceed a second-level threshold, the post is deleted directly; and when they exceed a third-level threshold, not only is the post deleted but the ID or IP address of the posting user is also checked. It should be understood by those skilled in the art that the present invention is not limited to the above-mentioned post processing methods, and any other post processing method applicable to the present invention is also included in the scope of the present invention and is incorporated herein by reference.

In practice, some junk posts or hidden junk posts may be issued by a group of users rather than a single user; thus, in order to detect junk posts more accurately, it is necessary to make a comprehensive judgment in conjunction with the external characteristics of the post.

Specifically, in another preferred embodiment, in step S23', after receiving a post, the network device 2 not only acquires the posting behavioral features of the poster, or the poster's posting behavioral features related to the post, but also queries the external features of the post based on the acquired content key information of the post. The external features include, but are not limited to, any of the following: 1) the content repetition degree of the post, or the content repetition degree between the post and other similar posts; 2) the number of community networks in which other posts having the same or similar content as the post are located. Subsequently, the network device may determine whether the post is a junk post according to the external features, in combination with the posting behavior features of the poster, based on a second predetermined rule.

Specifically, the second predetermined rule includes, but is not limited to: 1) whether the content repetition degree of the post, or the content repetition degree between the post and other similar posts, exceeds a third predetermined threshold; and/or 2) whether the number of community networks in which other posts having the same or similar content as the post (hereinafter referred to as "similar posts") are located exceeds a fourth predetermined threshold; and/or 3) whether the posting frequency of other similar posts exceeds a fifth predetermined threshold. It should be understood by those skilled in the art that the second predetermined rule of the present invention is not limited to the above-mentioned modes, and other determination rules applicable to the present invention for determining abnormality of the external features of the post should also be included in the scope of the present invention.
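By way of non-limiting illustration, the external features and the second predetermined rule might be sketched as follows. The invention does not define how the content repetition degree is computed; the similarity ratio from Python's standard `difflib.SequenceMatcher` is swapped in here as one possible measure, and the function names and threshold values are hypothetical.

```python
from difflib import SequenceMatcher


def repetition_degree(post, other_posts):
    """Highest similarity ratio (0.0 to 1.0) between `post` and any other post."""
    return max(
        (SequenceMatcher(None, post, other).ratio() for other in other_posts),
        default=0.0,
    )


def second_rule_is_abnormal(repetition, network_count, similar_posts_per_minute,
                            repetition_threshold=0.8, network_threshold=3,
                            freq_threshold=15.0):
    """Return True when any external feature exceeds its (assumed) threshold."""
    return (repetition > repetition_threshold
            or network_count > network_threshold
            or similar_posts_per_minute > freq_threshold)
```

Here `repetition_threshold`, `network_threshold`, and `freq_threshold` play the roles of the third, fourth, and fifth predetermined thresholds, respectively.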

In addition, in order to judge junk posts more accurately and comprehensively based on both the external features of the post and the posting behavior features of the poster, the present invention may adopt various comprehensive judgment methods, including but not limited to: 1) performing a logical AND operation on the judgment result for the external features of the post and the judgment result for the posting behavioral features of the poster described above with reference to fig. 2 or 3; that is, the post is finally judged to be a junk post only when the judgment according to the external features of the post and the judgment according to the posting behavioral features of the poster both indicate a junk post; 2) normalizing the external feature of the post, multiplying the normalized value as a weight factor by the posting behavioral feature of the poster, and judging whether the post is a junk post based on the weighted posting behavioral feature. It should be understood by those skilled in the art that the comprehensive judgment based on the external features of the post and the posting behavior features of the poster is not limited to the above-mentioned methods, and other comprehensive judgment methods applicable to the present invention are also included in the scope of the present invention.

In addition, the invention can also be applied to detecting junk posts whose content has fairly obvious but not serious illegal, non-compliant, or advertising characteristics, or is suspected of being such content. By additionally detecting the posting behavior characteristics of the posting user, such posts can be classified more accurately. For example, a post containing a small amount of erotic description may be normal literary creation, but can be judged to be a junk post when the network device detects that the posting behavior characteristics of the posting user are abnormal.
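By way of non-limiting illustration, the two comprehensive judgment methods might be sketched as follows. The function names are hypothetical, and dividing by an assumed maximum `external_max` is only one of many possible normalizations.

```python
def combine_and(external_is_junk, behavior_is_junk):
    """Method 1: junk only when both individual judgments indicate junk."""
    return external_is_junk and behavior_is_junk


def combine_weighted(external_value, external_max, behavior_value, threshold):
    """Method 2: weight the behavior feature by the normalized external feature."""
    if external_max <= 0:
        raise ValueError("external_max must be positive")
    # Normalize the external feature into the range [0, 1].
    weight = min(max(external_value / external_max, 0.0), 1.0)
    return weight * behavior_value > threshold
```

For example, a poster at 40 posts per minute whose post repeats in 5 of an assumed maximum of 10 places gives a weighted value of 20, which exceeds a threshold of 15; with only 1 repetition the weighted value is 4 and the post is passed.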

Specifically, in another preferred embodiment, in step S24' (not shown), after receiving a post, the network device 2 determines whether the post content contains spam or suspected spam based on predetermined semantic rules. Wherein the predetermined semantic rule includes, but is not limited to, at least any one of the following: 1) whether the post content satisfies a grammatical rule; 2) whether the post content contains junk words or not; 3) whether the post content contains address information or not is determined, wherein the address information comprises: a web address link, a telephone number, or a QQ number, etc.
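By way of non-limiting illustration, rules 2) and 3) above might be sketched with regular expressions as follows; the grammatical rule 1) is not covered. The junk word list, the digit-length ranges for telephone and QQ numbers, and all names are assumptions made for the example.

```python
import re

# Assumed patterns; real deployments would need far more careful rules.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"(?<!\d)\d{7,11}(?!\d)")
QQ_RE = re.compile(r"QQ\s*[:：]?\s*\d{5,11}", re.IGNORECASE)


def triggered_semantic_rules(text, junk_words=("discount", "promotion")):
    """Return the names of the semantic rules that the post content triggers."""
    triggered = set()
    lowered = text.lower()
    if any(word.lower() in lowered for word in junk_words):
        triggered.add("junk_word")
    if URL_RE.search(text):
        triggered.add("url")
    if PHONE_RE.search(text):
        triggered.add("phone")
    if QQ_RE.search(text):
        triggered.add("qq")
    return triggered
```

A non-empty result marks the post as containing suspected junk content, after which the poster's related posting behavior can be queried as described for step S21'.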

Subsequently, when it is detected that the post contains spam or suspected spam, in the entire community network, or in the local community network or other community networks, based on the spam or suspected spam and the identification information of the poster, in step S21', the posting behavioral characteristics of the poster related to the spam or suspected spam are queried, and the posting behavioral characteristics of the poster related to the spam or suspected spam are determined based on the first predetermined rule described above with reference to fig. 3, so as to finally determine whether the post is spam. Wherein,

likewise, in the another preferred embodiment, the network device 2 may further combine the external features of the post to make a comprehensive judgment on the post. Specifically, after receiving a post, the network device 2 not only determines whether the post contains spam or suspected spam content based on a predetermined semantic rule, but also detects a posting behavioral characteristic of a poster regarding the spam or suspected spam content based on the spam or suspected spam content and identification information of the poster when it is detected that the post contains the spam or suspected spam content. The network device may further query the external features of the post based on spam or suspected spam in the retrieved post. The external features include, but are not limited to, any of the following: 1) the repeatability of the junk content or suspected junk content in the posts, and the repeatability of the junk content or suspected junk content of the posts or other similar posts in the whole community network and/or a plurality of community networks; 2) the number of community networks in which other posts having the same or similar spam or suspected spam content as the post are located. Subsequently, the network device may determine whether the post is a junk post more accurately based on the junk post determination process based on the predetermined semantic rule and the poster identification information and by combining the second predetermined rule described above with reference to fig. 3 to determine the external feature of the post, and for simplicity, the specific content is included herein by reference, which is not repeated herein.

Likewise, the present invention may employ a comprehensive judgment method that combines the above junk post judgment process based on the predetermined semantic rule and the poster identification information with the judgment process based on the external features of the post, which includes but is not limited to: 1) performing a logical AND operation on the judgment result for the external features of the post and the judgment result based on the predetermined semantic rule and the poster identification information, that is, the post is judged to be a junk post only when it is judged to be a junk post according to its external features, it is found to contain spam or suspected spam content according to the predetermined semantic rule, and it is judged to be a junk post according to the posting behavioral characteristics of the poster related to that content; 2) normalizing the external features of the post, multiplying the posting behavioral characteristics of the poster related to the spam or suspected spam content by the normalized value as a weight factor, and judging whether the post is a junk post based on the weighted posting behavioral characteristics. It should be understood by those skilled in the art that the comprehensive judgment method of the present invention is not limited to the above-mentioned methods, and other comprehensive judgment methods applicable to the present invention that combine the junk post judgment process based on the predetermined semantic rule and the poster identification information with the judgment process based on the external features of the post are also included in the scope of the present invention.

Preferably, in step S4' (not shown), the network device 2 further establishes or updates, in the posting behavioral characteristic library, the poster's posting behavioral characteristics related to the post according to the poster's current posting behavior.

Preferably, in step S5' (not shown), the network device 2 further adjusts the above thresholds according to the determination result in step S22' and the feedback of the community website administrator, including but not limited to: 1) when the number of posts that were judged to be junk posts but confirmed as non-junk posts after review by the community website administrator exceeds a preset number, increasing the corresponding threshold by a preset increment; 2) when the number of posts that were judged to be non-junk posts but confirmed as junk posts after review by the community website administrator exceeds the preset number, decreasing the corresponding threshold by the preset increment.

FIG. 4 illustrates a system diagram for detecting spam posts in a social network, in accordance with an aspect of the subject invention. For simplicity, only one candidate user and its user terminal 1, and network device 2 are shown in fig. 4. The network device 2 includes, but is not limited to, a network server, a network host, or other user devices in a cloud computing mode. The user terminal includes, but is not limited to, any device with internet browsing function, such as a computer, a smart phone, a PDA, a game machine, or an IPTV. As shown in fig. 4, the network device 2 includes a post detection apparatus 20 for detecting spam posts, but it should be understood by those skilled in the art that the post detection apparatus 20 may be a stand-alone device communicatively connected to the network device via a network, including but not limited to a general computer, a server, a host computer, etc.

Wherein the communication between the user terminal and the network device may be based on packet data transmission such as TCP/IP protocol, UDP protocol, etc. When the post detection device is an independent device, the communication between the post detection device and the network device 2 can also be packet data transmission based on the TCP/IP protocol, the UDP protocol and the like; when the post detection means 20 is included in the network device 2, its communication with other modules of the network device is based on signal transmission of various computer bus protocols. It will be appreciated by those skilled in the art that the present invention is not limited to the above-described communications transmission protocols and that any external communications protocols or internal computer bus protocols that are or may later become known are suitable for use with the present invention and are hereby incorporated by reference.

Hereinafter, the present invention will be described in detail by taking only an example in which the post detection apparatus 20 is included in the network device 2.

As shown in fig. 4, when a user A accesses a community network site via the user terminal 1 and logs in to a specific board (hereinafter referred to as a "bar"), for example the "military forum" bar, the user sends a post to the network device 2 through human-computer interaction with the user terminal 1. Although the present invention is described by taking a "network device" as an example, those skilled in the art should understand that the present invention is also applicable to a P2P mode, or a cloud-computing-based mode, in which the user terminals of the community network are directly interconnected, wherein each or some specific user terminals can function as the network device to detect posts made by users; such modes are also included in the scope of the present invention.

Subsequently, the post detection means 20 in the network device 2 detects a post sent by the posting user based on the posting behavior characteristics of the posting user, and determines whether the post is a junk post.

Specifically, after receiving a post sent by a user who posts (hereinafter referred to as "poster"), the first obtaining device 21 extracts identification information of the poster, such as ID of login community network or IP address thereof, and then queries a post behavior feature of the posting user related to the post based on the identification information of the poster, the post behavior feature including, but not limited to, posting frequency of the poster and number of posts in a post bar.

The first obtaining means 21 may acquire the posting behavioral characteristics of the poster in various ways, including but not limited to the following: 1) sending a request message for cookie information to the user terminal of the poster, and acquiring the recent posting history of the poster from the cookie information that the user terminal provides in response to the request message, so as to obtain the posting behavioral characteristics of the poster; 2) querying the posting behavioral characteristics of the poster in the whole community network, or in the local community network and other community networks, according to the identification information of the poster; 3) more preferably, the network device 2 may establish and manage a posting behavioral characteristic library containing the posting behavioral characteristics of a large number of posters, and the first obtaining means 21 may query the posting behavioral characteristics of the poster in that library, where the posting behavioral characteristic library includes various types of databases, which may be included in the network device in hardware or may be independent of the network device and connected to it through a network link. It should be understood by those skilled in the art that the present invention is not limited to the above-mentioned ways of obtaining the posting behavioral characteristics, and any other ways of obtaining the posting behavioral characteristics applicable to the present invention are also included in the scope of the present invention and are incorporated herein by reference.

Subsequently, the determining means 22 judges the obtained posting behavioral characteristics of the poster based on a first predetermined rule, which includes but is not limited to: 1) comparing the posting frequency of the poster with a first predetermined threshold, and judging the post sent by the poster to be a junk post when the posting frequency exceeds the first predetermined threshold; and/or 2) comparing the number of posts of the poster with a second predetermined threshold, and judging the post sent by the poster to be a junk post when the number of posts exceeds the second predetermined threshold. In fact, if the posting frequency of the poster is significantly higher than the frequency of manual posting, e.g., 15-20 times/min, the posting may be determined to be machine posting, and the post sent by the poster may be determined to be a junk post. It should be understood by those skilled in the art that the present invention is not limited to the above-mentioned posting behavioral characteristics, and other posting behavioral characteristics applicable to the present invention for determining abnormal posting behavior of the poster are also included in the scope of the present invention.
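The first predetermined rule can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `PostingBehavior` container, the function name, and both threshold values are hypothetical, and the "and/or" of the description is implemented here as a simple "or".

```python
from dataclasses import dataclass


@dataclass
class PostingBehavior:
    """Posting behavioral characteristics of one poster (hypothetical container)."""
    posts_per_minute: float  # posting frequency of the poster
    post_count: int          # number of posts counted for the poster


# Assumed values. The description mentions 15-20 times/min as an example
# frequency; the second threshold is not given and is invented here.
FIRST_THRESHOLD = 15.0
SECOND_THRESHOLD = 100


def is_junk_by_first_rule(behavior: PostingBehavior,
                          freq_threshold: float = FIRST_THRESHOLD,
                          count_threshold: int = SECOND_THRESHOLD) -> bool:
    """Return True when either characteristic exceeds its threshold."""
    return (behavior.posts_per_minute > freq_threshold
            or behavior.post_count > count_threshold)
```

A stricter variant could require both conditions at once; the description leaves that choice open.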

Finally, the post processing means 23 processes the post according to the judgment result of the judging means 22. Specifically, when the judging means 22 judges that the post is not a junk post, the post may be directly released for display in the corresponding bar; when the post is judged to be a junk post or a suspected junk post, the processing methods include, but are not limited to: 1) informing website managers to perform manual review and manual processing of the suspected junk post; 2) setting a higher threshold for the posting behavioral characteristics corresponding to junk posts, and directly deleting the post if the posting behavioral characteristics of the poster exceed that threshold; 3) more preferably, setting a plurality of spam levels for junk posts and a multi-level threshold corresponding to each spam level. Taking three spam levels as an example, when the posting behavioral characteristics of the poster are judged to exceed the first-level threshold, a warning notification message is sent to the poster; when they exceed the second-level threshold, the post is deleted directly; and when they exceed the third-level threshold, the post is deleted and the ID or IP address of the posting user is also checked. It should be understood by those skilled in the art that the present invention is not limited to the above-mentioned post processing methods, and any other post processing methods applicable to the present invention are also included in the scope of the present invention and are incorporated herein by reference.
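The three-level handling described above might be sketched as follows. The `Action` names, the function name, and the threshold values are assumptions made for illustration; the description does not fix any of them.

```python
from enum import Enum


class Action(Enum):
    PUBLISH = "publish"                  # not junk: release to the bar
    WARN = "warn"                        # first level: warning notification
    DELETE = "delete"                    # second level: delete the post
    DELETE_AND_CHECK_ID = "delete_and_check_id"  # third level


# Assumed ascending level thresholds on posting frequency (posts/minute).
LEVEL_THRESHOLDS = (15.0, 30.0, 60.0)


def choose_action(posts_per_minute: float,
                  thresholds: tuple = LEVEL_THRESHOLDS) -> Action:
    """Map a posting frequency to the handling of the highest level exceeded."""
    first, second, third = thresholds
    if posts_per_minute > third:
        return Action.DELETE_AND_CHECK_ID
    if posts_per_minute > second:
        return Action.DELETE
    if posts_per_minute > first:
        return Action.WARN
    return Action.PUBLISH
```

Checking the highest level first keeps the levels mutually exclusive, so a post that exceeds the third-level threshold does not also trigger the milder actions.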

Preferably, the network device 2 further comprises an adjusting device (not shown) for adjusting the thresholds according to the determination result and the feedback of the community website administrator, including but not limited to: 1) when the number of posts that the judging device judged to be junk posts but that were confirmed as non-junk posts after review by the community website administrator exceeds a preset number, increasing the corresponding threshold by a preset increment; 2) when the number of posts that the judging device judged to be non-junk posts but that were confirmed as junk posts after review by the community website administrator exceeds the preset number, decreasing the corresponding threshold by the preset increment.
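A minimal sketch of this feedback adjustment is given below. The function name, the parameter names, and the default values for the preset number and the preset increment are hypothetical.

```python
def adjust_threshold(threshold: float,
                     false_positives: int,
                     false_negatives: int,
                     preset_number: int = 10,
                     step: float = 1.0) -> float:
    """Adjust one threshold from administrator review feedback.

    false_positives: posts judged junk but confirmed non-junk on review.
    false_negatives: posts judged non-junk but confirmed junk on review.
    """
    if false_positives > preset_number:
        threshold += step  # rule was too strict: raise the threshold
    if false_negatives > preset_number:
        threshold -= step  # rule was too lenient: lower the threshold
    return threshold
```

In this sketch the two adjustments cancel out when both error counts exceed the preset number in the same review period; a real deployment would have to decide how to handle that case.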

Preferably, the network device 2 further comprises an updating device (not shown) for establishing or updating the posting behavioral characteristics of the poster in the posting behavioral characteristic library according to the poster's current posting behavior.
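One possible shape for the posting behavioral characteristic library and its update step is sketched below as an in-memory structure. The class name, the method names, and the 60-second sliding window are assumptions; the description only states that the library may be any type of database.

```python
import time
from collections import defaultdict, deque


class PostingBehaviorLibrary:
    """In-memory stand-in for the posting behavioral characteristic library."""

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        self._timestamps = defaultdict(deque)  # poster id -> recent post times
        self._counts = defaultdict(int)        # poster id -> total posts seen

    def record_post(self, poster_id: str, now: float = None) -> None:
        """Establish or update the poster's entry for the current post."""
        now = time.time() if now is None else now
        stamps = self._timestamps[poster_id]
        stamps.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while stamps and now - stamps[0] > self.window_seconds:
            stamps.popleft()
        self._counts[poster_id] += 1

    def features(self, poster_id: str) -> tuple:
        """Return (posts inside the window, total number of posts)."""
        recent = len(self._timestamps.get(poster_id, ()))
        return recent, self._counts.get(poster_id, 0)
```

With a 60-second window, the first element of the returned tuple can be read directly as a posts-per-minute frequency.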

It should be noted that fig. 4 only schematically illustrates a case where one user posts to the community network through the user terminal, and actually, in the community network, there may be a case where multiple users post to the community network at the same time, and the detection and processing principle of the network device for each user post is the same.

FIG. 5 is a diagram illustrating a system for detecting spam posts in a social network according to a preferred embodiment of the present invention, wherein the posting process of the user is the same as the posting process described above with reference to FIG. 4; for brevity, it is incorporated herein by reference and will not be described again.

As shown in fig. 5, after the network device 2 receives the post, the first obtaining means 21' extracts the content key information of the post and the identification information of the poster, such as the ID used to log in to the community network or the IP address. It then queries the posting behavioral characteristics of the poster related to the post according to the content key information; that is, it uses the extracted content key information to find, among the posts issued by the poster, other posts having the same or similar key information as the post (hereinafter referred to as "similar posts"), and then obtains the posting behavioral characteristics of the poster in issuing similar posts, so as to judge more accurately whether the post is a junk post. The posting behavioral characteristics include, but are not limited to, the frequency at which the similar posts are issued and the number of similar posts issued.

Specifically, the first obtaining means 21' may obtain the posting behavioral characteristics of the poster in various ways, including but not limited to the following: 1) sending a request message for cookie information to the user terminal of the poster, and acquiring the recent posting history of the poster from the cookie information that the user terminal provides in response to the request message, so as to obtain the posting behavioral characteristics of the poster in issuing similar posts; 2) querying the posting behavioral characteristics of the poster related to the post in the whole community network, or in the local community network and other community networks, according to the content key information of the post and the identification information of the poster; 3) more preferably, the network device may establish and manage a posting behavioral characteristic library containing the posting behavioral characteristics of a large number of posters, and the first obtaining means 21' may query the posting behavioral characteristic library for the posting behavioral characteristics of the poster related to the post, and establish or update those characteristics in the posting behavioral characteristic library according to the posting behavior of the poster, where the posting behavioral characteristic library includes various types of databases, which may be included in the network device in hardware or may be independent of the network device and connected to it through a network link. It should be understood by those skilled in the art that the present invention is not limited to the above-mentioned ways of obtaining the posting behavioral characteristics, and any other ways of obtaining the posting behavioral characteristics applicable to the present invention are also included in the scope of the present invention and are incorporated herein by reference.

Subsequently, the judging means 22' judges the acquired posting behavioral characteristics of the poster related to the post according to the first predetermined rule. Specifically, the first predetermined rule includes, but is not limited to: 1) comparing the posting frequency of the poster related to the post with a first predetermined threshold, and judging the post to be a junk post when the frequency exceeds the first predetermined threshold; and/or 2) comparing the number of posts issued by the poster with a second predetermined threshold, and judging the post issued by the poster to be a junk post when the number exceeds the second predetermined threshold. In fact, if the frequency at which the poster issues posts is significantly higher than the manual posting frequency, e.g., 15-20 times/minute, the posting may be determined to be machine posting and the posts may be determined to be junk posts. It should be understood by those skilled in the art that the first predetermined rule of the present invention is not limited to the above-mentioned modes, and other determination rules applicable to the present invention for determining abnormal posting behavior of the poster should also be included in the scope of the present invention.

Finally, the post processing means 23' processes the post according to the judgment result of the judging means 22'. Specifically, when the post is judged not to be a junk post, it can be directly released for display in the corresponding bar; when the post is judged to be a junk post or a suspected junk post, the processing methods include, but are not limited to: 1) informing website managers to perform manual review and manual processing of the suspected junk post; 2) setting a higher threshold for the posting behavioral characteristics corresponding to junk posts, and directly deleting the post if the posting behavioral characteristics of the poster related to the post exceed the higher threshold; 3) preferably, setting a plurality of spam levels for junk posts and a multi-level threshold corresponding to each spam level. Taking three spam levels as an example, when the posting behavioral characteristics of the poster related to the post are judged to exceed the first-level threshold, a warning notification message is sent to the poster; when they exceed the second-level threshold, the post is deleted directly; and when they exceed the third-level threshold, the post is deleted and the ID or IP address of the posting user is also checked. It should be understood by those skilled in the art that the present invention is not limited to the above-mentioned post processing methods, and any other post processing methods applicable to the present invention are also included in the scope of the present invention and are incorporated herein by reference.

Specifically, in another preferred embodiment, after the network device 2 receives a post, the posting behavioral characteristics of the poster, or the posting behavioral characteristics related to the post, are obtained as described above, and the second obtaining means 24' further queries the external features of the post based on the obtained content key information of the post. The external features include, but are not limited to, any of the following: 1) the degree of content repetition within the post, or between the post and other similar posts; 2) the number of community networks in which other posts having the same or similar content as the post are located.
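As a rough illustration of how such external features could be computed, the sketch below counts similar posts and the community networks they appear in. The similarity measure (`difflib.SequenceMatcher` from the Python standard library), the 0.8 cut-off, and the function names are assumptions chosen for the example, not part of the described method.

```python
from difflib import SequenceMatcher


def is_similar(a: str, b: str, min_ratio: float = 0.8) -> bool:
    """Treat two posts as similar when their match ratio reaches min_ratio."""
    return SequenceMatcher(None, a, b).ratio() >= min_ratio


def external_features(post: str, other_posts) -> tuple:
    """Return (number of similar posts, number of networks containing them).

    other_posts is an iterable of (network_name, content) pairs.
    """
    similar = [(net, text) for net, text in other_posts
               if is_similar(post, text)]
    networks = {net for net, _ in similar}
    return len(similar), len(networks)
```

For example, `external_features("buy cheap watches now", posts)` returns a pair that can be compared against thresholds of the second predetermined rule.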

Subsequently, the determining means 22' may determine whether the post is a junk post according to the external feature and the posting behavior feature of the poster based on a second predetermined rule.

In addition, the determining means 22' of the present invention may adopt various comprehensive determination methods to judge junk posts more accurately and comprehensively based on the external features of the post and the posting behavioral characteristics of the poster, which include, but are not limited to: 1) performing a logical AND operation on the judgment result for the external features of the post and the judgment result for the posting behavioral characteristics of the poster described above with reference to fig. 2 or 3, that is, the post is finally judged to be a junk post only when both the judgment according to the external features of the post and the judgment according to the posting behavioral characteristics of the poster indicate a junk post; 2) normalizing the external features of the post, multiplying the posting behavioral characteristics of the poster by the normalized value as a weight factor, and judging whether the post is a junk post based on the weighted posting behavioral characteristics. It should be understood by those skilled in the art that the comprehensive determination method based on the external features of the post and the posting behavioral characteristics of the poster in the present invention is not limited to the above-mentioned methods, and other comprehensive determination methods applicable to the present invention that are based on the external features of the post and the posting behavioral characteristics of the poster are also included in the scope of the present invention.
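The two combination methods can be sketched as follows. The function names are hypothetical, and normalization by dividing by an assumed maximum value is only one possible choice, since the description does not specify how the external feature is normalized.

```python
def combine_by_and(external_is_junk: bool, behavior_is_junk: bool) -> bool:
    """Method 1: junk only when both individual judgments say junk."""
    return external_is_junk and behavior_is_junk


def normalize(value: float, max_value: float) -> float:
    """Scale value into [0, 1] by an assumed maximum (hypothetical scheme)."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return min(max(value / max_value, 0.0), 1.0)


def combine_by_weight(external_value: float, external_max: float,
                      posts_per_minute: float, threshold: float) -> bool:
    """Method 2: weight the posting frequency by the normalized external feature."""
    weight = normalize(external_value, external_max)
    return weight * posts_per_minute > threshold
```

Under the weighted method, a fast poster whose content is rarely repeated gets a small weight and may stay below the threshold, while the same frequency with heavily repeated content is flagged.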

Specifically, in another preferred embodiment, after the network device 2 receives a post, the determining means 22' determines whether the post content contains spam or suspected spam content based on a predetermined semantic rule. The predetermined semantic rule includes, but is not limited to, at least any one of the following: 1) whether the post content satisfies a grammatical rule; 2) whether the post content contains junk words; 3) whether the post content contains address information, wherein the address information comprises a web address link, a telephone number, a QQ number, etc.
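Two of the semantic rules above (junk words and address information) can be illustrated with a short sketch. The word list and all three regular expressions are assumptions invented for the example, and the grammatical rule is omitted because the description does not say how it is evaluated.

```python
import re

JUNK_WORDS = {"free money", "click here"}  # hypothetical word list

# Loose, illustrative patterns; real deployments would need stricter ones.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"(?<!\d)\d{7,11}(?!\d)")          # run of 7-11 digits
QQ_RE = re.compile(r"QQ\s*:?\s*\d{5,11}", re.IGNORECASE)  # "QQ" plus digits


def find_suspected_junk(content: str) -> list:
    """Return the reasons the content looks like junk (empty list if none)."""
    reasons = []
    lowered = content.lower()
    if any(word in lowered for word in JUNK_WORDS):
        reasons.append("junk_word")
    if (URL_RE.search(content) or PHONE_RE.search(content)
            or QQ_RE.search(content)):
        reasons.append("address_info")
    return reasons
```

A non-empty result only marks the post as containing suspected junk content; in the described embodiment the final decision still depends on the posting behavioral characteristics of the poster.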

Subsequently, when a semantic detection device (not shown) detects that the post contains spam or suspected spam, the first obtaining device 21' queries a posting behavioral characteristic of the poster about the spam or suspected spam in the entire community network, or in the local community network and other community networks, based on the spam or suspected spam and the identification information of the poster.

Subsequently, the judging means 22' judges the posting behavioral characteristics of the poster related to the spam or suspected spam content based on the first predetermined rule described above with reference to fig. 3, so as to finally judge whether the post is a junk post.

Likewise, in this other preferred embodiment, the network device 2 may further combine the external features of the post to make a comprehensive judgment on the post. Specifically, after the network device 2 receives a post, a semantic detection means (not shown) detects whether the post contains spam or suspected spam content based on a predetermined semantic rule; when such content is detected, the first obtaining means 21' obtains the posting behavioral characteristics of the poster related to the spam or suspected spam content based on that content and the identification information of the poster, and the second obtaining means 24' queries the external features of the post based on the spam or suspected spam content found in the post. The external features include, but are not limited to, any of the following: 1) the degree of repetition of the spam or suspected spam content within the post, or across the post and other similar posts in the whole community network and/or a plurality of community networks; 2) the number of community networks in which other posts having the same or similar spam or suspected spam content as the post are located.

Subsequently, the determining means 22' may judge the external features of the post according to the second predetermined rule described above with reference to fig. 4 and combine that result with the junk post judgment process based on the predetermined semantic rule and the poster identification information, so as to determine more accurately whether the post is a junk post. For simplicity, the specific content is incorporated herein by reference and is not repeated.

Similarly, the determining means 22' in the present invention may adopt a comprehensive determination method that combines the above junk post judgment process based on the predetermined semantic rule and the poster identification information with the judgment process based on the external features of the post, which includes but is not limited to: 1) performing a logical AND operation on the judgment result for the external features of the post and the judgment result based on the predetermined semantic rule and the poster identification information, that is, the post is judged to be a junk post only when it is judged to be a junk post according to its external features, it is found to contain spam or suspected spam content according to the predetermined semantic rule, and it is judged to be a junk post according to the posting behavioral characteristics of the poster related to that content; 2) normalizing the external features of the post, multiplying the posting behavioral characteristics of the poster related to the spam or suspected spam content by the normalized value as a weight factor, and judging whether the post is a junk post based on the weighted posting behavioral characteristics. It should be understood by those skilled in the art that the comprehensive determination method of the present invention is not limited to the above-mentioned methods, and other comprehensive determination methods applicable to the present invention that combine the junk post judgment process based on the predetermined semantic rule and the poster identification information with the judgment process based on the external features of the post are also included in the scope of the present invention.

Various specific embodiments of the present invention are described in detail above with reference to fig. 2-4. It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims

1. A method for detecting spam posts in a community network, comprising:

2. The method of claim 1, wherein the step a comprises:

3. The method of claim 1, wherein the step a further comprises:

a 1'. obtaining content key information in the post;

4. The method of claim 2 or 3, wherein the posting behavioral characteristics of the posting user include at least any one of:

-a posting frequency of a poster;

-information of the bar in which the poster is located;

wherein the first predetermined rule comprises any of:

-the frequency of the posting by the poster exceeds a first predetermined threshold;

-the number of community networks to which the poster is posted exceeds a second predetermined threshold.

5. The method of claim 4, further comprising:

-adjusting said first predetermined threshold or said second predetermined threshold accordingly, depending on said decision and by feedback from the community network administrator.

6. The method of any one of claims 3 to 5, wherein the step of obtaining the posting behavioral characteristics of the posting user further comprises:

-inquiring in a posting behavioral characteristic library according to the identification information of the posting user to obtain the posting behavioral characteristic of the posting user.

7. The method of claim 6, further comprising:

-updating the posting behavior characteristics of the posting user in the posting behavior characteristics library according to the judgment result.

8. The method of any of claims 2 to 7, further comprising:

c, acquiring external features of the post;

wherein, the step a further comprises:

-judging the external features of the post based on a second predetermined rule and, in combination with the posting behavioral characteristics of the user, determining whether the post is a junk post.

9. The method of claim 8, wherein the external features comprise at least any one of:

-content repetition of the post;

-a number of community networks in which other similar posts having the same or similar content as the post are located;

-frequency of issuance of other similar posts having the same or similar content as the post;

wherein the second predetermined rule comprises at least any one of:

-whether the content repetition of the post exceeds a third predetermined threshold;

-whether the number of community networks in which the other similar posts are located exceeds a fourth predetermined threshold;

-whether the frequency of posting of said other similar posts exceeds a fifth predetermined threshold.

10. The method according to any one of claims 2 to 9, wherein the step a further comprises:

-judging the content of the post based on predetermined semantic rules and, in combination with the posting behavioral characteristics of the user, determining whether the post is a junk post;

wherein the predetermined semantic rules include:

-whether the post content satisfies a grammatical rule;

-whether the post content contains junk words;

-whether address information is contained in the post content.

11. The method of claim 10, wherein the address information comprises: a web address link, a telephone number, or a QQ number.

12. The method of any of claims 1 to 11, wherein the method further comprises:

and when the posts are judged to be junk posts, processing the junk posts according to a preset processing rule.

13. An apparatus for detecting spam posts in a community network, comprising:

14. The apparatus of claim 13, wherein the post detection means comprises:

15. The apparatus of claim 13, wherein the post detection means comprises:

extracting means for acquiring content key information in the post;

16. The apparatus of claim 14 or 15, wherein the posting behavioral characteristics of the posting user include at least any one of:

a frequency of posting by a poster;

information of a bar to which the poster is posted;

wherein the first predetermined rule comprises any of:

17. The apparatus of claim 16, further comprising:

and the adjusting device is used for correspondingly adjusting the first preset threshold value or the second preset threshold value according to the judgment result and the feedback of the community network administrator.

18. The apparatus according to any one of claims 15 to 17, wherein the first obtaining means is further configured to query a posting behavioral characteristic library according to the identification information of the posting user to obtain the posting behavioral characteristic of the posting user.

19. The apparatus of claim 18, further comprising:

and the updating device is used for updating the posting behavior characteristics of the posting user in the posting behavior characteristic library according to the judgment result.

20. The apparatus of any of claims 14 to 19, further comprising:

second obtaining means for obtaining an external feature of the post;

wherein the judging device is further used for judging the external characteristics of the posts based on a second predetermined rule and judging whether the posts are junk posts or not by combining the posting behavior characteristics of the user.

21. The apparatus of claim 20, wherein the external features comprise at least any one of:

-content repetition of the post;

wherein the second predetermined rule comprises at least any one of:

22. The apparatus according to any one of claims 14 to 21, wherein the determining means is further configured to determine the content of the post based on a predetermined semantic rule and determine whether the post is a junk post in combination with a posting behavior feature of a user;

wherein the predetermined semantic rules include:

-whether the post content satisfies a grammatical rule;

-whether the post content contains junk words;

-whether address information is contained in the post content.

23. The device of claim 22, wherein the address information comprises: a web address link, a telephone number, or a QQ number.

24. The apparatus of any of claims 13 to 23, further comprising:

and the post processing device is used for processing the posts according to a preset processing rule when the posts are judged to be junk posts.