CN110891010B - Method and apparatus for transmitting information - Google Patents

Method and apparatus for transmitting information

Info

Publication number
CN110891010B
Authority
CN
China
Prior art keywords
text
tested
texts
preset threshold
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811030779.6A
Other languages
Chinese (zh)
Other versions
CN110891010A (en)
Inventor
黄珊
刘俐岑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811030779.6A
Publication of CN110891010A
Application granted
Publication of CN110891010B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/06 Message adaptation to terminal or network requirements
    • H04L51/066 Format adaptation, e.g. format conversion or compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a method and an apparatus for sending information. One embodiment of the method comprises: acquiring a text set to be tested, wherein the text set to be tested comprises at least two texts to be tested; selecting a text to be tested from the text set to be tested, and executing the following grouping steps: determining whether the text set to be tested contains other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold; in response to determining that such texts exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold; determining the number of texts to be tested in the text group; determining whether the number of texts to be tested in the text group is greater than or equal to a preset threshold value; and sending prompt information in response to determining that the number is greater than or equal to the preset threshold value. This embodiment enables monitoring of texts for which the number of similar texts exceeds a threshold.

Description

Method and apparatus for transmitting information
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for sending information.
Background
With the rapid development of artificial intelligence technology, determining the degree of similarity between texts has become an essential step in natural language processing. Text similarity calculation has been widely applied in fields such as information retrieval, data mining, paper identification, and machine translation. The related technology mainly judges whether texts are similar by vectorizing the texts and checking whether the similarity or distance between the resulting vectors reaches a threshold.
Disclosure of Invention
The embodiment of the application provides a method and a device for sending information.
In a first aspect, an embodiment of the present application provides a method for sending information, where the method includes: acquiring a text set to be tested, wherein the text set to be tested comprises at least two texts to be tested; selecting a text to be tested from the text set to be tested, and executing the following grouping steps: determining whether the text set to be tested contains other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold; in response to determining that such texts exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold; determining the number of texts to be tested in the text group; determining whether the number of texts to be tested in the text group is greater than or equal to a preset threshold value; and in response to determining that the number is greater than or equal to the preset threshold value, sending prompt information.
In some embodiments, determining whether the text set to be tested contains other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold includes: determining whether a hash value set contains other hash values whose Hamming distance from the hash value corresponding to the selected text to be tested is smaller than a preset distance threshold, wherein the hash values in the hash value set are generated based on the corresponding texts to be tested in the text set to be tested; and in response to determining that such hash values exist, determining that the text set to be tested contains other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold.
In some embodiments, the above method further comprises: determining whether the text set to be tested contains texts to be tested that have not been selected; and in response to determining that it does, selecting an unselected text to be tested from the text set to be tested and continuing to execute the grouping steps.
In some embodiments, the preset distance threshold is 1.
In some embodiments, obtaining a set of texts to be tested includes: acquiring a target text set, wherein the target text comprises a text in a webpage from a target website; for a target text in the target text set, in response to determining that the target text meets a preset deletion condition, deleting the target text from the target text set; and determining the target text set in which the target texts meeting the deletion conditions are deleted as the text set to be detected.
In a second aspect, an embodiment of the present application provides an apparatus for sending information, where the apparatus includes: an acquisition unit configured to acquire a text set to be tested, wherein the text set to be tested comprises at least two texts to be tested; and a sending unit configured to select a text to be tested from the text set to be tested and perform the following grouping steps: determining whether the text set to be tested contains other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold; in response to determining that such texts exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold; determining the number of texts to be tested in the text group; determining whether the number of texts to be tested in the text group is greater than or equal to a preset threshold value; and in response to determining that the number is greater than or equal to the preset threshold value, sending prompt information.
In some embodiments, the sending unit is further configured to: select a text to be tested from the text set to be tested, and perform the following grouping steps: determining whether a hash value set contains other hash values whose Hamming distance from the hash value corresponding to the selected text to be tested is smaller than a preset distance threshold, wherein the hash values in the hash value set are generated based on the corresponding texts to be tested in the text set to be tested; in response to determining that such hash values exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold; determining the number of texts to be tested in the text group; determining whether the number of texts to be tested in the text group is greater than or equal to a preset threshold value; and in response to determining that the number is greater than or equal to the preset threshold value, sending prompt information.
In some embodiments, the above apparatus further comprises: an adjusting unit configured to determine whether the text set to be tested contains texts to be tested that have not been selected; and, in response to determining that it does, to select an unselected text to be tested from the text set to be tested and continue to execute the grouping steps.
In some embodiments, the preset distance threshold is 1.
In some embodiments, the obtaining unit comprises: the acquisition module is configured to acquire a target text set, wherein the target text comprises text in a webpage from a target website; a deletion module configured to, for a target text in the set of target texts, delete the target text from the set of target texts in response to determining that the target text satisfies a preset deletion condition; and the updating module is configured to determine the target text set in which the target texts meeting the deletion conditions are deleted as the text set to be detected.
In a third aspect, an embodiment of the present application provides a server, where the server includes: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method described in any implementation manner of the first aspect.
According to the method and the apparatus for sending information provided by the embodiments of the present application, a text set to be tested that includes at least two texts to be tested is first acquired. A text to be tested is then selected from the text set to be tested. Next, it is determined whether the text set to be tested contains other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold. In response to determining that such texts exist, a text group is generated based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold. The number of texts to be tested in the text group is then determined, and it is determined whether that number is greater than or equal to a preset threshold value. Finally, in response to determining that the number of texts to be tested in the text group is greater than or equal to the preset threshold value, prompt information is sent. By identifying similar texts, grouping and counting them, and deciding whether to send prompt information according to the count, the method monitors texts for which the number of similar texts exceeds the threshold.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for transmitting information according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for transmitting information according to an embodiment of the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for transmitting information according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for transmitting information in accordance with the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary architecture 100 of a method for transmitting information or an apparatus for transmitting information to which an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a search-type application, an instant messaging tool, a mailbox client, social platform software, a text editing-type application, a browser-type application, a reading-type application, and the like.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have a display screen and support receiving communication information, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above.
The server 105 may be a server providing various services, such as a background server sending prompt information to the terminal devices 101, 102, 103 indicating that the amount of similar text exceeds a threshold. The background server can analyze the acquired texts and count the similar texts in groups. The background server may further determine whether the number of similar texts reaches a preset threshold. When the threshold value is determined to be exceeded, the background server can send prompt information to the terminal equipment.
The text may be stored locally on the server 105, and the server 105 may directly extract and process the locally stored text. The prompt information may also be a message with a prompting function, such as a pop-up message, shown on a display local to the server 105. In this case, the terminal devices 101, 102, 103 and the network 104 may be absent.
It should be noted that the method for sending information provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for sending information is generally disposed in the server 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. No particular limitation is imposed here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for transmitting information in accordance with the present application is shown. The method for transmitting information includes the steps of:
step 201, acquiring a text set to be tested.
In the present embodiment, an execution subject of the method for sending information (such as the server 105 shown in fig. 1) may acquire a text set to be tested in various ways. The text to be tested generally includes a title and a body content. Optionally, the text to be tested may further include, but is not limited to, at least one of: author information, information for recording release time, abstract, text content, reading data, transfer data, comment information, character information related to text.
In some optional implementations of this embodiment, the text to be tested may include semi-structured text data stored in the server in advance.
In some optional implementations of this embodiment, the text set to be tested may include texts from websites containing UGC (User Generated Content) data. The websites may include, but are not limited to, at least one of the following: social networks, knowledge sharing networks, community forums. The text set to be tested may also include texts from other network sources that contain UGC data (e.g., microblogs, WeChat official accounts).
In some optional implementation manners of this embodiment, the executing entity may obtain the text set to be tested from other servers or terminal devices in communication connection. The other server is for example a database server (not shown in fig. 1). The terminal device is, for example, a notebook computer 103 shown in fig. 1. In practice, the text set to be tested may also be stored locally in the execution body. At this time, the execution main body may directly obtain the text set to be tested from the local.
Step 202, selecting a text to be tested from the text set to be tested.
In this embodiment, the executing main body may select the text to be tested from the text set to be tested obtained in step 201, and execute the grouping steps from step 203 to step 207.
In this embodiment, neither the manner of selecting texts to be tested nor the number selected is limited. For example, at least one text to be tested may be selected at random, or texts to be tested may be selected one after another in the order in which they were stored in the text set to be tested.
Step 203, determining whether other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold exist in the text set to be tested.
In this embodiment, the executing body may determine, according to various methods, whether there are other texts to be tested in the text set to be tested whose text association with the selected text to be tested exceeds a preset threshold.
In the present embodiment, the text association degree is used to characterize the similarity between texts. It can be expressed either as a distance between texts, such as the Euclidean distance, or as a similarity between texts, such as the cosine similarity. Accordingly, when the text association degree is expressed as a distance between texts, a text association degree exceeding the preset threshold means that the distance between the texts is smaller than a preset distance threshold. When the text association degree is expressed as a similarity between texts, a text association degree exceeding the preset threshold means that the similarity between the texts is greater than a preset similarity threshold.
In some optional implementations of this embodiment, the execution body may determine whether the text set to be tested contains other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold by determining whether a hash value set contains other hash values whose Hamming distance from the hash value corresponding to the selected text to be tested is smaller than a preset distance threshold. The hash values in the hash value set may be generated in various ways based on the corresponding texts to be tested in the text set to be tested.
Optionally, the hash values correspond one to one to the texts to be tested in the text set to be tested. Generally, a hash value is a short character string of fixed length. Algorithms commonly used for generating hash values include, but are not limited to, the similar hash (simhash) algorithm and the minimum hash (minhash) algorithm. Taking the simhash algorithm as an example, a title and a body are extracted from a text to be tested, word segmentation, vectorization, and other processing are then performed, and the simhash algorithm finally yields a hash value consisting of multiple binary bits, such as '101001'.
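The simhash procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the application: whitespace splitting stands in for a real word segmenter, term frequency is used as the token weight, MD5 supplies the per-token hash, and the names `simhash` and `text_fingerprint` are hypothetical.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Return a `bits`-wide integer fingerprint of a token list.

    Assumption: `bits` is a multiple of 8 and at most 128, because the
    per-token hash is taken from the leading bytes of an MD5 digest.
    """
    weights = Counter(tokens)  # term frequency serves as the token weight
    totals = [0] * bits
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +weight or -weight on every bit position.
            if (token_hash >> i) & 1:
                totals[i] += weight
            else:
                totals[i] -= weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:  # a positive vote total sets the bit
            fingerprint |= 1 << i
    return fingerprint


def text_fingerprint(title, body, bits=64):
    """Hash a text from its title and body (hypothetical helper)."""
    # Whitespace splitting is a placeholder for proper word segmentation.
    tokens = (title + " " + body).lower().split()
    return simhash(tokens, bits)


fp = text_fingerprint("Example title", "some body text for the example")
print(format(fp, "064b"))  # the fingerprint as a 64-character bit string
```

Because each bit is decided by a weighted vote over all tokens, texts that share most of their tokens tend to produce fingerprints that differ in only a few bit positions.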
Optionally, the Hamming distance between hash values in the above hash value set may be calculated. The Hamming distance is the number of positions at which two character strings of equal length hold different characters. For hash values obtained by the simhash algorithm, for example, the Hamming distance is the number of 1 bits obtained by bitwise XORing the two hash values. As an example, the Hamming distance between 21485 and 22482 is 2. As yet another example, the Hamming distance between 101001 and 111001 is 1.
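Both forms of the Hamming distance mentioned above can be sketched in a few lines. The function names are hypothetical and chosen only for illustration.

```python
def hamming_distance_str(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def hamming_distance_bits(x: int, y: int) -> int:
    """Count the 1 bits in the bitwise XOR of two integer hash values."""
    return bin(x ^ y).count("1")


print(hamming_distance_str("21485", "22482"))     # 2
print(hamming_distance_bits(0b101001, 0b111001))  # 1
```

The integer form is usually the more convenient one for simhash values, since a fingerprint fits in a single machine word.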
Optionally, the execution body may compare the hash value corresponding to the selected text to be tested with the other hash values in the hash value set, and determine whether the Hamming distance between the hash values is smaller than a preset distance threshold. Optionally, the preset distance threshold may be a value set in advance by a user or a technician. For example, the preset distance threshold is usually set to 3, but may also be set to 1. When the preset distance threshold is set to 1, the text association degree between a text to be tested and the selected text to be tested is determined to exceed the preset threshold only if the hash value corresponding to that text is identical to the hash value corresponding to the selected text to be tested.
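The comparison against the preset distance threshold can be sketched as below. This is an illustrative sketch: the function `find_similar`, the dictionary layout mapping a text identifier to an integer hash value, and the sample values are all assumptions.

```python
def find_similar(selected_id, hashes, distance_threshold=3):
    """Return ids of other texts whose hash lies within the threshold.

    `hashes` maps a text id to its integer hash value. A text matches when
    the Hamming distance to the selected hash is strictly smaller than
    `distance_threshold`.
    """
    selected_hash = hashes[selected_id]
    return [
        other_id
        for other_id, other_hash in hashes.items()
        if other_id != selected_id
        and bin(selected_hash ^ other_hash).count("1") < distance_threshold
    ]


hashes = {"a": 0b101001, "b": 0b111001, "c": 0b101001, "d": 0b010110}
print(find_similar("a", hashes, distance_threshold=3))  # ['b', 'c']
print(find_similar("a", hashes, distance_threshold=1))  # ['c']
```

With a threshold of 1, only a distance of 0 qualifies, so only the text whose hash is identical to the selected one is returned.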
In some optional implementations of this embodiment, the execution body may select one text to be tested and compare it with another text to be tested in the text set to be tested. For the two texts to be tested, the three longest sentences in each text are first determined, forming two sentence groups. The sentences are then segmented into words. The number of words in each of the two sentence groups is determined, and the larger of the two numbers is taken as the total number (if the two numbers are equal, either may be taken). The words present in each sentence group are then determined, and the number of words that appear in both sentence groups is counted. The number of common words is divided by the total number, and the resulting ratio is taken as the common word ratio. When the common word ratio is greater than a preset threshold, for example, 80%, the execution body may determine that the text set to be tested contains other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold.
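A minimal sketch of the common word ratio follows. It divides the number of shared words by the larger of the two word counts, so the ratio lies between 0 and 1 and can be compared with a threshold such as 80%. Several details are assumptions made for illustration: sentences are split on common end punctuation, whitespace splitting stands in for word segmentation, distinct words are counted, and the function names are hypothetical.

```python
import re


def longest_sentences(text, count=3):
    """Return up to `count` of the longest sentences in `text`."""
    sentences = [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    return sorted(sentences, key=len, reverse=True)[:count]


def common_word_ratio(text_a, text_b):
    """Share of words common to the longest sentences of both texts."""
    # Whitespace splitting is a placeholder for proper word segmentation.
    words_a = {w for s in longest_sentences(text_a) for w in s.lower().split()}
    words_b = {w for s in longest_sentences(text_b) for w in s.lower().split()}
    total = max(len(words_a), len(words_b))  # the larger word count
    if total == 0:
        return 0.0
    return len(words_a & words_b) / total


ratio = common_word_ratio("a b c d.", "a b c e.")
print(ratio)         # 0.75
print(ratio > 0.8)   # False
```

Restricting the comparison to the three longest sentences keeps the cost low for long texts, at the price of ignoring differences elsewhere in the text.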
In some optional implementations of this embodiment, the execution body may also select two texts to be tested and compare them with another text to be tested in the text set to be tested. The three texts to be tested are first vectorized; text vectorization methods include, but are not limited to, one-hot encoding and the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm. Then, from the three resulting vectors, the cosine similarity between the vector of each selected text to be tested and the vector of the other text to be tested is calculated. When at least one of the two cosine similarity values is greater than or equal to a preset threshold, for example, 0.6, the execution body may determine that the text set to be tested contains other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold.
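The vectorize-then-compare approach can be sketched with a small pure-Python TF-IDF and cosine similarity. This is an illustrative sketch only: the smoothed inverse document frequency formula, the sparse dictionary representation, and the function names are assumptions, and the documents are taken to be already segmented into tokens.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn tokenized documents into sparse TF-IDF dictionaries."""
    doc_count = len(documents)
    # Number of documents in which each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + doc_count) / (1 + doc_freq[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "exam essay topic released today".split(),
    "exam essay topic announced today".split(),
    "weather forecast for the weekend".split(),
]
selected_a, selected_b, other = tfidf_vectors(docs)
similarities = [cosine_similarity(selected_a, other),
                cosine_similarity(selected_b, other)]
print(any(s >= 0.6 for s in similarities))
```

Here the first two documents play the role of the two selected texts and the third is the other text; the final line applies the "at least one value reaches the threshold" rule with 0.6 as the example threshold.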
It should be noted that the above-mentioned technology for extracting specific content from a text, the technology for segmenting a text, the technology for determining the longest sentence in a text to be tested, the text vectorization method, and the calculation of cosine similarity are all well-known technologies widely studied and applied at present, and are not described herein again.
Step 204, in response to determining that such texts exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold.
In this embodiment, in response to determining in step 203 that such texts exist, the execution body may determine, as matching texts, the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold, and then generate a text group based on the matching texts and the selected text to be tested.
Optionally, the text group may include the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold. Optionally, the text group may instead include text that records information about the selected text to be tested and those other texts to be tested. For example, the title, author information, and keywords of each text to be tested may be recorded in list form.
In some optional implementation manners of this embodiment, the text group may include all other texts to be tested in the text set to be tested, where the text association degree with the selected text to be tested exceeds a preset threshold, and the selected text to be tested.
In some optional implementations of this embodiment, if a text group generated from the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold would include the same texts to be tested as an already existing text group, the new text group need not be generated again.
Step 205, determining the number of texts to be tested in the text group.
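One way to avoid regenerating an existing text group is to key each group by its set of members. The sketch below assumes that "the same texts" means identical membership and that texts are referred to by hashable identifiers; the function name `add_group` is hypothetical.

```python
def add_group(groups, members):
    """Add a text group unless a group with the same members exists.

    `groups` is a set of frozensets of text ids. Returns True when a new
    group was added and False when an identical group was already present.
    """
    key = frozenset(members)  # member order does not matter
    if key in groups:
        return False
    groups.add(key)
    return True


groups = set()
print(add_group(groups, ["t1", "t2"]))  # True
print(add_group(groups, ["t2", "t1"]))  # False
print(len(groups))                      # 1
```

Selecting "t2" after "t1" finds the same two texts, so the second call recognizes the group and leaves the collection unchanged.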
In this embodiment, the execution subject may determine the number of texts to be tested in the text group generated in step 204.
In some optional implementations of this embodiment, the number of generated text groups may vary, because the number of selected texts to be tested may differ and, for each selected text to be tested, other texts to be tested whose text association degree with it exceeds the preset threshold may or may not exist in the text set to be tested. When multiple text groups are generated, the execution body may determine the number of texts to be tested in each text group generated in step 204.
Step 206, determining whether the number of the texts to be tested in the text group is greater than or equal to a preset threshold value.
In this embodiment, the execution subject may determine whether the number determined in step 205 is greater than or equal to a preset threshold. The preset threshold value may be a value preset by a user or a technician, for example, 20 or 100.
In some optional implementation manners of this embodiment, for the case that a plurality of text groups are generated in step 204, the number of texts to be tested in the text group may be respectively determined for each text group in step 205. Correspondingly, the execution main body may also determine whether the number of texts to be tested is greater than or equal to a preset threshold value for the number of texts to be tested in each text group.
Step 207, in response to determining that the number of the texts to be tested in the text group is greater than or equal to the preset threshold value, sending a prompt message.
In this embodiment, the execution body may send the prompt message in response to the number determined in step 206 being greater than or equal to the preset threshold.
In some optional implementation manners of this embodiment, for the case of multiple text groups in step 206, the executing main body may send a prompt message in response to determining that the number of texts to be tested in one text group is greater than or equal to a preset threshold. The executing entity may also determine whether the number of texts to be tested in each text group determined in step 206 is greater than or equal to a preset threshold, and send information indicating that the number of texts to be tested in at least two text groups is greater than or equal to the preset threshold in response to that the number of texts to be tested in at least two text groups is greater than or equal to the preset threshold.
In some optional implementations of this embodiment, the prompt message may be in various forms. As an example, the prompt information may be a predetermined character to represent that the number of texts to be tested in the text group is greater than or equal to a preset threshold, for example, 1. As another example, the prompt information may include information of the text to be tested in the text group, such as a title, author information, a keyword, and the number of similar articles of the text to be tested. The execution main body can also send prompt information containing the information of the text to be tested in the text group to the client side in communication connection.
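Steps 202 to 207 can be put together in a short sketch. This is an illustration under stated assumptions rather than the claimed implementation: texts are held in a dictionary keyed by identifier, the similarity test and the prompt sender are passed in as callables, every text is selected in turn, and the name `monitor` is hypothetical.

```python
def monitor(texts, is_similar, count_threshold, send_prompt):
    """Group similar texts and send a prompt for each large enough group.

    texts: dict mapping a text id to its text
    is_similar: callable(text_a, text_b) -> bool
    count_threshold: group size at or above which a prompt is sent
    send_prompt: callable(list_of_ids), invoked once per qualifying group
    """
    seen_groups = set()
    for selected_id, selected_text in texts.items():        # step 202
        matches = [                                         # step 203
            other_id
            for other_id, other_text in texts.items()
            if other_id != selected_id
            and is_similar(selected_text, other_text)
        ]
        if not matches:
            continue
        group = frozenset([selected_id, *matches])          # step 204
        if group in seen_groups:  # skip a group that already exists
            continue
        seen_groups.add(group)
        if len(group) >= count_threshold:                   # steps 205-206
            send_prompt(sorted(group))                      # step 207
    return seen_groups


prompts = []
texts = {1: "aa", 2: "aa", 3: "aa", 4: "bb"}
monitor(texts, lambda a, b: a == b, 3, prompts.append)
print(prompts)  # [[1, 2, 3]]
```

Passing the similarity test as a callable lets the same loop be used with a Hamming distance check, a common word ratio, or a cosine similarity threshold.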
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for transmitting information according to the present embodiment. In the application scenario of fig. 3, a background server 301 is communicatively connected to a notebook client 302. The background server 301 first obtains a text set 303 to be tested, which includes 100 texts to be tested. Then, the background server selects a text to be tested with the title "2018 Beijing college entrance examination composition" from the text set 303 to be tested. Then, the background server generates a text group 304 from the selected text to be tested and the other texts to be tested in the text set whose text similarity with the selected text to be tested exceeds a threshold. Then, the background server determines the number 305 of texts to be tested in the text group, for example, 50. Next, the background server compares 50 with a preset threshold, for example, 30, and determines that the number of texts to be tested in the text group exceeds the preset threshold. Finally, the background server sends prompt information 306 to the notebook client. The prompt information may be, for example, "The number of articles similar to '2018 Beijing college entrance examination composition' has reached 50".
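The numbers in this scenario can be traced with a short sketch. The `Prompt` class, the `check_group` function, and the exact message wording are hypothetical names chosen for illustration; only the values 50 and 30 and the article title come from the scenario.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prompt:
    """Prompt information carrying the title and the number of similar articles."""
    title: str
    similar_count: int

    def render(self) -> str:
        return f'The number of articles similar to "{self.title}" has reached {self.similar_count}.'


def check_group(title: str, group_size: int, count_threshold: int) -> Optional[Prompt]:
    """Return prompt information when the group size is >= the preset threshold."""
    if group_size >= count_threshold:
        return Prompt(title=title, similar_count=group_size)
    return None


# Scenario of fig. 3: 50 texts in the group, preset threshold of 30.
prompt = check_group("2018 Beijing college entrance examination composition", 50, 30)
```

Because 50 is not less than 30, `prompt` is a `Prompt` object whose rendered text can be sent to the client; with a group of 29 texts the function would return `None` and nothing would be sent.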
The method provided by the above embodiment of the application first acquires a text set to be tested; then selects a text to be tested from the text set to be tested; then, in response to determining that other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold exist in the text set to be tested, generates a text group based on those other texts to be tested and the selected text to be tested; then determines the number of texts to be tested in the text group; and sends prompt information in response to determining that the number of texts to be tested in the text group is greater than or equal to a preset threshold. Texts whose text association degree exceeds the threshold are thus grouped and counted, and whether to send prompt information is decided according to the count, so that texts whose number of similar texts reaches the threshold are monitored. The method can therefore be used in a public opinion monitoring system, and it enriches the ways of monitoring popular articles.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for transmitting information is shown. The process 400 of the method for transmitting information includes the steps of:
step 401, a target text set is obtained.
In the present embodiment, the execution subject of the method for transmitting information (e.g., the server 105 shown in fig. 1) may acquire the target text set in various ways. The target text may include text in a web page from the target website.
In some optional implementations of this embodiment, the target text set may be pre-stored in a database, or may be obtained by preprocessing web pages collected by a web page acquisition system. The web page acquisition system may typically be a web crawler. The preprocessing may include extracting text content from the web page. The extracted text content may include a title and a body. Optionally, the extracted text content may further include at least one of: author information of the text, the source of the text content, comments, the release time, and the URL (Uniform Resource Locator) of the web page where the text content is located. Optionally, the preprocessing may also include deleting collected web pages that are irrelevant (e.g., web pages whose text content relates to advertisements, goods, etc.), so as to obtain a target text set containing semi-structured text data.
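As a rough sketch of the title and body extraction mentioned above, the following uses Python's standard `html.parser` module. The application does not specify how extraction is performed; treating the `<title>` element as the title and the `<p>` elements as the body is an assumption made purely for illustration, as are the names `TitleBodyExtractor` and `extract_text`.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TitleBodyExtractor(HTMLParser):
    """Collect the text of <title> and of every <p> element (illustrative rule only)."""

    def __init__(self) -> None:
        super().__init__()
        self.title_parts: List[str] = []
        self.paragraphs: List[str] = []
        self._current: Optional[str] = None  # "title", "p", or None
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            # Normalize whitespace inside the element before storing it.
            text = " ".join("".join(self._buffer).split())
            if text:
                target = self.title_parts if tag == "title" else self.paragraphs
                target.append(text)
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def extract_text(html: str) -> Dict[str, str]:
    """Return the extracted title and body of one web page."""
    parser = TitleBodyExtractor()
    parser.feed(html)
    parser.close()
    return {"title": " ".join(parser.title_parts), "body": "\n".join(parser.paragraphs)}
```

Text outside `<title>` and `<p>` (for example an advertisement in a `<div>`) is ignored by this rule; a real preprocessing step would likely need site-specific extraction logic.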
Optionally, the target website may be any website specified in advance according to actual application requirements, or may be a website determined by a URL that the web page acquisition system (e.g., a web crawler) adds to its crawling queue according to a set search policy while crawling web pages. As an example, the target website may be a website (e.g., a news website, a media column) including OGC (Occupationally-generated Content), or a website (e.g., a government website, a government microblog) including PGC (Professionally-generated Content).
Step 402, for a target text in a target text set, in response to determining that the target text meets a preset deletion condition, deleting the target text from the target text set.
In this embodiment, for each target text in the target text set acquired in step 401, the executing main body may delete the target text from the target text set in response to determining that the target text meets a preset deletion condition.
Optionally, the preset deletion condition may be a condition set in advance according to actual application requirements. As an example, when the number of public opinion articles needs to be monitored, the preset deletion condition may be that the text is a popular-science text unrelated to public opinion. Technically, whether the preset deletion condition is met can be judged from aspects such as characteristic words of the text, the content source, and the author information. As yet another example, when the number of articles within a period of time needs to be monitored, the preset deletion condition may be that the publication time of the text is not within a preset time range. Technically, whether the preset deletion condition is met can then be judged according to the publication time of the text.
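The time-range example of a deletion condition might look like the sketch below. Representing each target text as a dictionary with a `published` field, and the function names `published_outside` and `build_text_set_to_be_tested`, are assumptions for illustration only.

```python
from datetime import datetime
from typing import Callable, Dict, List

TargetText = Dict[str, object]


def published_outside(start: datetime, end: datetime) -> Callable[[TargetText], bool]:
    """Deletion condition: the publication time is not within [start, end]."""
    def condition(text: TargetText) -> bool:
        return not (start <= text["published"] <= end)
    return condition


def build_text_set_to_be_tested(
    target_texts: List[TargetText],
    deletion_condition: Callable[[TargetText], bool],
) -> List[TargetText]:
    """Keep only the target texts that do not meet the deletion condition."""
    return [text for text in target_texts if not deletion_condition(text)]


target_texts = [
    {"title": "a", "published": datetime(2018, 6, 7)},
    {"title": "b", "published": datetime(2018, 1, 1)},
    {"title": "c", "published": datetime(2018, 6, 8)},
]
june_only = published_outside(datetime(2018, 6, 1), datetime(2018, 6, 30))
to_be_tested = build_text_set_to_be_tested(target_texts, june_only)
```

Passing the condition as a callable keeps the filtering step unchanged when a different condition, such as one based on characteristic words or author information, is needed.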
Step 403, determining the target text set from which the target texts meeting the deletion condition have been deleted as the text set to be tested.
In this embodiment, the execution body may determine the target text set from which the target texts meeting the deletion condition were deleted in step 402 as the text set to be tested.
Optionally, the target texts meeting the preset deletion condition may be deleted from the target text set, the target texts retained in the target text set may form a new set, and the new set may be used as the text set to be tested.
Step 404, selecting a text to be tested from the text set to be tested.
Step 405, determining whether other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold exist in the text set to be tested.
Step 406, in response to determining that such other texts to be tested exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold.
Step 407, determining the number of texts to be tested in the text group.
Step 408, determining whether the number of the texts to be tested in the text group is greater than or equal to a preset threshold value.
Step 409, in response to determining that the number is greater than or equal to the preset threshold value, sending prompt information.
It should be noted that the detailed processing from step 404 to step 409 and the technical effects thereof may refer to step 202 to step 207 in the embodiment corresponding to fig. 2, and are not described herein again.
Step 410, determining whether the text to be tested which is not selected exists in the text set to be tested.
In this embodiment, the execution main body may determine whether there is an unselected text to be tested in the text set to be tested.
In this embodiment, the execution body may further return to step 404 in response to determining that an unselected text to be tested exists in the text set to be tested. An unselected text to be tested is then selected from the text set to be tested, and the grouping steps 405 to 409 are performed on it.
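The loop formed by steps 404 to 410 can be sketched as below. The relatedness test is passed in as a hypothetical callable `is_related`, since the way the text association degree is computed is described elsewhere. Treating texts that have already been placed in a group as selected is an assumption added here so that the same group is not reported repeatedly; the embodiment itself does not state this.

```python
from typing import Callable, List


def monitor_text_set(
    texts: List[str],
    is_related: Callable[[str, str], bool],
    count_threshold: int,
) -> List[List[int]]:
    """Return the groups (as lists of text indices) that reach the count threshold."""
    unselected = list(range(len(texts)))
    reported: List[List[int]] = []
    while unselected:                      # step 410: unselected texts remain
        selected = unselected.pop(0)       # step 404: select a text
        others = [                         # step 405: look for related texts
            i for i in range(len(texts))
            if i != selected and is_related(texts[selected], texts[i])
        ]
        if not others:
            continue
        group = [selected] + others        # step 406: generate the text group
        # Assumption: grouped texts count as selected and are not picked again.
        unselected = [i for i in unselected if i not in others]
        if len(group) >= count_threshold:  # steps 407-408: count and compare
            reported.append(group)         # step 409: would trigger a prompt
    return reported


sample = ["exam essay one", "exam essay two", "exam essay three", "weather report"]
same_first_word = lambda a, b: a.split()[0] == b.split()[0]  # toy relatedness test
groups = monitor_text_set(sample, same_first_word, 3)
```

With the toy relatedness test, the three "exam" texts form one group of size 3, which reaches the threshold, while the "weather" text has no related texts and forms no group.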
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for sending information in this embodiment highlights the step of acquiring the text set to be tested, and the step of determining whether there is an unselected text to be tested in the text set to be tested. Therefore, the scheme described in the embodiment can monitor a large amount of texts in the webpage, so that the texts with the number of similar texts exceeding the threshold value in the webpage are monitored.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for sending information, which corresponds to the method embodiment shown in fig. 2, and which can be applied in various electronic devices in particular.
As shown in fig. 5, the apparatus 500 for transmitting information provided by the present embodiment includes an acquisition unit 501 and a sending unit 502. The acquisition unit 501 is configured to acquire a text set to be tested, where the text set to be tested includes at least two texts to be tested. The sending unit 502 is configured to select a text to be tested from the text set to be tested, and perform the following grouping steps: determining whether other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold exist in the text set to be tested; in response to determining that such other texts to be tested exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold; determining the number of texts to be tested in the text group; determining whether the number of texts to be tested in the text group is greater than or equal to a preset threshold; and in response to determining that the number is greater than or equal to the preset threshold, sending prompt information.
In this embodiment, in the apparatus 500 for sending information, specific processing of the obtaining unit 501 and the sending unit 502 and technical effects thereof may refer to the related descriptions of step 201 and step 202 to step 207 in the corresponding embodiment of fig. 2, which are not described herein again.
In some optional implementations of this embodiment, the sending unit 502 may be further configured to: select a text to be tested from the text set to be tested, and perform the following grouping steps: determining whether a hash value set contains other hash values whose Hamming distance from the hash value corresponding to the selected text to be tested is smaller than a preset distance threshold, wherein each hash value in the hash value set is generated based on the corresponding text to be tested in the text set to be tested; in response to determining that such hash values exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold; determining the number of texts to be tested in the text group; determining whether the number of texts to be tested in the text group is greater than or equal to a preset threshold; and in response to determining that the number is greater than or equal to the preset threshold, sending prompt information.
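The Hamming-distance comparison can be illustrated as follows. The passage above does not name the hash function, so the SimHash-style fingerprint built from SHA-256 token hashes is an assumption, as are the 64-bit width and the names `simhash`, `hamming_distance`, and `near_duplicates`.

```python
import hashlib
from typing import Dict, List

HASH_BITS = 64  # assumed fingerprint width


def simhash(text: str) -> int:
    """SimHash-style fingerprint over whitespace tokens (illustrative choice of hash)."""
    weights = [0] * HASH_BITS
    for token in text.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        for bit in range(HASH_BITS):
            weights[bit] += 1 if (token_hash >> bit) & 1 else -1
    return sum(1 << bit for bit in range(HASH_BITS) if weights[bit] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two hash values differ."""
    return bin(a ^ b).count("1")


def near_duplicates(selected_id: str, hash_values: Dict[str, int], distance_threshold: int) -> List[str]:
    """Ids of other texts whose hash is closer than distance_threshold to the selected one."""
    selected_hash = hash_values[selected_id]
    return [
        text_id
        for text_id, value in hash_values.items()
        if text_id != selected_id and hamming_distance(selected_hash, value) < distance_threshold
    ]
```

Note that with a strict "smaller than" comparison, a distance threshold of 1 matches only texts whose hash values are identical to that of the selected text.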
In some optional implementations of this embodiment, the apparatus 500 for sending information may further include: an adjusting unit (not shown in fig. 5) configured to determine whether there is an unselected text to be tested in the text set to be tested; and responding to the situation that the text to be tested which is not selected exists in the text set to be tested, selecting the text to be tested which is not selected from the text set to be tested, and continuing to execute the grouping step.
In some optional implementations of this embodiment, the preset distance threshold may be 1.
In some optional implementations of this embodiment, the obtaining unit 501 may include an obtaining module (not shown in fig. 5), a deleting module (not shown in fig. 5), and an updating module (not shown in fig. 5). The obtaining module may be configured to obtain a target text set, where the target text includes text in a web page from a target website. The deleting module may be configured to, for a target text in the target text set, delete the target text from the target text set in response to determining that the target text satisfies a preset deletion condition. The updating module may be configured to determine the target text set from which the target texts satisfying the deletion condition have been deleted as the text set to be tested.
In the apparatus provided in the foregoing embodiment of the present application, the text set to be tested is obtained by the obtaining unit 501. Then, the sending unit 502 selects a text to be tested from the text set to be tested. Then, the sending unit 502 determines whether other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold exist in the text set to be tested. Then, in response to determining that such other texts to be tested exist, the sending unit 502 forms a text group from the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold. Next, the sending unit 502 determines the number of texts to be tested in the text group. Finally, in response to determining that the number of texts to be tested in the text group is greater than or equal to the preset threshold, the sending unit 502 sends prompt information. Texts whose text association degree exceeds the threshold are thus grouped and counted, and whether to send prompt information is decided according to the count, so that texts whose number of similar texts reaches the threshold are monitored.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing a server according to embodiments of the present application. The server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that server 100 also includes some other well-known structures, such as processors, memory, etc., which are not shown in fig. 1 in order to not unnecessarily obscure embodiments of the present disclosure.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: an apparatus for transmitting information includes an acquisition unit and a transmission unit. The names of the units do not form a limitation to the unit itself in some cases, and for example, the acquiring unit may also be described as a "unit acquiring a text set to be tested".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the server described in the above embodiments, or may exist separately without being assembled into the server. The computer readable medium carries one or more programs which, when executed by the server, cause the server to: acquire a text set to be tested, wherein the text set to be tested comprises at least two texts to be tested; select a text to be tested from the text set to be tested, and execute the following grouping steps: determining whether other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold exist in the text set to be tested; in response to determining that such other texts to be tested exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold; determining the number of texts to be tested in the text group; determining whether the number of texts to be tested in the text group is greater than or equal to a preset threshold; and in response to determining that the number is greater than or equal to the preset threshold, sending prompt information.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for transmitting information, comprising:
acquiring a text set to be tested, wherein the text set to be tested comprises at least two texts to be tested;
selecting a text to be tested from the text set to be tested, and executing the following grouping steps: determining whether other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold exist in the text set to be tested, including: selecting two texts to be tested to be compared with another text to be tested in the text set to be tested; vectorizing the two texts to be tested and the other text to be tested; calculating, from the three vectors thus obtained, the cosine similarity between the vector corresponding to each of the two texts to be tested and the vector corresponding to the other text to be tested; and in response to at least one of the cosine similarity values being greater than or equal to a preset threshold, determining that other texts to be tested whose text association degree with the two texts to be tested exceeds the preset threshold exist in the text set to be tested; in response to determining that such other texts to be tested exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold; determining the number of texts to be tested in the text group; determining whether the number of texts to be tested in the text group is greater than or equal to a preset threshold; and in response to determining that the number is greater than or equal to the preset threshold, sending prompt information.
2. The method according to claim 1, wherein the determining whether there are other texts to be tested in the set of texts to be tested whose text association with the selected text to be tested exceeds a preset threshold includes:
determining whether a hash value set contains other hash values whose Hamming distance from the hash value corresponding to the selected text to be tested is smaller than a preset distance threshold, wherein each hash value in the hash value set is generated based on the corresponding text to be tested in the text set to be tested;
and in response to determining that such hash values exist, determining that other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold exist in the text set to be tested.
3. The method of claim 1, wherein the method further comprises:
determining whether an unselected text to be tested exists in the text set to be tested;
and responding to the situation that the text to be tested which is not selected exists in the text set to be tested, selecting the text to be tested which is not selected from the text set to be tested, and continuing to execute the grouping step.
4. The method of claim 2, wherein the preset distance threshold is 1.
5. The method according to one of claims 1 to 4, wherein the obtaining of the text set to be tested comprises:
acquiring a target text set, wherein the target text comprises a text in a webpage from a target website;
for a target text in the target text set, in response to determining that the target text meets a preset deletion condition, deleting the target text from the target text set;
and determining the target text set from which the target texts meeting the deletion condition have been deleted as the text set to be tested.
6. An apparatus for transmitting information, comprising:
an acquisition unit configured to acquire a text set to be tested, wherein the text set to be tested comprises at least two texts to be tested;
a sending unit configured to select a text to be tested from the text set to be tested, and perform the following grouping steps: determining whether other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold exist in the text set to be tested, the sending unit being further configured to: select two texts to be tested to be compared with another text to be tested in the text set to be tested; vectorize the two texts to be tested and the other text to be tested; calculate, from the three vectors thus obtained, the cosine similarity between the vector corresponding to each of the two texts to be tested and the vector corresponding to the other text to be tested; and in response to at least one of the cosine similarity values being greater than or equal to a preset threshold, determine that other texts to be tested whose text association degree with the two texts to be tested exceeds the preset threshold exist in the text set to be tested; in response to determining that such other texts to be tested exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds the preset threshold; determining the number of texts to be tested in the text group; determining whether the number of texts to be tested in the text group is greater than or equal to a preset threshold; and in response to determining that the number is greater than or equal to the preset threshold, sending prompt information.
7. The apparatus of claim 6, wherein the transmitting unit is further configured to:
selecting a text to be tested from the text set to be tested, and executing the following grouping steps: determining whether a hash value set contains other hash values whose Hamming distance from the hash value corresponding to the selected text to be tested is smaller than a preset distance threshold, wherein each hash value in the hash value set is generated based on the corresponding text to be tested in the text set to be tested; in response to determining that such hash values exist, generating a text group based on the selected text to be tested and the other texts to be tested whose text association degree with the selected text to be tested exceeds a preset threshold; determining the number of texts to be tested in the text group; determining whether the number of texts to be tested in the text group is greater than or equal to a preset threshold; and in response to determining that the number is greater than or equal to the preset threshold, sending prompt information.
8. The apparatus of claim 6, wherein the apparatus further comprises:
an adjusting unit configured to determine whether there is an unselected text to be tested in the text set to be tested; and responding to the situation that the text to be tested which is not selected exists in the text set to be tested, selecting the text to be tested which is not selected from the text set to be tested, and continuing to execute the grouping step.
9. The apparatus of claim 7, wherein the preset distance threshold is 1.
10. The apparatus according to one of claims 6-9, wherein the obtaining unit comprises:
a retrieval module configured to retrieve a set of target texts, wherein the target texts comprise texts in web pages from a target website;
a deletion module configured to, for a target text in the target text set, in response to determining that the target text satisfies a preset deletion condition, delete the target text from the target text set;
and an updating module configured to determine the target text set from which the target texts meeting the deletion condition have been deleted as the text set to be tested.
11. A server, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201811030779.6A 2018-09-05 2018-09-05 Method and apparatus for transmitting information Active CN110891010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811030779.6A CN110891010B (en) 2018-09-05 2018-09-05 Method and apparatus for transmitting information

Publications (2)

Publication Number Publication Date
CN110891010A CN110891010A (en) 2020-03-17
CN110891010B CN110891010B (en) 2022-09-16

Family

ID=69744261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811030779.6A Active CN110891010B (en) 2018-09-05 2018-09-05 Method and apparatus for transmitting information

Country Status (1)

Country Link
CN (1) CN110891010B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219571A (en) * 2021-12-16 2022-03-22 广州华多网络科技有限公司 E-commerce independent site matching method and device, equipment, medium and product thereof

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0531923A2 (en) * 1991-09-10 1993-03-17 Eastman Kodak Company Method and apparatus for gray-level quantization
CN103324664A (en) * 2013-04-27 2013-09-25 国家电网公司 Document similarity distinguishing method based on Fourier transform
CN103763429A (en) * 2013-12-16 2014-04-30 深圳市金立通信设备有限公司 Method and terminal for displaying text messages
CN103838789A (en) * 2012-11-27 2014-06-04 大连灵动科技发展有限公司 Text similarity computing method
CN106446148A (en) * 2016-09-21 2017-02-22 中国运载火箭技术研究院 Cluster-based text duplicate checking method
CN106708415A (en) * 2015-11-13 2017-05-24 阿里巴巴集团控股有限公司 Data processing method and device
CN107644010A (en) * 2016-07-20 2018-01-30 阿里巴巴集团控股有限公司 A kind of Text similarity computing method and device
CN108170650A (en) * 2016-12-07 2018-06-15 北京京东尚科信息技术有限公司 Text comparative approach and text comparison means
CN108197102A (en) * 2017-12-26 2018-06-22 百度在线网络技术(北京)有限公司 A kind of text data statistical method, device and server

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7752204B2 (en) * 2005-11-18 2010-07-06 The Boeing Company Query-based text summarization
CN100530192C (en) * 2007-10-09 2009-08-19 华为技术有限公司 Text searching method and device
CN101763404B (en) * 2009-12-10 2012-03-21 陕西鼎泰科技发展有限责任公司 Network text data detection method based on fuzzy cluster
CN102567537A (en) * 2011-12-31 2012-07-11 武汉理工大学 Short text similarity computing method based on searched result quantity
CN103309851B (en) * 2013-05-10 2016-01-27 微梦创科网络科技(中国)有限公司 The rubbish recognition methods of short text and system
CN103425639A (en) * 2013-09-06 2013-12-04 广州一呼百应网络技术有限公司 Similar information identifying method based on information fingerprints
CN103886077B (en) * 2014-03-24 2017-04-19 广东省电信规划设计院有限公司 Short text clustering method and system
CN104699785A (en) * 2015-03-10 2015-06-10 中国石油大学(华东) Paper similarity detection method
CN105302779A (en) * 2015-10-23 2016-02-03 北京慧点科技有限公司 Text similarity comparison method and device
CN106815226A (en) * 2015-11-27 2017-06-09 阿里巴巴集团控股有限公司 Text matching technique and device
CN105373521B (en) * 2015-12-04 2018-06-29 湖南工业大学 It is a kind of that the method for calculating text similarity is filtered based on Minwise Hash dynamics multi-threshold
CN106326197A (en) * 2016-08-23 2017-01-11 达而观信息科技(上海)有限公司 Method for fast detecting repeated copying texts

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
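Research on a fast similarity detection algorithm for massive texts based on semantic fingerprints; 姜雪 et al.; Computer Knowledge and Technology (《电脑知识与技术》); 2016-12-25 (No. 36); full text *
Design and implementation of a material management and duplicate-checking subsystem for scientific research texts; 李宝莹; China Excellent Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》); 2018-02-15; full text *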

Also Published As

Publication number Publication date
CN110891010A (en) 2020-03-17

Similar Documents

Publication Publication Date Title
US11023505B2 (en) Method and apparatus for pushing information
US11062089B2 (en) Method and apparatus for generating information
CN107679211B (en) Method and device for pushing information
US10795939B2 (en) Query method and apparatus
US11151177B2 (en) Search method and apparatus based on artificial intelligence
CN109145280B (en) Information pushing method and device
CN107463704B (en) Search method and device based on artificial intelligence
US11620321B2 (en) Artificial intelligence based method and apparatus for processing information
Sun et al. Near real-time twitter spam detection with machine learning techniques
CN109189857B (en) Data sharing system, method and device based on block chain
CN108256070B (en) Method and apparatus for generating information
CN106919711B (en) Method and device for labeling information based on artificial intelligence
CN107526718B (en) Method and device for generating text
CN107193974B (en) Regional information determination method and device based on artificial intelligence
CN107944032B (en) Method and apparatus for generating information
US11423096B2 (en) Method and apparatus for outputting information
CN111314388B (en) Method and apparatus for detecting SQL injection
CN117131281B (en) Public opinion event processing method, apparatus, electronic device and computer readable medium
CN110245357B (en) Main entity identification method and device
CN110895587B (en) Method and device for determining target user
CN110891010B (en) Method and apparatus for transmitting information
CN110737691B (en) Method and apparatus for processing access behavior data
CN110929512A (en) Data enhancement method and device
CN110881056A (en) Method and device for pushing information
CN114417102A (en) Text duplicate removal method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant