CN113032658A - Illegal word detection method, device and equipment and computer-readable storage medium - Google Patents

Info

Publication number
CN113032658A
Authority
CN
China
Prior art keywords
product
target internet
information
preset
internet product
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110213648.7A
Other languages
Chinese (zh)
Inventor
张义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weikun Shanghai Technology Service Co Ltd
Original Assignee
Weikun Shanghai Technology Service Co Ltd
Application filed by Weikun Shanghai Technology Service Co Ltd
Priority to CN202110213648.7A
Publication of CN113032658A
Legal status (current): Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device and equipment for detecting illegal words, and a computer-readable storage medium. The method comprises: acquiring product information of a target internet product from a preset storage engine based on identification information of the target internet product; detecting, based on a preset search engine, whether preset violation words exist in the product information of the target internet product; if they do, acquiring target contact information corresponding to the target internet product; and reporting the product information of the target internet product based on the target contact information. The product information of internet products is thereby detected quickly and efficiently. Because the product information of a plurality of internet products in the preset storage engine can be detected, development cost and later-stage operation and maintenance cost are reduced; and because the product information of the target internet product is acquired uniformly from the preset storage engine, development cost and time cost are also reduced.

Description

Illegal word detection method, device and equipment and computer-readable storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for detecting illegal words.
Background
With the rapid development of computer technology, the number of internet products keeps increasing. Internet products are non-physical products, and their product information (such as product introductions and advertisements) is displayed to a large number of internet users, so there is a certain risk of violating laws or public order and good morals. For example, the absolute terms that no commodity advertisement may use include: national level, world level, highest level, best, first, unique, first, best, accurate, top level, lowest, bottommost, cheapest, greatest degree, national level product, filling a domestic gap, absolute, exclusive, first, newest, most advanced, first brand, gold medal, famous brand, most earning, super earning, first, superstar, luxury, extreme, top-level enjoyment, and similar terms; and for financial products, commitment words such as "guaranteed principal and interest" and "no risk" must not appear.
To reduce the risk of violating laws or public order and good morals, the product information of internet products needs to be audited to detect whether it contains illegal words. However, existing illegal word detection methods have low detection efficiency and high cost.
Disclosure of Invention
The main object of the invention is to provide a method, a device, equipment and a storage medium for detecting illegal words, so as to solve the problems of low detection efficiency and high cost in conventional illegal word detection.
In order to achieve the above object, the present invention provides a method for detecting an illegal word, including:
acquiring, based on identification information of a target internet product, product information of the target internet product from a preset storage engine, wherein the preset storage engine contains product information of a plurality of internet products;
detecting, based on a preset search engine, whether a preset violation word exists in the product information of the target internet product;
if the preset violation word exists in the product information of the target internet product, acquiring target contact information corresponding to the target internet product;
and reporting the product information of the target internet product based on the target contact information.
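The four claimed steps can be sketched as follows. This is a minimal, non-authoritative illustration, not the patented implementation: the in-memory `StorageEngine`, the substring check standing in for the preset search engine, the `contacts` mapping, and the `report` callback are all hypothetical, and the word list is a small subset of the examples given in this description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical subset of the preset violation words listed in this description.
PRESET_VIOLATION_WORDS = ["national level", "best", "no risk"]


@dataclass
class StorageEngine:
    """Hypothetical in-memory stand-in for the preset storage engine.

    Maps the identification information of each internet product to its
    product information (a list of text items).
    """
    products: Dict[str, List[str]] = field(default_factory=dict)

    def get_product_info(self, product_id: str) -> List[str]:
        return self.products.get(product_id, [])


def find_violation_words(texts: List[str], words: List[str]) -> List[str]:
    """Naive case-insensitive substring check standing in for the search engine."""
    lowered = [t.lower() for t in texts]
    return [w for w in words if any(w.lower() in t for t in lowered)]


# Hypothetical reporting hook: (contact, product_id, product_info, hits) -> None
Reporter = Callable[[str, str, List[str], List[str]], None]


def detect_and_report(
    product_id: str,
    storage: StorageEngine,
    contacts: Dict[str, str],
    report: Reporter,
) -> List[str]:
    # Step 1: acquire product information by identification information.
    info = storage.get_product_info(product_id)
    # Step 2: detect whether any preset violation word is present.
    hits = find_violation_words(info, PRESET_VIOLATION_WORDS)
    if not hits:
        return []
    # Step 3: acquire the target contact information.
    contact = contacts[product_id]
    # Step 4: report the product information to that contact.
    report(contact, product_id, info, hits)
    return hits
```

For a product whose stored text contains "best" and "no risk", `detect_and_report` would return both words and call `report` once with that product's contact; a product with no matches returns an empty list and is not reported.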
Optionally, the product information includes text product information;
the step of acquiring the product information of the target internet product from a preset storage engine based on the identification information of the target internet product includes:
acquiring the text product information of the target internet product from a database of the preset storage engine based on the identification information of the target internet product.
Optionally, after the step of detecting, based on the preset search engine, whether a preset violation word exists in the product information of the target internet product, the illegal word detection method further includes:
if the preset violation word exists in the text product information of the target internet product, storing the text product information of the target internet product.
Optionally, the product information includes non-text product information;
the step of acquiring the product information of the target internet product from a preset storage engine based on the identification information of the target internet product includes:
acquiring the non-text product information of the target internet product from a log-structured file system (LFS) of the preset storage engine based on the identification information of the target internet product.
Optionally, the step of detecting, based on a preset search engine, whether a preset violation word exists in the product information of the target internet product includes:
performing character recognition on the non-text product information of the target internet product to obtain text content of the target internet product;
detecting, based on the preset search engine, whether the preset violation word exists in the text content of the target internet product.
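A minimal sketch of this two-stage path, assuming the OCR engine is injected as a plain function (`ocr`, hypothetical) that turns the bytes of one picture or PDF page into text; in practice it might wrap an OCR engine such as Tesseract. The substring check is only a stand-in for the search-engine detection.

```python
from typing import Callable, Iterable, List

# Hypothetical OCR hook: raw bytes of one picture or PDF page -> recognised text.
OcrFn = Callable[[bytes], str]


def recognise_text(items: Iterable[bytes], ocr: OcrFn) -> str:
    """Run character recognition on each non-text item and join the results
    into the text content of the target internet product."""
    return "\n".join(ocr(item) for item in items)


def contains_violation_word(text_content: str, words: List[str]) -> bool:
    """Stand-in for the search-engine check applied to the recognised text."""
    return any(word in text_content for word in words)
```

Keeping OCR behind a function parameter lets the detection step stay the same whichever OCR engine is chosen.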
Optionally, the step of performing character recognition on the non-text product information of the target internet product to obtain the text content of the target internet product includes:
performing character recognition on the non-text product information of the target internet product through optical character recognition (OCR) to obtain the text content of the target internet product;
the step of detecting, based on a preset search engine, whether a preset violation word exists in the text content of the target internet product includes:
storing the text content of the target internet product in a database of the search server ES;
searching, by the search server ES, for the preset violation words in the storage area of the text content of the target internet product;
if the preset violation word is found in the storage area, determining that the preset violation word exists in the text content of the target internet product;
if the preset violation word is not found in the storage area, determining that the preset violation word does not exist in the text content of the target internet product.
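One hedged way to express the ES search and judgement steps is with the Elasticsearch Query DSL, shown below as a plain Python dictionary that could be sent as the body of a `_search` request. The index mapping is assumed: a hypothetical `keyword` field `product_id` and a `text` field `content` holding the recognised text. The judgement then reduces to whether the response contains any hit.

```python
from typing import Any, Dict, List


def build_violation_query(product_id: str, words: List[str]) -> Dict[str, Any]:
    """Build a query body matching documents of one product whose `content`
    field contains at least one preset violation word.

    The field names `product_id` and `content` are hypothetical mapping choices.
    """
    return {
        "query": {
            "bool": {
                # Restrict the search to this product's stored text content.
                "filter": [{"term": {"product_id": product_id}}],
                # Each violation word is matched as a phrase; one match is enough.
                "should": [{"match_phrase": {"content": word}} for word in words],
                "minimum_should_match": 1,
            }
        },
        # Highlighting shows which passage triggered the match, useful for reporting.
        "highlight": {"fields": {"content": {}}},
    }


def violation_found(response: Dict[str, Any]) -> bool:
    """Interpret a search response: any hit means a violation word exists."""
    total = response["hits"]["total"]
    # Elasticsearch 7+ returns {"value": n, "relation": ...}; 6.x returns an int.
    count = total["value"] if isinstance(total, dict) else total
    return count > 0
```

With the default `standard` analyzer, Chinese text is tokenized into single characters, so `match_phrase` behaves roughly like a substring test for Chinese violation words; a Chinese-specific analyzer would change that behaviour.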
Optionally, before the step of acquiring the product information of the target internet product from the preset storage engine based on the identification information of the target internet product, the illegal word detection method further includes:
receiving a product setting instruction, wherein the product setting instruction includes internet product identification information and corresponding contact information;
setting the internet product corresponding to the internet product identification information as a target internet product, so that the product information of that internet product is detected;
determining configuration information of the target internet product based on the product setting instruction, wherein the configuration information includes the identification information of the target internet product and the corresponding contact information;
the step of acquiring target contact information corresponding to the target internet product if the preset violation word exists in the product information of the target internet product includes:
if the preset violation word exists in the product information of the target internet product, acquiring the contact information of the target internet product from the configuration information of the target internet product as the target contact information corresponding to the target internet product.
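The product setting instruction and the configuration lookup could be modelled as below. This is a sketch under assumptions: `ProductSettingInstruction`, `ProductConfig`, and `TargetRegistry` are hypothetical names, and configuration is kept in memory rather than in any particular store.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProductSettingInstruction:
    product_id: str  # internet product identification information
    contact: str     # corresponding contact information


@dataclass(frozen=True)
class ProductConfig:
    product_id: str
    contact: str


class TargetRegistry:
    """Hypothetical holder of the configuration of all target internet products."""

    def __init__(self) -> None:
        self._configs: Dict[str, ProductConfig] = {}

    def apply(self, instruction: ProductSettingInstruction) -> ProductConfig:
        # Set the product as a target and record its configuration information.
        config = ProductConfig(instruction.product_id, instruction.contact)
        self._configs[config.product_id] = config
        return config

    def is_target(self, product_id: str) -> bool:
        return product_id in self._configs

    def target_contact(self, product_id: str) -> Optional[str]:
        # Contact information taken from the configuration, if the product is a target.
        config = self._configs.get(product_id)
        return config.contact if config else None
```

In this sketch, registering a new internet product is a single `apply` call, in line with the point that a new product only needs to be set as a target rather than given its own detection program.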
In addition, in order to achieve the above object, the present invention further provides an illegal word detection device, including:
a first acquisition module, configured to acquire product information of a target internet product from a preset storage engine based on identification information of the target internet product, wherein the preset storage engine contains product information of a plurality of internet products;
a detection module, configured to detect, based on a preset search engine, whether a preset violation word exists in the product information of the target internet product;
a second acquisition module, configured to acquire target contact information corresponding to the target internet product if the preset violation word exists in the product information of the target internet product;
and a reporting module, configured to report the product information of the target internet product based on the target contact information.
In addition, to achieve the above object, the present invention further provides illegal word detection equipment, including a memory, a processor, and an illegal word detection program stored in the memory and executable on the processor, wherein the illegal word detection program, when executed by the processor, implements the steps of the illegal word detection method described above.
In addition, to achieve the above object, the present invention further provides a computer-readable storage medium storing an illegal word detection program which, when executed by a processor, implements the steps of the illegal word detection method according to any one of the above.
According to the technical solution provided by the invention, product information of a target internet product is acquired from a preset storage engine based on identification information of the target internet product, wherein the preset storage engine contains product information of a plurality of internet products; whether a preset violation word exists in the product information of the target internet product is detected based on a preset search engine; if it does, target contact information corresponding to the target internet product is acquired; and the product information of the target internet product is reported based on the target contact information. Product information of internet products is thus detected quickly and efficiently. Because the product information of a plurality of internet products in the preset storage engine can be detected, there is no need to develop a separate detection program for each internet product, which reduces development cost and later-stage operation and maintenance cost. Because the product information of the internet products is acquired uniformly from the preset storage engine, there is no need to integrate directly with each internet product, which reduces development cost and time cost. When a new internet product appears, its product information can be detected simply by setting it as a target internet product, without developing an additional detection program; the process is simple and saves cost.
Drawings
FIG. 1 is a schematic structural diagram of an illegal word detection device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first embodiment of the illegal word detection method of the present invention;
FIG. 3 is a flowchart of a second embodiment of the illegal word detection method of the present invention;
FIG. 4 is a flowchart of a third embodiment of the illegal word detection method of the present invention;
FIG. 5 is a structural block diagram of a first embodiment of the illegal word detection device of the present invention.
The realization of the objects, the functional features, and the advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of an illegal word detection device in a hardware operating environment according to an embodiment of the present invention.
The illegal word detection device may be a mobile phone, a smartphone, a laptop, a digital broadcast receiver, user equipment (UE) such as a personal digital assistant (PDA), a tablet computer (PAD), a handheld device, a vehicle-mounted device, a wearable device, a computing device, a monitoring device, a server, another processing device connected to a wireless modem, a mobile station (MS), or the like.
In general, an illegal word detection apparatus includes: at least one processor 101, a memory 102, and an illegal word detection program stored on the memory and executable on the processor, the illegal word detection program being configured to implement the steps of the illegal word detection method according to any of the following embodiments.
The processor 101 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 101 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 101 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. The processor 101 may further include an AI (Artificial Intelligence) processor for handling computing operations related to the illegal word detection method, so that the model used by the method can be trained and can learn autonomously, improving efficiency and accuracy.
The memory 102 may include one or more computer-readable storage media, which may be non-transitory. The memory 102 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 102 stores at least one instruction, which is executed by the processor 101 to implement the illegal word detection method provided by the method embodiments of this application.
In some embodiments, the illegal word detection device may further include: a communication interface 103 and at least one peripheral device. The processor 101, memory 102 and communication interface 103 may be connected by a bus or signal lines. Various peripheral devices may be connected to communication interface 103 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 104, display screen 105, and power supply 106.
The communication interface 103 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 101 and the memory 102. In some embodiments, the processor 101, the memory 102, and the communication interface 103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 101, the memory 102, and the communication interface 103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 104 is used for receiving and transmitting RF (radio frequency) signals, also called electromagnetic signals. The radio frequency circuit 104 communicates with communication networks and other communication devices via electromagnetic signals: it converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 104 may communicate with other terminals via at least one wireless communication protocol, including but not limited to metropolitan area networks, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 104 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 105 is a touch display screen, it can also capture touch signals on or above its surface; such a touch signal may be input to the processor 101 as a control signal for processing. In that case, the display screen 105 may also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 105, disposed on the front panel of the electronic device; in other embodiments, there are at least two display screens 105, disposed on different surfaces of the electronic device or in a folded design; in still other embodiments, the display screen 105 may be a flexible display disposed on a curved or folded surface of the electronic device. The display screen 105 may even be arranged in a non-rectangular irregular shape, i.e., a shaped screen. The display screen 105 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The power supply 106 is used to supply power to the components of the electronic device. The power supply 106 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 106 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging, and may also support fast charging technology. Those skilled in the art will appreciate that the configuration shown in FIG. 1 does not limit the illegal word detection device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing an illegal word detection program which, when executed by a processor, implements the steps of the illegal word detection method according to any one of the following embodiments. A detailed description is therefore omitted here, as is a repeated description of the beneficial effects of the same method. For technical details not disclosed in the embodiments of the computer-readable storage medium of this application, refer to the description of the method embodiments of this application. By way of example, the program instructions may be deployed to be executed on one computing device, on multiple computing devices at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
Those skilled in the art will understand that all or part of the processes of the methods in any of the following embodiments may be implemented by a computer program instructing the relevant hardware. The illegal word detection program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described below. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
In the related art, to detect whether illegal words exist in the product information of internet products, each product line system usually performs its own detection. The data of these systems is isolated, and they use different technologies, so a separate detection program has to be developed for each product line system. As a result, development cost and later-stage operation and maintenance cost are high, operation and maintenance are difficult, and detection efficiency is low.
In order to solve the above technical problems, embodiments of the present invention are proposed based on the above hardware configuration.
Referring to FIG. 2, FIG. 2 is a flowchart of a first embodiment of the illegal word detection method of the present invention. In this embodiment, the illegal word detection method includes the following steps:
Step S21: Acquire the product information of the target internet product from the preset storage engine based on the identification information of the target internet product.
It should be understood that internet products are commodities created for business in the internet field; they are intangible carriers that meet the needs and desires of internet users. That is, an internet product refers to the functions and services that a website creates to meet user needs, and is the integration of those functions and services. Internet products include, but are not limited to, various software products, such as financial products (e.g., "Ping An Bank" and "Ping An Securities") and communication products (e.g., "WeChat" and "QQ").
The identification information of an internet product distinguishes it from other internet products and may be, for example, the name of the internet product.
A storage engine is a medium for storing data. The preset storage engine includes, but is not limited to, a database, an LFS (log-structured file system), and the like, where the database includes, but is not limited to, a Hive (data warehouse tool) database.
The product information of an internet product includes, but is not limited to, the name of the internet product, its product introduction, and so on. Product information may take the form of text product information, non-text product information, and the like. Text product information includes, but is not limited to, textual introductions of internet products; non-text product information includes, but is not limited to, product information in picture format or Portable Document Format (PDF), such as product advertisement pictures, single-page pictures, and contract files in picture or PDF format.
In this embodiment of the invention, one or more target internet products are preset, and the preset storage engine stores product information of a plurality of internet products. For each target internet product, the illegal word detection device acquires the product information of the target internet product from the preset storage engine based on its identification information. That is, in this embodiment, the product information of target internet products can be acquired uniformly from the preset storage engine; when there are multiple target internet products, there is no need to integrate directly with each of them, which reduces development cost and time cost.
The preset storage engine may include a database and an LFS. For convenience of processing, different types of product information may be stored in different locations of the preset storage engine. For example, in some embodiments, text product information may be stored in the database of the preset storage engine, and non-text product information may be stored in the LFS of the preset storage engine.
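The example placement could be expressed as a small routing rule. The sketch assumes each item of product information carries a MIME type, and the labels `"database"` and `"lfs"` are hypothetical; items routed to the LFS are the ones that would need character recognition before searching.

```python
def storage_location(mime_type: str) -> str:
    """Return where an item of product information would be kept:
    text in the database, pictures and PDFs in the LFS (hypothetical labels)."""
    kind = mime_type.lower()
    if kind.startswith("text/"):
        return "database"
    if kind.startswith("image/") or kind == "application/pdf":
        return "lfs"
    raise ValueError(f"unsupported product information type: {mime_type}")


def needs_ocr(mime_type: str) -> bool:
    """Non-text items stored in the LFS go through OCR before searching."""
    return storage_location(mime_type) == "lfs"
```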
In some embodiments, step S21 includes: acquiring the product information of the target internet product from the preset storage engine at a preset frequency based on the identification information of the target internet product.
The preset frequency can be set flexibly according to actual needs, for example, once every 3 minutes. That is, in this embodiment, the product information of the target internet product may be acquired from the preset storage engine periodically for detection.
Because product information of the target internet product that was already checked in the previous detection does not need to be checked again, in step S21 only the updated product information of the target internet product may be acquired from the preset storage engine at the preset frequency based on the identification information of the target internet product. Product information updates include, but are not limited to, changes to existing product information and the addition of new product information. In other words, the preset storage engine is accessed periodically to obtain the updated product information of the target internet product, which reduces the amount of data to process and increases detection speed. For example, if the preset frequency is once every 2 minutes and the preset storage engine was last accessed at 12:00:00, the preset storage engine may be accessed at 12:02:00 to obtain the product information of the target internet product updated between 12:00:00 and 12:02:00.
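The polling window in the example above can be sketched as follows, assuming each stored record carries hypothetical `product_id` and `updated_at` fields. Using the half-open interval (start, end] is a design choice made here so that consecutive windows neither overlap nor leave gaps.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def next_window(last_access: datetime, frequency: timedelta) -> Tuple[datetime, datetime]:
    """The next poll covers the interval since the previous access."""
    return last_access, last_access + frequency


def updated_records(
    records: List[Dict], product_id: str, start: datetime, end: datetime
) -> List[Dict]:
    """Select records of one product updated in the half-open window (start, end].

    The previous window ended at `start`, so excluding `start` here keeps
    consecutive windows from overlapping.
    """
    return [
        r for r in records
        if r["product_id"] == product_id and start < r["updated_at"] <= end
    ]
```

With a last access at 12:00:00 and a frequency of two minutes, the next window ends at 12:02:00 and picks up records updated after 12:00:00 up to and including 12:02:00.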
In some embodiments, step S21 includes:
Step 1: Receive a product information update notification for the target internet product.
After the product information of an internet product is updated, a product information update notification may be sent to the illegal word detection device.
The product information update notification usually includes the identification information of the internet product, indicating whose product information has been updated. After receiving a product information update notification, the device determines, based on the identification information of the target internet product and the internet product identification information in the notification, whether the notification concerns the target internet product.
Step 2: Acquire the product information of the target internet product from the preset storage engine according to the product information update notification.
After a product information update notification for the target internet product is received, the product information of the target internet product is acquired from the preset storage engine. That is, the product information of the target internet product is acquired only after it has been updated. For example, assume the target internet products include internet products 1, 2, and 3. If a product information update notification for internet product 1 is received at some moment, the product information of internet product 1 is acquired from the preset storage engine based on the configuration information of internet product 1.
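A minimal sketch of this notification filter, assuming a notification is a dictionary with a hypothetical `product_id` field and that `fetch` (also hypothetical) reads the product information from the preset storage engine.

```python
from typing import Callable, Dict, List, Optional, Set


def on_update_notification(
    notification: Dict[str, str],
    target_ids: Set[str],
    fetch: Callable[[str], List[str]],
) -> Optional[List[str]]:
    """Fetch product information only when the notified product is a target.

    `notification["product_id"]` is a hypothetical field carrying the
    identification information of the updated internet product.
    """
    product_id = notification.get("product_id")
    if product_id is None or product_id not in target_ids:
        return None  # Not a target internet product: nothing to detect.
    return fetch(product_id)
```

Mirroring the example in the text, with targets {"1", "2", "3"} a notification for product "1" triggers a fetch, while a notification for product "4" is ignored.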
Step S22: Detect, based on a preset search engine, whether a preset violation word exists in the product information of the target internet product.
It should be understood that a search engine is a retrieval technology that, according to the user's needs and certain algorithms, uses a specific strategy to retrieve specified information from the internet and return it to the user. A search engine relies on various technologies, such as web crawling, retrieval and ranking, web page processing, big data processing and natural language processing, to provide fast, highly relevant information services to users. The core modules of a search engine generally include crawling, indexing, retrieval and ranking, and other auxiliary modules can be added to give users a better experience.
In the embodiment of the present invention, the preset search engine may be set flexibly according to actual needs; it includes but is not limited to ES (Elasticsearch). Elasticsearch is a Lucene-based search server that provides a distributed, multitenant-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java, was released as open source under the terms of the Apache License, and is a popular enterprise-level search engine. Elasticsearch is used in cloud computing, supports near-real-time search, and is stable, reliable, fast, and easy to install and use.
The preset violation words are predefined words, including but not limited to: national level, world level, top level, best, first, unique, first, best, accurate, top level, lowest, cheapest, greatest, national-level product, fills a domestic gap, absolute, exclusive, first, newest, most advanced, number-one brand, gold medal, famous brand, highest-earning, super-earning, first, superstar, luxury, ultimate, top-level enjoyment, guaranteed interest, no risk, etc.
In the embodiment of the invention, after the product information of the target internet product is obtained, the preset search engine is used to detect whether a preset violation word exists in it. For example, assuming the preset search engine is ES, step S22 may, to increase the detection speed, include: storing the product information of each target internet product in ES, and detecting, based on ES, whether a preset violation word exists in the product information of the target internet product.
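One way to express the ES check in step S22 is a `bool` query with one `match_phrase` clause per violation word, restricted to a single product. The sketch below only builds the request body as a Python dict; the index mapping it assumes (a keyword field `product_id` and a text field `content`) is hypothetical, not something the patent specifies.

```python
import json
from typing import Dict, List


def build_violation_query(product_id: str, words: List[str],
                          field: str = "content") -> Dict:
    """Build an Elasticsearch Query DSL body that matches any violation word.

    Assumes `product_id` is mapped as a keyword field (so `term` matches it
    exactly) and `field` holds the product text (hypothetical mapping).
    """
    return {
        "query": {
            "bool": {
                # Restrict the search to one target internet product.
                "filter": [{"term": {"product_id": product_id}}],
                # One phrase clause per violation word; at least one must match.
                "should": [{"match_phrase": {field: w}} for w in words],
                "minimum_should_match": 1,
            }
        },
        # Highlighting returns the matched fragments, useful for alarm messages.
        "highlight": {"fields": {field: {}}},
    }


query = build_violation_query("app_b", ["unique", "best"])
body = json.dumps(query)
```

The JSON body would be sent to `POST /<index>/_search`; any hit means at least one violation word is present. How phrases are matched depends on the field's analyzer, which matters for Chinese text.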
Step S23: If a preset violation word exists in the product information of the target internet product, obtain the target contact information corresponding to the target internet product.
The contact information includes but is not limited to an email address, a telephone number and the like, so that the violation can be reported by email or SMS and the violating information can be corrected.
In the embodiment of the invention, corresponding contact information is configured for each target internet product; that is, a mapping between target internet products and contact information is preset. When a preset violation word exists in the product information of a target internet product, the contact information corresponding to that product, namely the target contact information, is obtained from this preset mapping.
Step S24: Report the product information of the target internet product based on the target contact information.
In the embodiment of the invention, after the target contact information is obtained, the product information of the target internet product is reported based on it; that is, the product information containing the violation word is reported to the target contact so that the target contact can modify it. For example, if the target contact information is the email address of the target contact, the product information of the target internet product is reported to the target contact by email.
In some embodiments, step S24 includes: generating alarm information, and sending the alarm information to the target contact based on the target contact information. The alarm information may include at least one of: the identification information of the target internet product in which the preset violation word exists, the violation word found, and the position of the violation word in the target internet product, which facilitates follow-up tracking and correction. For example, the alarm information may read: the violation word 'unique' exists in the product introduction on the main interface of application B.
In the illegal word detection method provided by the embodiment of the invention, product information of a plurality of target internet products is obtained from a preset storage engine based on their identification information; a preset search engine is used to detect whether preset violation words exist in that product information; if a preset violation word exists in the product information of a target internet product, the target contact information corresponding to that product is obtained; and the product information of the target internet product is reported based on the target contact information. The product information of internet products is thus detected quickly and efficiently. Because the product information of many internet products in the preset storage engine can be detected by one program, there is no need to develop a separate detection program for each internet product, which reduces development cost and later operation and maintenance cost. Meanwhile, the product information of the internet products is obtained uniformly from the preset storage engine, so no direct integration with each internet product is required, which reduces development cost and time. When a new internet product appears, its product information can be detected simply by setting it as a target internet product, without developing an additional detection program; the process is simple and cost-effective.
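Alarm information that records the product, the violation word and its position can be sketched with plain substring search. This is a minimal illustration, not the patent's implementation: the `Alarm` fields, the free-form `location` label and the message wording are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Alarm:
    product_id: str  # identification information of the product
    location: str    # where the text appears (hypothetical free-form label)
    word: str        # violation word found
    offset: int      # character offset of the match within the text


def find_violations(text: str, words: List[str]) -> List[Tuple[str, int]]:
    """Return (word, offset) for every occurrence of every word, ordered by offset."""
    hits = []
    for w in words:
        if not w:
            continue  # an empty word would match everywhere
        start = text.find(w)
        while start != -1:
            hits.append((w, start))
            start = text.find(w, start + 1)
    return sorted(hits, key=lambda h: h[1])


def build_alarms(product_id: str, location: str, text: str,
                 words: List[str]) -> List[Alarm]:
    return [Alarm(product_id, location, w, off) for w, off in find_violations(text, words)]


def format_alarm(a: Alarm) -> str:
    return (f'Violation word "{a.word}" exists in the {a.location} '
            f'of {a.product_id} (offset {a.offset})')


text = "A unique experience, truly unique."
alarms = build_alarms("application B", "product introduction on the main interface",
                      text, ["unique"])
```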
Based on the first embodiment, a second embodiment of the illegal word detection method is provided. Referring to fig. 3, fig. 3 is a flowchart illustrating a method for detecting an illegal word according to a second embodiment of the present invention. In the embodiment of the present invention, before step S24, the method for detecting an illegal word may further include the following steps:
Step S25: Receive a product setting instruction.
The product setting instruction comprises Internet product identification information and corresponding contact person information. The internet product identification information is used to identify an internet product, which may be a name of the internet product or the like. The contact information may be at least one of mailbox, telephone number, WeChat name, etc. In some embodiments, the product setup instructions may be generated based on user operations.
In one example, assume that the product setup instructions include: application C and contact mailbox D.
Step S26: Set the internet product corresponding to the internet product identification information as a target internet product.
After the product setting instruction is received, the internet product identification information is parsed from it, and the internet product corresponding to that identification information is set as a target internet product, so that its product information can subsequently be detected.
In the previous example, the application program C is set as a target internet product, so that the product information of the application program C can be detected subsequently.
Step S27: Determine the configuration information of the target internet product based on the product setting instruction.
The configuration information of the target internet product is determined from the internet product identification information and the contact information in the product setting instruction, and is then stored. Note that the configuration information of the target internet product includes the identification information of the target internet product and the corresponding contact information.
Step S23 then includes: if a preset violation word exists in the product information of the target internet product, obtaining the contact information of the target internet product from its configuration information as the target contact information corresponding to the target internet product.
It should be noted that, in the embodiment of the present invention, the configuration information of a target internet product may be added or replaced according to the user's settings. In some embodiments, to speed up retrieval, the configuration information of the target internet product may further include a product information storage address, i.e., the address at which the product information of the internet product is stored in the preset storage engine. In step S21, the product information storage address can then be obtained from the corresponding configuration information based on the identification information of the target internet product, and the product information of the target internet product can be obtained from the preset storage engine at that address.
Text product information is typically stored in a database, and databases usually store data in the form of tables. Therefore, the product information storage address may include a table name, namely the name of the table that stores the text product information of the target internet product in the database of the preset storage engine. In step S21, the text product information of the target internet product can then be obtained from the database of the preset storage engine according to the table name in the configuration information of the target internet product. To obtain the text product information of the internet product more accurately and quickly, in one example the product information storage address further includes a field name, namely the name of the field that holds the text product information of the target internet product in the database of the preset storage engine. In step S21, the text product information of the target internet product can then be obtained from the database of the preset storage engine according to the field name in the configuration information.
In the illegal word detection method provided by the embodiment of the invention, a product setting instruction containing the identification information and contact information of an internet product is received, and the target internet product and its configuration information are determined from that instruction. The product information of the target internet product is then detected, and if it contains a preset violation word, the violation is reported according to the contact information in the configuration information.
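Steps S25 to S27, together with the contact lookup in step S23, can be sketched as a small registry keyed by product identifier. The instruction keys (`product_id`, `contacts`) and the class names are hypothetical; the patent does not define a data format for the product setting instruction.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ProductConfig:
    product_id: str      # identification information of the target internet product
    contacts: List[str]  # e.g. email addresses or phone numbers


class ConfigRegistry:
    """In-memory stand-in for stored configuration information (hypothetical)."""

    def __init__(self) -> None:
        self._configs: Dict[str, ProductConfig] = {}

    def apply_setup_instruction(self, instruction: Dict) -> ProductConfig:
        """Steps S26/S27: mark the product as a target and store its configuration."""
        cfg = ProductConfig(instruction["product_id"], list(instruction["contacts"]))
        self._configs[cfg.product_id] = cfg  # newly added or replaced
        return cfg

    def target_ids(self) -> Set[str]:
        return set(self._configs)

    def contacts_for(self, product_id: str) -> List[str]:
        """Step S23: read the target contact information from the configuration."""
        cfg = self._configs.get(product_id)
        return cfg.contacts if cfg else []


registry = ConfigRegistry()
registry.apply_setup_instruction({"product_id": "application C", "contacts": ["mailbox D"]})
```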
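Reading text product information by table name and field name can be sketched as building a HiveQL `SELECT` from the stored configuration. The configuration keys, the default `product_id` column and the identifier rule are assumptions; because identifiers cannot be bound as query parameters, they are validated and backtick-quoted, while the product identifier uses a DB-API `pyformat` placeholder that should be adjusted to the driver's `paramstyle`.

```python
import re
from typing import Dict

# Plain identifiers, optionally qualified as db.table (assumed naming rule).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def _quote(name: str) -> str:
    """Backtick-quote a (possibly db-qualified) identifier after validating it."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return ".".join(f"`{part}`" for part in name.split("."))


def build_text_query(config: Dict[str, str]) -> str:
    """Build a HiveQL SELECT from the table and field names stored in the config."""
    table = _quote(config["table_name"])
    field = _quote(config["field_name"])
    id_col = _quote(config.get("id_column", "product_id"))
    # %(product_id)s is a DB-API "pyformat" placeholder bound later by the driver.
    return f"SELECT {id_col}, {field} FROM {table} WHERE {id_col} = %(product_id)s"


sql = build_text_query({"table_name": "ods.product_text", "field_name": "intro_text"})
```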
Based on the foregoing embodiments, a third embodiment of the illegal word detection method according to the present invention is provided. Referring to fig. 4, fig. 4 is a flowchart illustrating a method for detecting an illegal word according to a third embodiment of the present invention. In the embodiment of the invention, the product information of the internet product comprises text product information.
Step S21 includes: obtaining the text product information of the target internet product from a database of the preset storage engine based on the identification information of the target internet product.
In the embodiment of the invention, the preset storage engine comprises a database, and the text product information of the target internet products is stored in the database of the preset storage engine, so that the text product information of a plurality of target internet products is obtained from the database of the preset storage engine.
The database of the preset storage engine may be a Hive database.
In some embodiments, the text-based product information of the target internet product may be obtained from a database of a preset storage engine according to a preset frequency based on the identification information of the target internet product. Namely, the database of the preset storage engine is periodically accessed to acquire the text product information of the target internet product. In one example, updated text product information of a plurality of target internet products can be obtained from a database of a preset storage engine according to a preset frequency, so that data processing amount is reduced, and detection speed is increased.
Step S22 includes: detecting, based on a preset search engine, whether a preset violation word exists in the text product information of the target internet product.
Step S23 includes: if a preset violation word exists in the text product information of the target internet product, obtaining the target contact information corresponding to the target internet product.
After step S22, the illegal word detection method may further include the steps of:
Step S28: If a preset violation word exists in the text product information of the target internet product, store the text product information of the target internet product.
If a preset violation word exists in the text product information of the target internet product, that text product information is stored so that the data is available for subsequent operations.
In one example, the preset search engine includes an ES, and whether a preset violation word exists in the text product information of the target internet product may be detected based on the ES. In another example, the preset search engine includes a search engine other than the ES, and whether a preset violation word exists in the text product information of each target internet product may be detected based on the search engine other than the ES; if the preset violation words exist in the text product information of the target Internet product, the text product information with the violation words is stored in the ES, so that the data storage space of the ES is saved.
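The second variant (detect first with a non-ES engine, then index only the violating texts into ES) can be sketched with injected callables. Plain substring matching stands in for the other search engine and a dictionary stands in for the ES index; both substitutions are assumptions for illustration.

```python
from typing import Callable, Dict, List


def contains_violation(text: str, words: List[str]) -> bool:
    """Stand-in for the non-ES search engine: plain substring matching."""
    return any(w in text for w in words)


def store_violating_texts(texts: Dict[str, str], words: List[str],
                          detect: Callable[[str, List[str]], bool],
                          store: Callable[[str, str], None]) -> List[str]:
    """Run detection first; hand only violating texts to `store` (e.g. an ES indexer)."""
    stored = []
    for product_id, text in texts.items():
        if detect(text, words):
            store(product_id, text)
            stored.append(product_id)
    return stored


es_stub: Dict[str, str] = {}  # hypothetical stand-in for the ES index
stored = store_violating_texts(
    {"app_a": "the cheapest plan", "app_b": "a flexible plan"},
    ["cheapest", "no risk"],
    contains_violation,
    es_stub.__setitem__,
)
```

Only `app_a` reaches the stand-in index, which is the storage saving the text describes.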
In this illegal word detection method, for text product information, the text product information of the target internet product is obtained from a database of the preset storage engine, and a preset search engine is used to detect whether a preset violation word exists in it; only if a preset violation word exists is the text product information of the target internet product stored, which reduces the amount of stored data.
Based on the foregoing embodiments, a fourth embodiment of the illegal word detection method according to the present invention is provided. In the embodiment of the invention, the product information of the internet product comprises non-text product information.
Step S21 includes: obtaining the non-text product information of the target internet product from a log-structured file system (LFS) of the preset storage engine based on the identification information of the target internet product.
In the embodiment of the invention, the preset storage engine comprises an LFS, and the non-text product information of the target Internet product is stored in the LFS of the preset storage engine, so that the non-text product information of the target Internet product is acquired from the LFS of the preset storage engine based on the identification information of the target Internet product.
In some embodiments, the non-text product information of the target internet product may be obtained from the LFS of the preset storage engine according to a preset frequency based on the identification information of the target internet product. Namely, the LFS of the preset storage engine is periodically accessed to acquire the non-text product information of the target internet product. In one example, the updated non-text product information of the target internet product can be obtained from the LFS of the preset storage engine according to the preset frequency, so that the data processing amount is reduced, and the detection speed is increased.
In some embodiments, a product information update notification may be received, and whether it is an update notification for the target internet product may be determined based on the identification information of the target internet product; if so, the non-text product information of the target internet product is obtained from the LFS of the preset storage engine according to the notification. Note that after the data stored in the LFS is updated, the LFS sends a product information update notification to the illegal word detection device; after receiving it, the device determines whether the notification belongs to a target internet product and, if so, obtains the product information of that target internet product from the LFS of the preset storage engine according to the notification.
Step S22 includes: detecting, based on a preset search engine, whether a preset violation word exists in the non-text product information of the target internet product.
After the non-text product information of the target internet product is obtained, whether a preset violation word exists in the non-text product information of the target internet product is detected based on a preset search engine.
In some embodiments, since the non-text type product information is not in text form, step S22 may include the following steps:
Step S221: Perform character recognition on the non-text product information of the target internet product to obtain the text content of the target internet product.
Character recognition may be performed on the non-text product information of the target internet product using optical character recognition (OCR) to obtain the text content. Other character recognition technologies may of course also be used.
Step S222: Detect, based on a preset search engine, whether a preset violation word exists in the text content of the target internet product.
The preset search engine may include ES; that is, ES is used to detect whether a preset violation word exists in the text content of the target internet product. In some embodiments, the text content of the target internet product may be stored in a database of ES, and ES is used to search for the preset violation words in the storage area holding that text content. If a preset violation word is found in the storage area, it is determined that the preset violation word exists in the text content of the target internet product; otherwise, it is determined that it does not. For example, suppose the text content of a first target internet product is stored in storage area 1 of the ES database and the text content of a second target internet product is stored in storage area 2. The preset violation words are searched for in storage areas 1 and 2 separately. If a preset violation word is found in storage area 1, the preset violation word exists in the text content of the first target internet product; if no preset violation word is found in storage area 2, no preset violation word exists in the text content of the second target internet product.
In some embodiments, the preset search engine may include two search engines: ES and a search engine other than ES. The text content of the target internet product is stored in ES, and ES is used to detect whether a preset violation word exists in that text content; the other search engine is used to detect whether a preset violation word exists in the text product information of the target internet product, and if so, the text product information of the target internet product is stored in ES.
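The per-storage-area search in the example above can be mirrored with an in-memory stand-in, where each storage area is simply a key mapped to the stored text content. With ES, a storage area might correspond to a separate index or to a filter on the product identifier; that mapping, and the area names used here, are assumptions.

```python
from typing import Dict, List


def check_storage_areas(areas: Dict[str, str], words: List[str]) -> Dict[str, List[str]]:
    """Search every storage area separately; map area -> violation words found."""
    return {area: [w for w in words if w in text] for area, text in areas.items()}


def has_violation(result: Dict[str, List[str]], area: str) -> bool:
    """A storage area violates if at least one preset word was found in it."""
    return bool(result[area])


result = check_storage_areas(
    {"storage_area_1": "the lowest price on the market",
     "storage_area_2": "a reasonable price"},
    ["lowest", "unique"],
)
```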
In consideration of the possibility of errors in the text content obtained by performing character recognition on the non-text product information, in some embodiments, before step S222, the following steps may be further included: and checking the text content of the target Internet product.
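The patent does not say how the OCR text is checked. One plausible form of this check, offered here only as an assumption, is normalizing the recognized text before searching it, so that full-width characters and spurious spaces between Chinese characters do not hide a violation word. Note that NFKC normalization also rewrites some punctuation, so the stored text and the search terms should be normalized the same way.

```python
import re
import unicodedata

# CJK Unified Ideographs (basic block); an assumption about which scripts matter.
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])")


def normalize_ocr_text(text: str) -> str:
    """Clean OCR output before searching it for violation words (hypothetical step)."""
    # NFKC folds full-width letters, digits and spaces to their ASCII forms.
    text = unicodedata.normalize("NFKC", text)
    # OCR often inserts spaces between Chinese characters, which would split words.
    text = _CJK_GAP.sub("", text)
    # Collapse any remaining whitespace runs.
    return re.sub(r"\s+", " ", text).strip()
```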
Step S23 includes: if a preset violation word exists in the non-text product information of the target internet product, obtaining the target contact information corresponding to the target internet product.
In the illegal word detection method provided by the embodiment of the invention, for non-text product information, the non-text product information of the target internet product is obtained from the log-structured file system (LFS) of the preset storage engine, character recognition is performed on it to obtain text content, and a preset search engine is used to detect whether a preset violation word exists in that text content. If a preset violation word exists in the non-text product information of the target internet product, the target contact information corresponding to the target internet product is obtained and the product information of the target internet product is reported based on it, thereby enabling detection of non-text product information.
Illegal word detection device embodiment:
referring to fig. 5, fig. 5 is a block diagram illustrating a structure of a first embodiment of the illegal word detection device according to the present invention, where the illegal word detection device includes:
the first obtaining module 51 is configured to obtain product information of a target internet product from a preset storage engine based on identification information of the target internet product, where the preset storage engine includes product information of a plurality of internet products.
The detection module 52 is configured to detect whether a preset violation word exists in the product information of the target internet product based on a preset search engine.
The second obtaining module 53 is configured to obtain target contact information corresponding to the target internet product if a preset violation word exists in the product information of the target internet product.
The reporting module 54 is configured to report the product information of the target internet product based on the target contact information.
It should be noted that the illegal word detection device may further optionally include a corresponding module to implement other steps of the illegal word detection method.
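The four modules of the device can be sketched as methods on one class, with the storage engine, contact mapping and message delivery injected as plain Python objects. This is a hypothetical composition for illustration; the patent does not prescribe class or method names, and substring matching stands in for the preset search engine.

```python
from typing import Callable, Dict, List, Optional, Tuple


class IllegalWordDetector:
    """Hypothetical composition of modules 51-54; storage and sending are injected."""

    def __init__(self, storage: Dict[str, str], words: List[str],
                 contacts: Dict[str, str], send: Callable[[str, str], None]):
        self.storage = storage    # stand-in for the preset storage engine
        self.words = words        # preset violation words
        self.contacts = contacts  # product id -> contact (e.g. email address)
        self.send = send          # stand-in for email/SMS delivery

    def obtain_product_info(self, product_id: str) -> str:       # first obtaining module 51
        return self.storage.get(product_id, "")

    def detect(self, text: str) -> List[str]:                    # detection module 52
        return [w for w in self.words if w in text]

    def obtain_contact(self, product_id: str) -> Optional[str]:  # second obtaining module 53
        return self.contacts.get(product_id)

    def report(self, contact: str, product_id: str, hits: List[str]) -> None:  # reporting module 54
        self.send(contact, f"{product_id}: violation words {hits}")

    def run(self, target_ids: List[str]) -> List[str]:
        """Check each target product and return the ids that were reported."""
        reported = []
        for pid in target_ids:
            hits = self.detect(self.obtain_product_info(pid))
            contact = self.obtain_contact(pid) if hits else None
            if contact:
                self.report(contact, pid, hits)
                reported.append(pid)
        return reported


outbox: List[Tuple[str, str]] = []
detector = IllegalWordDetector(
    storage={"app_c": "the best and cheapest app", "app_d": "a simple app"},
    words=["best", "cheapest"],
    contacts={"app_c": "mailbox-d@example.com"},
    send=lambda to, msg: outbox.append((to, msg)),
)
reported = detector.run(["app_c", "app_d"])
```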
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for causing a terminal device to execute the method according to the embodiments of the present invention.
The above description is only an alternative embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for detecting illegal words is characterized by comprising the following steps:
based on identification information of a target internet product, acquiring product information of the target internet product from a preset storage engine, wherein the preset storage engine comprises product information of a plurality of internet products;
detecting whether preset violation words exist in the product information of the target internet product or not based on a preset search engine;
if the preset violation word exists in the product information of the target internet product, acquiring target contact information corresponding to the target internet product;
and reporting the product information of the target internet product based on the target contact information.
2. The illegal word detection method according to claim 1, wherein the product information includes text product information;
the step of obtaining the product information of the target internet product from a preset storage engine based on the identification information of the target internet product comprises the following steps:
obtaining the text product information of the target internet product from a database of the preset storage engine based on the identification information of the target internet product.
3. The illegal word detection method according to claim 2, wherein after the step of detecting whether a preset illegal word exists in the product information of the target internet product based on a preset search engine, the illegal word detection method further comprises the steps of:
if the preset violation word exists in the text product information of the target internet product, storing the text product information of the target internet product.
4. The illegal word detection method according to claim 1, wherein the product information includes non-text product information;
the step of obtaining the product information of the target internet product from a preset storage engine based on the identification information of the target internet product comprises the following steps:
obtaining the non-text product information of the target internet product from a log-structured file system (LFS) of the preset storage engine based on the identification information of the target internet product.
5. The illegal word detection method according to claim 4, wherein the step of detecting whether a preset illegal word exists in the product information of the target internet product based on a preset search engine comprises:
performing character recognition on the non-text product information of the target internet product to obtain the text content of the target internet product;
and detecting whether preset violation words exist in the text content of the target Internet product based on a preset search engine.
6. The method for detecting illegal words according to claim 5, wherein the step of performing character recognition on the non-text product information of the target internet product to obtain the text content of the target internet product comprises:
performing character recognition on the non-text product information of the target internet product through an Optical Character Recognition (OCR) technology to obtain text content of the target internet product;
the step of detecting whether a preset violation word exists in the text content of the target internet product based on a preset search engine comprises the following steps:
storing the text content of the target Internet product into a database of a search server ES;
searching the preset violation words in a storage area of the text content of the target Internet product based on the search server ES;
if the preset violation word is searched in the storage area, judging that the preset violation word exists in the text content of the target internet product;
and if the preset violation word is not searched in the storage area, judging that the preset violation word does not exist in the text content of the target internet product.
7. The illegal word detection method according to any one of claims 1-6, wherein before the step of obtaining the product information of the target internet product from a preset storage engine based on the identification information of the target internet product, the illegal word detection method further comprises the steps of:
receiving a product setting instruction, wherein the product setting instruction comprises Internet product identification information and corresponding contact information;
setting the internet product corresponding to the internet product identification information as a target internet product so as to detect the product information of the internet product;
determining configuration information of the target internet product based on the product setting instruction, wherein the configuration information comprises identification information of the target internet product and corresponding contact information;
if the preset violation word exists in the product information of the target internet product, the step of obtaining target contact information corresponding to the target internet product includes:
if the preset violation word exists in the product information of the target internet product, obtaining the contact information of the target internet product from the configuration information of the target internet product as the target contact information corresponding to the target internet product.
8. An illegal word detection device, characterized in that the illegal word detection device comprises:
the first obtaining module is used for obtaining product information of a target internet product from a preset storage engine based on identification information of the target internet product, wherein the preset storage engine comprises product information of a plurality of internet products;
the detection module is used for detecting whether preset violation words exist in the product information of the target internet product or not based on a preset search engine;
the second obtaining module is used for obtaining target contact person information corresponding to the target internet product if the preset violation word exists in the product information of the target internet product;
and the reporting module is used for reporting the product information of the target internet product based on the target contact information.
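The four modules of claim 8 can be wired into a single detection pass. This is a minimal sketch under stated assumptions: the preset storage engine is modelled as a Python dict, the preset search engine (whose type the claims do not specify) is replaced by a plain substring scan, the contact lookup is a dict keyed by product identifier, and the reporting module records reports in a list rather than delivering them. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class FirstAcquisitionModule:
    """Fetches product information from a stand-in storage engine (a dict)."""

    def __init__(self, storage_engine: Dict[str, str]) -> None:
        self._storage = storage_engine  # product_id -> product information text

    def acquire(self, product_id: str) -> Optional[str]:
        return self._storage.get(product_id)


class DetectionModule:
    """Stand-in for the preset search engine: a plain substring scan."""

    def __init__(self, violation_words: List[str]) -> None:
        self._words = list(violation_words)

    def detect(self, product_info: str) -> List[str]:
        # Every preset violation word occurring in the text, in list order.
        return [word for word in self._words if word in product_info]


class SecondAcquisitionModule:
    """Looks up target contact information by product identifier."""

    def __init__(self, contacts: Dict[str, str]) -> None:
        self._contacts = contacts

    def acquire(self, product_id: str) -> Optional[str]:
        return self._contacts.get(product_id)


@dataclass
class Report:
    contact: str
    product_id: str
    product_info: str
    hits: List[str]


class ReportingModule:
    """Records reports; the real delivery channel is not specified in the claims."""

    def __init__(self) -> None:
        self.sent: List[Report] = []

    def report(self, contact: str, product_id: str,
               product_info: str, hits: List[str]) -> Report:
        report = Report(contact, product_id, product_info, hits)
        self.sent.append(report)
        return report


def run_detection(
    product_id: str,
    first_acq: FirstAcquisitionModule,
    detector: DetectionModule,
    second_acq: SecondAcquisitionModule,
    reporter: ReportingModule,
) -> Optional[Report]:
    info = first_acq.acquire(product_id)
    if info is None:
        return None  # unknown product: nothing to check
    hits = detector.detect(info)
    if not hits:
        return None  # no preset violation word found
    contact = second_acq.acquire(product_id)
    if contact is None:
        return None  # no contact on file, so nothing can be reported
    return reporter.report(contact, product_id, info, hits)
```

A real deployment would presumably query an actual search engine instead of scanning substrings, which matters once the violation-word list or the product corpus grows.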
9. Illegal word detection equipment, characterized in that the illegal word detection equipment comprises: a memory, a processor, and an illegal word detection program stored on the memory and executable on the processor, wherein the illegal word detection program, when executed by the processor, implements the steps of the illegal word detection method according to any one of claims 1-7.
10. A computer-readable storage medium, having stored thereon an illegal word detection program which, when executed by a processor, implements the steps of the illegal word detection method according to any one of claims 1-7.
CN202110213648.7A 2021-02-25 2021-02-25 Illegal word detection method, device and equipment and computer-readable storage medium Pending CN113032658A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110213648.7A (CN113032658A) | 2021-02-25 | 2021-02-25 | Illegal word detection method, device and equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110213648.7A (CN113032658A) | 2021-02-25 | 2021-02-25 | Illegal word detection method, device and equipment and computer-readable storage medium

Publications (1)

Publication Number | Publication Date
CN113032658A | 2021-06-25

Family

ID=76462073

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110213648.7A (CN113032658A, pending) | Illegal word detection method, device and equipment and computer-readable storage medium | 2021-02-25 | 2021-02-25

Country Status (1)

Country Link
CN (1) CN113032658A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113449506A (en) * | 2021-06-29 | 2021-09-28 | 未鲲(上海)科技服务有限公司 (Weikun Shanghai Technology Service Co Ltd) | Data detection method, device and equipment and readable storage medium

Citations (8)

Publication number | Priority date | Publication date | Assignee | Title
CN106383862A (en) * | 2016-08-31 | 2017-02-08 | 杭州云片网络科技有限公司 | Violation short message detection method and system
US20180004637A1 (en) * | 2016-07-01 | 2018-01-04 | Wipro Limited | Method and a system for automatically identifying violations in one or more test cases
CN109508402A (en) * | 2018-11-15 | 2019-03-22 | 上海指旺信息科技有限公司 | Violation term detection method and device
CN109545200A (en) * | 2018-10-31 | 2019-03-29 | 深圳大普微电子科技有限公司 | Method for editing voice content and storage device
CN110852231A (en) * | 2019-11-04 | 2020-02-28 | 云目未来科技(北京)有限公司 | Illegal video detection method and device and storage medium
CN111355781A (en) * | 2020-02-18 | 2020-06-30 | 腾讯科技(深圳)有限公司 | Voice information communication management method, device and storage medium
JP6779405B1 (en) * | 2020-06-23 | 2020-11-04 | 株式会社Ipsign | Infringement information extraction systems, methods and programs
CN112131507A (en) * | 2020-09-25 | 2020-12-25 | 成都知道创宇信息技术有限公司 | Website content processing method, device, server and computer-readable storage medium

Similar Documents

Publication Publication Date Title
US20160275203A1 (en) Instant search results with page previews
US20220066860A1 (en) System for resolution of technical issues using computing system-specific contextual data
CN112685578B (en) Method and device for providing multimedia information content
CN105718533A (en) Information pushing method and device
CN113079123B (en) Malicious website detection method and device and electronic equipment
CN106919571A (en) Method and device for obtaining pictures matching a search keyword
CN109241031B (en) Model generation method, model using method, device, system and storage medium
CN112910925B (en) Domain name detection method, model training method and device, equipment and storage medium
CN109388551A (en) Method for predicting the probability that code contains a vulnerability, vulnerability detection method, and related apparatus
CN106326734A (en) Method and device for detecting sensitive information
CN103198066A (en) Word list based information search method and search system
CN103412900A (en) File download processing method and terminal
CN110196833A (en) Application program search method, device, terminal and storage medium
CN113032658A (en) Illegal word detection method, device and equipment and computer-readable storage medium
US20140373033A1 (en) Electronic device and method for launching an application installed in the same through address information
CN108491502B (en) News tracking method, terminal, server and storage medium
US10387545B2 (en) Processing page
US20210294776A1 (en) System for natural language processing-based electronic file scanning for processing database queries
CN112925878B (en) Data processing method and device
CN110866114B (en) Object behavior identification method and device and terminal equipment
CN114398993B (en) Search information recall method, system, device and medium based on tag data
US10282482B2 (en) Data provision device, data provision method, and data provision program
CN114416600B (en) Application detection method and device, computer equipment and storage medium
CN105373596A (en) Mobile terminal based on user interest mining and user interest mining method
CN112445907B (en) Text emotion classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination