CN109726312B - Regular expression detection method, device, equipment and storage medium - Google Patents

Info

Publication number
CN109726312B
Authority
CN
China
Prior art keywords
regular expression
text data
matching
determining
matched
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811594357.1A
Other languages
Chinese (zh)
Other versions
CN109726312A (en)
Inventor
胡陆杰
黄洁斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Information Technology Co Ltd
Original Assignee
Guangzhou Huya Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Information Technology Co Ltd
Priority to CN201811594357.1A
Publication of CN109726312A
Application granted
Publication of CN109726312B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the invention discloses a regular expression detection method, apparatus, device, and storage medium. The method comprises: determining a regular expression set for a video service; acquiring text data generated based on the video service; matching the text data by using the regular expression; and, when the matched text data meets a preset scale condition, determining the correctness of the regular expression according to its matching success rate. This addresses business errors caused by business personnel writing a regular expression incorrectly, screens regular expressions, and reduces the risk of using them.

Description

Regular expression detection method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of text mining, in particular to a regular expression detection method, a device, equipment and a storage medium.
Background
In text filtering and auditing services, regular expressions play a very important role. For example, in the live broadcast or video field, a regular expression is used to filter barrages (bullet screen comments), and the filtered barrages are then displayed on the video playing interface.
However, regular expressions are difficult and complex to write and hard to read, so it is difficult to judge at a glance whether one is correct, and a slight mistake can cause a business accident. For example, a writing error may make the regular expression match every barrage, so that all barrages are filtered out and the live broadcast platform or video site cannot display the barrages sent by users on the video playing interface. Using regular expressions therefore carries considerable risk.
Disclosure of Invention
The invention provides a regular expression detection method, apparatus, device, and storage medium, which are used for screening regular expressions so as to reduce the risk of using them.
In a first aspect, an embodiment of the present invention provides a regular expression detection method, where the method includes:
determining a regular expression set for the video service;
acquiring text data generated based on the video service;
matching the text data by using the regular expression;
and when the matched text data meets a preset scale condition, determining the correctness of the regular expression according to the matching success rate of the regular expression.
Further, when the text data meets a preset scale condition, determining the correctness of the regular expression according to the matching success rate of the regular expression, including:
if the matching duration of the matched text data exceeds a preset duration and the total number of the matched text data exceeds a preset number, determining that the matched text data meets a preset scale condition;
determining the matching success rate of the regular expression;
judging whether the matching success rate is higher than a preset proportion or not;
if yes, determining that the regular expression is wrong;
if not, determining that the regular expression is correct.
Further, after determining the correctness of the regular expression according to the matching success rate of the regular expression, the method includes:
stopping matching the text data by using the regular expression when the regular expression is determined to be wrong;
when the regular expression is determined to be correct, keeping matching the text data by using the regular expression.
Further, after stopping matching the text data using the regular expression when it is determined that the regular expression is erroneous, the method further includes:
receiving a manual check operation for the regular expression;
and correcting the correctness of the regular expression according to the manual checking operation.
Further, receiving a manual check operation for the regular expression, including:
displaying the regular expression;
displaying the text data successfully matched with the regular expression;
receiving a manual verification operation.
Further, the text data generated by the video service is the text data modified or released by the user;
after the matching the text data by using the regular expression, the method further comprises:
forbidding to modify or publish the text data successfully matched with the regular expression;
allowing modification or publication of the text data that failed to match the regular expression.
Further, the text data includes at least one of: bullet screen data, user nickname data and user signature data.
In a second aspect, an embodiment of the present invention further provides a regular expression detection apparatus, where the apparatus includes:
the regular expression determining module is used for determining a regular expression set for the video service;
the text data acquisition module is used for acquiring text data generated based on the video service;
the matching module is used for matching the text data by using the regular expression;
and the correctness determining module is used for determining the correctness of the regular expression according to the matching success rate of the regular expression when the matched text data meets the preset scale condition.
In a third aspect, an embodiment of the present invention further provides a regular expression detection device, where the device includes: a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the regular expression detection method as in any of the first aspects.
In a fourth aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the regular expression detection method according to any one of the first aspect.
The embodiment of the invention determines the regular expression set for the video service; acquires text data generated based on the video service; matches the text data by using the regular expression; and, when the matched text data meets the preset scale condition, determines the correctness of the regular expression according to its matching success rate. Because regular expressions are difficult and complex to write, business personnel can easily write them incorrectly, and because they are hard to read, their correctness cannot be judged at a glance. The embodiment addresses both problems: regular expressions are screened automatically, which helps avoid business errors, reduces the risk of using regular expressions, and reduces the cost caused by business errors.
Drawings
Fig. 1 is a flowchart of a regular expression detection method according to a first embodiment of the present invention;
Fig. 2A is a flowchart of a regular expression detection method according to a second embodiment of the present invention;
Fig. 2B is a schematic diagram of a regular expression detection method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a regular expression detection apparatus according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a regular expression detection device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a regular expression detection method according to a first embodiment of the present invention. The embodiment is applicable to detecting the correctness of a regular expression, specifically in an application scenario in which a regular expression is used to filter text data generated by a video service.
Further, the method may be executed by regular expression detection equipment, which is not limited in this embodiment and may be a server, a computer, or the like. In this embodiment, the regular expression detection device is exemplified as a server, and the server may be an independent server or a cluster server; which may be a physical server or a virtual server.
Furthermore, the server can also provide a user interaction interface for facilitating the user to check and manage the flow of regular expression detection.
Referring to fig. 1, the method specifically includes the following steps:
S110, determining a regular expression set for the video service.
In this embodiment, the video service may be a video website service or a live platform service, where the video website service is a service for sharing video, and a video source in the video website may be an original work uploaded by a video website operator or a user, or a video purchased by the video website and having a video and audio copyright; the live broadcast service is a novel multimedia sharing service, and this embodiment is described in an application scenario in which the video service is a live broadcast service. Specifically, the anchor can upload live broadcast contents recorded in real time to the live broadcast platform; further, the live platform distributes the live content for viewing to viewers who are interested in the anchor or subscribe to the live content. Wherein the live content is delivered in a video stream.
It should be noted that the video service in the live broadcast platform includes, in addition to the main service for live broadcast content distribution, services such as operation and background to ensure normal operation of the live broadcast platform. In particular, in order to ensure the civilized communication environment of the live broadcast platform and the health of the content of the live broadcast platform, the live broadcast data generated by the video service needs to be checked in the video services such as operation, background and the like. The live data includes multimedia data such as video data, text data, voice data, and picture data. For example, the video data may be a video stream uploaded by a main broadcast, the text data may be barrage data sent by a viewer to a live broadcast room, the voice data may be an audio stream in the video stream uploaded by the main broadcast, and the picture data may be an avatar of a user.
In this embodiment, the examination of the text data is taken as an example for explanation, and in the process of examining and verifying the text data, an operator needs to set a regular expression in the background to filter and analyze the text data. The regular expression is a logical formula for operating on a character string, that is, a "regular character string" is formed by using specific characters defined in advance and a combination of the specific characters, and is generally used for retrieving and replacing texts conforming to a certain pattern (rule). The regular expression can be preset by an operator according to the requirement of the video service, and is stored in a storage center of the regular expression, and is extracted when the regular expression needs to be used.
S120, acquiring text data generated based on the video service.
The video service is not limited in this embodiment, and any video service that involves generating text data and requires the use of a regular expression may be applicable. Illustratively, the video service may include: bullet screen service, user nickname setting service and user signature service. Correspondingly, the text data comprises at least one of the following: bullet screen data, user nickname data and user signature data. In an embodiment, the video service is a barrage service, and the audience sends barrage data to a background server of a live broadcast platform according to the watched live broadcast content through a client provided by the live broadcast platform. And the live broadcast platform distributes the received barrage data to the client corresponding to the live broadcast content for display.
In an embodiment, the video service is a user nickname setting service, and a user of the live platform can set a user nickname through a client provided by the live platform. After receiving the user nickname, the live platform performs the service processing corresponding to the nickname setting service, such as checking the uniqueness of the nickname: if the nickname is found to be already registered, the live platform reminds the user to choose a different nickname.
In an embodiment, the video service is a user signature service, and a user of the live platform may set a user signature through a client provided by the live platform, where the user signature may be a profile or description of the user.
S130, matching the text data by using the regular expression.
This embodiment will be described by taking an example of checking text data using a regular expression. The process of auditing is the process of matching. In particular, the regular expression may be used to retrieve text data that conforms to a certain pattern (rule). For example, the pattern (rule) may be used to match sensitive words and phrases, thereby determining that the successfully matched text data contains sensitive words and phrases that are not suitable for distribution to the live platform. The sensitive words in this embodiment may include non-civilized words. Further, the successfully matched text data can be intercepted to maintain a civilized communication environment of a live broadcast room.
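The matching-as-auditing step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the pattern, the placeholder terms `badword` and `spamlink`, and the function name `is_intercepted` are hypothetical and do not come from the patent.

```python
import re

# Hypothetical pattern: "badword" and "spamlink" stand in for real sensitive terms.
SENSITIVE = re.compile(r"badword|spamlink")

def is_intercepted(text: str) -> bool:
    """Return True when the pattern is found anywhere in the text (a successful match)."""
    return SENSITIVE.search(text) is not None

messages = ["nice stream!", "this is a badword", "hello"]

# Successfully matched text is intercepted; everything else may be shown.
shown = [m for m in messages if not is_intercepted(m)]
```

Here a successful match means interception, which mirrors the convention of this embodiment.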
S140, when the matched text data meets a preset scale condition, determining the correctness of the regular expression according to the matching success rate of the regular expression.
In this embodiment, different regular expressions are set for different video services to complete filtering processing of text data. The correctness of the regular expression means that the regular expression can meet the service requirement preset by an operator.
In this embodiment, the preset scale condition is used to ensure that the text data is of sufficient scale. The scale may be specified in terms of the data size of the text data and the time period over which the text data is generated. Of course, if the text data is offline data, the scale may be defined simply by the data size of the text data. Ensuring that the matched text data meets the preset scale condition avoids erroneous detection results caused by an insufficient amount of text data, which improves the accuracy and reliability of detection and avoids unnecessary service loss caused by false detection.
Further, the present embodiment describes the determination of the correctness of the regular expression according to the matching success rate of the regular expression in detail by taking the bullet screen data as an example. In an embodiment, regular expressions can be used for matching the bullet screen data, and in general, the successfully matched bullet screen data can be determined to be bullet screen data containing sensitive words and sentences, and the successfully matched bullet screen data is intercepted, that is, the successfully matched bullet screen data is not sent to a playing interface of live broadcast content for displaying.
It should be noted that if a writing error made by an operator in the regular expression causes all barrage data sent by users to match successfully, all barrages are filtered out, and the live broadcast platform cannot display the barrages sent by users on the playing interface of the live content.
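As an illustration of how a small writing error can make a regular expression match everything, consider the following Python sketch. The patterns are hypothetical examples, not taken from the patent; the error shown is a stray trailing `|`, which adds an empty alternative that matches any string.

```python
import re

# Intended pattern (hypothetical placeholder terms).
intended = re.compile(r"badword|spamlink")

# Writing error: the trailing "|" adds an empty alternative,
# and the empty pattern matches at position 0 of every string.
mistyped = re.compile(r"badword|spamlink|")

messages = ["nice stream!", "hello", "this is a badword"]

intended_hits = [m for m in messages if intended.search(m)]
mistyped_hits = [m for m in messages if mistyped.search(m)]
```

With the mistyped pattern every message counts as a successful match, so every barrage would be intercepted.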
Further, in order to reduce the huge risk of the regular expression caused by the writing error, the present embodiment determines the correctness of the regular expression according to the matching success rate of the regular expression.
In the technical solution of this embodiment, the regular expression set for the video service is determined; text data generated based on the video service is acquired; the text data is matched using the regular expression; and, when the matched text data meets the preset scale condition, the correctness of the regular expression is determined according to its matching success rate. Because regular expressions are difficult and complex to write, business personnel can easily write them incorrectly, and because they are hard to read, their correctness cannot be judged at a glance. The embodiment addresses both problems: regular expressions are screened automatically, which helps avoid business accidents, reduces the risk of using regular expressions, and reduces the cost caused by business accidents. Furthermore, requiring the text data to meet the preset scale condition makes the matching success rate calculation more reliable and accurate and avoids erroneous detection results caused by an insufficient amount of text data.
Example two
Fig. 2A is a flowchart of a regular expression detection method according to a second embodiment of the present invention, and fig. 2B is a schematic diagram of a regular expression detection method according to a second embodiment of the present invention.
Referring to fig. 2A, the present embodiment is further detailed on the basis of the above embodiment, and the method specifically includes the following steps:
S210, determining a regular expression set for the video service.
In this embodiment, the video service may be a video website service or a live platform service; this embodiment is described taking a live broadcast service as the example.
S220, acquiring text data generated based on the video service.
The text data described in the above embodiments may be received by a text receiving service operated by a live broadcast platform, so as to further perform filtering and analysis processing. The text data may be stored in a data storage center or processed directly in real time and deleted after processing.
In one embodiment, referring to fig. 2B, the text data is received by a text receiving service running on the live broadcast platform, stored in a data storage center, and extracted when machine review is required. The regular expressions are added by operators, stored in a regular storage center and extracted when machine auditing is needed. The process of machine review is the regular expression detection process executed in steps S210-S280 in this embodiment.
S230, matching the text data by using the regular expression.
In this embodiment, when the text data generated by the video service is received by the text receiving service and stored in the data storage center, the text data may also carry a timestamp. The chronological order of the text data is determined from the timestamps, and the regular expression only matches text data generated by the video service after the regular expression starts to be used. This ensures timeliness and also makes it possible to determine whether the regular expression suits the text data of the current period.
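A minimal sketch of the timestamp rule above, assuming each piece of text data is stored with a creation time. The names `TextRecord` and `records_since` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class TextRecord:
    text: str
    created_at: datetime  # timestamp carried by the text data

def records_since(records: List[TextRecord], in_use_since: datetime) -> List[TextRecord]:
    """Keep only records generated at or after the regular expression was put into use."""
    return [r for r in records if r.created_at >= in_use_since]

in_use_since = datetime(2018, 12, 17, 0, 0)
records = [
    TextRecord("old comment", datetime(2018, 12, 16, 23, 59)),
    TextRecord("new comment", datetime(2018, 12, 17, 0, 5)),
]

# Only text generated after the regular expression went into use is matched.
eligible = records_since(records, in_use_since)
```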
Further, in this embodiment, the text data generated by the video service is text data modified or issued by a user; after the matching the text data by using the regular expression, the method further comprises: forbidding to modify or publish the text data successfully matched with the regular expression; allowing modification or publication of the text data that failed to match the regular expression. In this embodiment, the text data is not limited, and the text data includes at least one of the following: bullet screen data, user nickname data and user signature data.
In an embodiment, the text data is bullet screen data. When the video service is a bullet screen service, the bullet screen data sent to the live broadcast room by users needs to be filtered, and the filtering process can be used to filter out vulgar comments, sensitive words and sentences, and the like. Furthermore, the bullet screen data obtained by filtering can be analyzed, for example to build a user portrait for the anchor or to assess the quality of the live content. This embodiment takes the filtering of bullet screen data containing sensitive words and sentences as the example: publication of text data that successfully matches the regular expression is prohibited, that is, bullet screen data that successfully matches the regular expression is intercepted and not published to the live broadcast room, so as to maintain a civilized communication environment in the live broadcast room.
In one embodiment, the text data is user nickname data. When the video service is a user nickname setting service, the user nickname data generated when a user sets a nickname needs to be filtered, so that sensitive words and sentences that violate the rules do not appear in nicknames and harm the civilized environment of the live broadcast room. The regular expression is used to match user nickname data containing sensitive words and sentences, and the nickname modification is prohibited for user nickname data that matches successfully. The same filtering process can be applied to user signature data, with the same technical effect as for the other text data.
S240, if the matching duration of the matched text data exceeds the preset duration and the total number of the matched text data exceeds the preset number, determining that the matched text data meets the preset scale condition.
In this embodiment, the matching duration is measured from the time point at which the regular expression is put into use; it is not the time spent matching the text data. Illustratively, if the regular expression is put into use at 00:00 on Monday, December 17, 2018, then at 00:00 on Tuesday, December 18, 2018 its matching duration is 24 hours, regardless of whether any text data was matched within those 24 hours.
In this embodiment, the total number of matched text data is likewise counted from the time point at which the regular expression is put into use. Illustratively, if the regular expression is put into use at 00:00 on Monday, December 17, 2018, and by Tuesday, December 18, 2018 the text receiving service has received 10000 pieces of text data generated by the video service, then the total number of matched text data is 10000; text data received by the text receiving service before 00:00 on Monday, December 17, 2018 is not counted.
Further, the scale condition is not limited in this embodiment. For example, suppose the preset duration is 8 hours and the preset number is 10000. If the regular expression is put into use for matching the text data at 00:00 on Monday, December 17, 2018, and by Tuesday, December 18, 2018 the matching duration of the regular expression is 24 hours and the total number of text data received by the text receiving service from the video service is 10000, it is determined that the matched text data meets the preset scale condition. It should be noted that, in this example, at the time when the matching duration of the regular expression is 8 hours, the total number of text data has not yet reached the preset number (10000), so the matched text data is not considered to satisfy the preset scale condition at that time.
In the embodiment, the text data is determined to meet the preset scale condition by analyzing the matching duration and the total number of the text data, so that the reliability and the accuracy of the calculation of the matching success rate are ensured, and the condition that the detection result is in error due to insufficient scale of the text data is avoided.
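The scale condition of S240 can be sketched as below, using the preset duration (8 hours) and preset number (10000) from the example above. The function name is hypothetical, and the count of 4000 at the 8-hour mark is an invented illustration value. Note one assumption: the text says both quantities must "exceed" their presets, while the worked example treats a total of exactly 10000 as sufficient, so this sketch uses `>=`.

```python
from datetime import datetime, timedelta

def meets_scale_condition(
    in_use_since: datetime,
    now: datetime,
    total_count: int,
    preset_duration: timedelta = timedelta(hours=8),
    preset_count: int = 10000,
) -> bool:
    """Both the elapsed time since the regex went into use and the
    number of texts seen since then must reach their presets."""
    # Matching duration is elapsed wall-clock time, not CPU time spent matching.
    matching_duration = now - in_use_since
    # Assumption: ">=" is used so the patent's example (exactly 10000) qualifies.
    return matching_duration >= preset_duration and total_count >= preset_count

start = datetime(2018, 12, 17, 0, 0)

# After 8 hours with a hypothetical 4000 texts: duration reached, count not.
after_8h = meets_scale_condition(start, start + timedelta(hours=8), total_count=4000)

# After 24 hours with 10000 texts: both reached.
after_24h = meets_scale_condition(start, start + timedelta(hours=24), total_count=10000)
```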
S250, determining the matching success rate of the regular expression.
In one embodiment, the text data successfully matched with the regular expression is used as target text data, and the ratio of the number of the target text data to the total number of the text data is used as the matching success rate of the regular expression.
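The ratio described above can be computed as in the following sketch. The function name `matching_success_rate` and the sample pattern are hypothetical; returning 0.0 for an empty input is a choice made here to avoid division by zero, not something the patent specifies.

```python
import re
from typing import List

def matching_success_rate(pattern: "re.Pattern", texts: List[str]) -> float:
    """Number of texts the pattern matches divided by the total number of texts."""
    if not texts:
        return 0.0  # assumption: an empty sample yields a rate of zero
    target_count = sum(1 for t in texts if pattern.search(t))
    return target_count / len(texts)

pattern = re.compile(r"badword")  # hypothetical placeholder term
texts = ["a badword here", "ok", "fine", "hello"]
rate = matching_success_rate(pattern, texts)  # 1 of 4 texts match
```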
S260, judging whether the matching success rate is higher than a preset proportion.
In this embodiment, if yes, S270 is executed; if not, S280 is executed.
In this embodiment, the preset proportion may be determined by analyzing historical data of the live broadcast platform and may be, for example, 5%. Taking bullet screen data as the text data, when the matching success rate of the regular expression is higher than the preset proportion, the regular expression may be determined to be wrong, since it could intercept most or all of the bullet screen data.
S270, determining that the regular expression is wrong.
In this embodiment, when it is determined that the regular expression is incorrect, the regular expression is stopped from being used to match the text data, so as to avoid a business accident caused by continued use of the incorrect regular expression, and ensure business safety. In addition, because the number of text data generated by the video service is large in the live broadcast platform, the machine audit is performed by using the server to automatically count the matching success rate, so that the labor cost increased by manual audit can be reduced, the condition of false detection caused by manual audit fatigue is avoided, and the whole regular expression detection process has high efficiency and reliability.
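The branch from S260 into S270 or S280 can be sketched as follows, using the 5% preset proportion mentioned above. The `RuleState` class and its fields are hypothetical bookkeeping introduced for illustration; the patent does not prescribe how the enabled state is stored.

```python
from dataclasses import dataclass

@dataclass
class RuleState:
    pattern: str
    enabled: bool = True       # whether the regex is still used for matching
    verdict: str = "pending"   # "pending", "wrong", or "correct"

def apply_verdict(rule: RuleState, success_rate: float,
                  preset_proportion: float = 0.05) -> RuleState:
    """Mark the rule wrong and stop using it when the rate exceeds the preset."""
    if success_rate > preset_proportion:
        rule.verdict = "wrong"
        rule.enabled = False   # S270: stop matching with this regex
    else:
        rule.verdict = "correct"  # S280: keep matching with this regex
    return rule

suspicious = apply_verdict(RuleState(r"badword|"), success_rate=1.0)
normal = apply_verdict(RuleState(r"badword"), success_rate=0.02)
```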
In one embodiment, referring to Fig. 2B, a manual review step is provided on top of the machine review. Specifically, after stopping matching the text data using the regular expression when it is determined that the regular expression is incorrect, the method further includes: receiving a manual check operation for the regular expression; and correcting the correctness of the regular expression according to the manual check operation. Adding the manual review process guards against missed or false detections by the machine review and further improves the efficiency and reliability of the whole regular expression detection process.
Further, the manual checking operation for the regular expression includes: displaying the regular expression; displaying the text data successfully matched with the regular expression; receiving a manual verification operation. In this embodiment, the text data successfully matched with the regular expression is displayed to assist an operator in manually verifying the correctness of the regular expression. Of course, the text data matched with the regular expression may also be displayed.
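One possible shape for the manual check is sketched below: the regular expression and a few successfully matched texts are gathered for display, and the reviewer's decision then overrides the machine verdict. All names (`build_review_item`, `apply_manual_check`, `sample_size`) are hypothetical, and the actual user interface is outside the scope of this sketch.

```python
import re
from typing import Dict, List, Optional

def build_review_item(pattern: str, texts: List[str], sample_size: int = 3) -> Dict[str, object]:
    """Collect what the reviewer is shown: the regex and some matched texts."""
    compiled = re.compile(pattern)
    matched = [t for t in texts if compiled.search(t)]
    return {"pattern": pattern, "matched_samples": matched[:sample_size]}

def apply_manual_check(machine_verdict: str, reviewer_says_correct: Optional[bool]) -> str:
    """Correct the machine verdict with the reviewer's decision, if one was given."""
    if reviewer_says_correct is None:
        return machine_verdict  # no manual check received yet
    return "correct" if reviewer_says_correct else "wrong"

item = build_review_item(r"\d{3,}", ["call 12345", "hello", "room 888", "id 42"])
final_verdict = apply_manual_check("wrong", reviewer_says_correct=True)
```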
S280, determining that the regular expression is correct.
In this embodiment, when it is determined that the regular expression is correct, matching the text data using the regular expression is maintained, or detection of the regular expression is stopped, so as to save computing resources.
In the technical solution of this embodiment, the regular expression set for the video service is determined; text data generated based on the video service is acquired; and the text data is matched using the regular expression. If the matching duration of the matched text data exceeds a preset duration and the total number of the matched text data exceeds a preset number, it is determined that the matched text data meets the preset scale condition. The matching success rate of the regular expression is then determined and compared with a preset proportion: if it is higher, the regular expression is determined to be wrong; if not, the regular expression is determined to be correct. In this way, the preset duration and the preset number establish that the text data meets the scale condition, and the preset proportion establishes the correctness of the regular expression. Because regular expressions are difficult and complex to write, business personnel can easily write them incorrectly, and because they are hard to read, their correctness cannot be judged at a glance. The embodiment addresses both problems: regular expressions are screened automatically, which helps avoid business accidents, reduces the risk of using regular expressions, and reduces the cost caused by business accidents. Furthermore, requiring the text data to meet the preset scale condition makes the matching success rate calculation more reliable and accurate and avoids erroneous detection results caused by an insufficient amount of text data. Finally, the added manual review process guards against missed or false detections by the machine review and improves the efficiency and reliability of the whole regular expression detection process.
Further, on the basis of the above embodiment, after step S250, it is determined whether the matching success rate is lower than a preset threshold, and if so, it is determined that the regular expression is invalid; if not, determining that the regular expression is valid. The effectiveness of the regular expression is judged by setting a preset threshold, wherein the effectiveness of the regular expression refers to normal matching when the regular expression is used for matching text data, and when the matching success rate is lower than the preset threshold, the regular expression can be considered invalid and normal matching cannot be carried out.
Further, the preset threshold may be determined according to the actual requirements of the video service and historical data. In one embodiment, the text data is bullet screen data, by way of example. When the video service is a bullet screen service, the bullet screen data that users send to a live broadcast room needs to be filtered, for example to filter out vulgar comments, sensitive words and sentences, and the like. If, in the historical data of the video service, the ratio of the number of bullet screen data items containing vulgar comments or sensitive words and sentences to the total number of bullet screen data items is not less than 3%, the preset threshold may be set to 3%. If the matching success rate of the regular expression is then 1%, which is lower than the preset threshold of 3%, the regular expression is considered invalid: it cannot complete normal matching and is likely to cause missed detections.
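As a rough illustration of the validity check described above, the following Python sketch compares a matching success rate with a preset threshold. The function name `is_valid`, its parameters, and the counts in the usage lines are hypothetical, chosen only to reproduce the 1% versus 3% example; the embodiment does not prescribe a particular implementation.

```python
def is_valid(matched: int, total: int, preset_threshold: float = 0.03) -> bool:
    """Return True unless the matching success rate is below the threshold.

    `matched` and `total` are hypothetical counters: the number of text data
    items the regular expression matched and the number it was applied to.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    success_rate = matched / total
    # The regex is invalid only when the rate is strictly lower than the threshold.
    return success_rate >= preset_threshold


# 10 matches out of 1000 items is a 1% success rate, below the 3% threshold.
print(is_valid(10, 1000))   # False
# 50 matches out of 1000 items is a 5% success rate, above the threshold.
print(is_valid(50, 1000))   # True
```

In practice the threshold would presumably be derived from the service's own historical data rather than hard-coded as a default argument.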
Example three
Fig. 3 is a schematic structural diagram of a regular expression detection apparatus according to a third embodiment of the present invention. This embodiment is applicable to detecting the correctness of a regular expression, in particular in application scenarios where a regular expression is used to filter text data generated by a video service.
Further, the apparatus may be integrated into a regular expression detection device. This embodiment does not limit the regular expression detection device, which may be a server, a computer, or the like. In this embodiment, the regular expression detection device is described taking a server as an example; the server may be an independent server or a cluster server, and may be a physical server or a virtual server.
Furthermore, the server may also provide a user interaction interface so that a user can view and manage the regular expression detection flow.
Referring to fig. 3, the apparatus specifically includes the following structure: a regular expression determination module 310, a text data acquisition module 320, a matching module 330, and a correctness determination module 340.
The regular expression determination module 310 is configured to determine a regular expression set for the video service.
The text data acquisition module 320 is configured to acquire text data generated based on the video service.
The matching module 330 is configured to match the text data using the regular expression.
The correctness determination module 340 is configured to determine the correctness of the regular expression according to the matching success rate of the regular expression when the matched text data meets a preset scale condition.
In the technical solution of this embodiment, a regular expression set for the video service is determined; text data generated based on the video service is acquired; the text data is matched using the regular expression; and, when the matched text data meets the preset scale condition, the correctness of the regular expression is determined according to its matching success rate. This addresses the problem that business personnel easily write regular expressions incorrectly because they are difficult and complex to write, thereby avoiding business accidents, as well as the problem that correctness cannot be judged by inspection because of poor readability. Regular expressions are screened automatically, which reduces the risk of using them and the cost caused by business accidents. Furthermore, requiring the text data to meet the preset scale condition makes the calculated matching success rate more reliable and accurate and avoids erroneous detection results caused by an insufficient amount of text data.
On the basis of the above technical solution, the correctness determination module 340 includes:
a scale condition determining unit, configured to determine that the matched text data meets the preset scale condition if the matching duration of the matched text data exceeds a preset duration and the total number of the matched text data exceeds a preset number;
a matching success rate determining unit, configured to determine the matching success rate of the regular expression; and
a proportion judging unit, configured to judge whether the matching success rate is higher than a preset proportion; if so, determine that the regular expression is wrong; if not, determine that the regular expression is correct.
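The three units above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `MatchStats` container, the function names, the use of seconds as the time unit, and the convention of returning `None` when the scale condition is not yet met are all hypothetical. The success rate is computed as the ratio of successfully matched items to all items matched against.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchStats:
    """Hypothetical counters, all measured from when the regex went into use."""
    elapsed_seconds: float  # matching duration so far
    total: int              # number of text data items matched against
    matched: int            # number of items the regex matched successfully


def meets_scale_condition(stats: MatchStats, preset_duration: float,
                          preset_number: int) -> bool:
    # Both the duration and the item count must exceed their presets.
    return (stats.elapsed_seconds > preset_duration
            and stats.total > preset_number)


def matching_success_rate(stats: MatchStats) -> float:
    # Ratio of successfully matched items to all items matched against.
    return stats.matched / stats.total


def judge_correctness(stats: MatchStats, preset_duration: float,
                      preset_number: int,
                      preset_proportion: float) -> Optional[bool]:
    """Return None while the scale condition is unmet; otherwise
    True if the regex is judged correct and False if judged wrong.

    Assumes preset_number >= 0, so stats.total is positive once the
    scale condition holds.
    """
    if not meets_scale_condition(stats, preset_duration, preset_number):
        return None
    # A rate higher than the preset proportion means the regex is judged wrong.
    return matching_success_rate(stats) <= preset_proportion


# A regex that matched 4000 of 5000 items after two hours is judged wrong
# against a 50% preset proportion.
print(judge_correctness(MatchStats(7200, 5000, 4000), 3600, 1000, 0.5))  # False
```

Deferring the verdict until both presets are exceeded reflects the stated purpose of the scale condition: a success rate computed over too little data is not considered reliable.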
On the basis of the above technical solution, the apparatus further includes:
a matching stopping module, configured to, after the correctness of the regular expression is determined according to its matching success rate, stop matching the text data using the regular expression when the regular expression is determined to be wrong; and
a matching maintaining module, configured to keep matching the text data using the regular expression when the regular expression is determined to be correct.
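One simple way to picture the stop and maintain behaviour is a rule object with an enabled flag. The sketch below is illustrative only: the class `RegexRule`, its `enabled` attribute, and the choice of `search` semantics for "matching" are assumptions, not details taken from the embodiment.

```python
import re


class RegexRule:
    """Hypothetical wrapper that can be taken out of service."""

    def __init__(self, pattern: str) -> None:
        self.pattern = re.compile(pattern)
        self.enabled = True  # the rule is in use until judged wrong

    def apply_verdict(self, is_correct: bool) -> None:
        # Stop matching when the regex is judged wrong; keep matching otherwise.
        self.enabled = is_correct

    def matches(self, text: str) -> bool:
        # A disabled rule never matches, so it no longer filters any text.
        return self.enabled and self.pattern.search(text) is not None


rule = RegexRule(r".*")           # an overly broad pattern that matches everything
print(rule.matches("hello"))      # True
rule.apply_verdict(is_correct=False)
print(rule.matches("hello"))      # False
```

Disabling the rule rather than deleting it keeps the pattern available for the later manual check.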
On the basis of the above technical solution, the apparatus further includes:
a manual check operation receiving module, configured to receive a manual check operation for the regular expression after matching of the text data using the regular expression has been stopped because the regular expression was determined to be wrong; and
a correctness correction module, configured to correct the correctness of the regular expression according to the manual check operation.
On the basis of the above technical solution, the manual check operation receiving module includes:
a regular expression display unit, configured to display the regular expression;
a text data display unit, configured to display the text data successfully matched by the regular expression; and
a unit configured to receive the manual check operation.
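The manual check can be sketched as two small steps: render the regular expression together with sample matched text for a reviewer, then let the reviewer's decision correct the machine verdict. The function names, the plain-text layout, the `limit` parameter, and the rule that the reviewer's decision overrides the machine verdict are all assumptions made for illustration.

```python
from typing import List, Optional


def build_review_view(pattern: str, matched_texts: List[str],
                      limit: int = 3) -> str:
    """Render the regex and a few matched samples as plain text."""
    lines = [f"Regular expression: {pattern}", "Matched text data:"]
    lines += [f"  - {text}" for text in matched_texts[:limit]]
    return "\n".join(lines)


def apply_manual_check(machine_says_correct: bool,
                       reviewer_says_correct: Optional[bool]) -> bool:
    """Return the final verdict.

    Assumption: a reviewer decision, when present, overrides the machine
    verdict; None means no manual check operation has been received yet.
    """
    if reviewer_says_correct is None:
        return machine_says_correct
    return reviewer_says_correct


print(build_review_view(r".*", ["hello", "nice stream"]))
# The machine judged the regex wrong, but the reviewer confirms it is correct.
print(apply_manual_check(False, True))  # True
```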
On the basis of the above technical solution, the text data generated by the video service is text data modified or published by a user;
the apparatus further includes: a text data forbidding module, configured to, after the text data is matched using the regular expression, forbid modification or publication of the text data that is successfully matched by the regular expression; and
a text data allowing module, configured to allow modification or publication of the text data that fails to match the regular expression.
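The forbid and allow decision amounts to a single predicate over the match result. In the sketch below, the function name `may_publish`, the example pattern `BLOCKLIST`, and the use of `search` (rather than a full-string match) are hypothetical choices; the embodiment does not specify which matching semantics are used.

```python
import re

# Hypothetical filtering pattern, for illustration only.
BLOCKLIST = re.compile(r"badword|spam", re.IGNORECASE)


def may_publish(text: str, pattern: re.Pattern) -> bool:
    """Allow the text only when the regular expression fails to match it."""
    return pattern.search(text) is None


print(may_publish("nice stream!", BLOCKLIST))   # True: no match, allowed
print(may_publish("Buy SPAM here", BLOCKLIST))  # False: matched, forbidden
```

The same predicate would apply to any of the text types mentioned in this embodiment, such as bullet screen data, user nicknames, or user signatures.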
On the basis of the technical scheme, the text data comprises at least one of the following: bullet screen data, user nickname data and user signature data.
The apparatus can execute the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example four
Fig. 4 is a schematic structural diagram of a regular expression detection device according to a fourth embodiment of the present invention. As shown in fig. 4, the regular expression detection device includes: a processor 40, a memory 41, an input device 42, and an output device 43. The number of processors 40 in the regular expression detection device may be one or more, and one processor 40 is taken as an example in fig. 4. The number of memories 41 in the regular expression detection device may be one or more, and one memory 41 is taken as an example in fig. 4. The processor 40, the memory 41, the input device 42, and the output device 43 of the regular expression detection device may be connected by a bus or by other means, and fig. 4 illustrates a bus connection. The regular expression detection device may be a computer, a server, or the like. In this embodiment, the regular expression detection device is described in detail taking a server as an example; the server may be an independent server or a cluster server, and may be a physical server or a cloud server.
The memory 41 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the regular expression detection method according to any embodiment of the present invention (for example, the regular expression determination module 310, the text data acquisition module 320, the matching module 330, and the correctness determination module 340 in the regular expression detection apparatus). The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the regular expression detection device; it may also include a camera for acquiring images and a sound pickup device for acquiring audio data. The output device 43 may include an audio device such as a speaker. It should be noted that the specific composition of the input device 42 and the output device 43 can be set according to actual conditions.
The processor 40 executes various functional applications of the device and data processing, i.e., implements the regular expression detection method described above, by running software programs, instructions, and modules stored in the memory 41.
Example five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are executed by a computer processor to perform a regular expression detection method, and the method includes:
determining a regular expression set for the video service;
acquiring text data generated based on the video service;
matching the text data by using the regular expression;
and when the matched text data meets a preset scale condition, determining the correctness of the regular expression according to the matching success rate of the regular expression.
Of course, the storage medium containing the computer-executable instructions provided in the embodiments of the present invention is not limited to the regular expression detection method operations described above, and may also perform related operations in the regular expression detection method provided in any embodiment of the present invention, and has corresponding functions and advantages.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions to enable a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the regular expression detection method according to any embodiment of the present invention.
It should be noted that, in the regular expression detection apparatus, each unit and each module included in the regular expression detection apparatus are only divided according to functional logic, but are not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (8)

1. A regular expression detection method, comprising:
determining a regular expression set for the video service;
acquiring text data generated based on the video service, wherein the text data generated by the video service is text data modified or issued by a user;
matching the text data by using the regular expression, wherein the matching comprises auditing the text data;
forbidding to modify or publish the text data successfully matched with the regular expression;
allowing modification or publication of the text data that failed to match the regular expression;
if the matching duration of the matched text data exceeds a preset duration and the total number of the matched text data exceeds a preset number, determining that the matched text data meets a preset scale condition, wherein the matching duration is calculated from the time point when the regular expression is put into use, and the total number of the matched text data is calculated from the time point when the regular expression is put into use;
determining the matching success rate of the regular expression, including: taking the text data successfully matched with the regular expression as target text data, and taking the ratio of the number of the target text data to the total number of the text data as the matching success rate of the regular expression;
judging whether the matching success rate is higher than a preset proportion or not;
if yes, determining that the regular expression is wrong;
if not, determining that the regular expression is correct.
2. The method of claim 1, further comprising, after determining the correctness of the regular expression according to the matching success rate of the regular expression:
stopping matching the text data by using the regular expression when the regular expression is determined to be wrong;
when the regular expression is determined to be correct, keeping matching the text data by using the regular expression.
3. The method of claim 1, further comprising, after stopping matching the text data using the regular expression when the regular expression is determined to be erroneous:
receiving a manual check operation for the regular expression;
and correcting the correctness of the regular expression according to the manual checking operation.
4. The method of claim 3, wherein receiving a manual check operation for the regular expression comprises:
displaying the regular expression;
displaying the text data successfully matched with the regular expression;
receiving a manual verification operation.
5. The method of claim 1, wherein the text data comprises at least one of: bullet screen data, user nickname data and user signature data.
6. A regular expression detection apparatus, comprising:
the regular expression determining module is used for determining a regular expression set for the video service;
the text data acquisition module is used for acquiring text data generated based on the video service, wherein the text data generated by the video service is text data modified or issued by a user;
the matching module is used for matching the text data by using the regular expression, and the matching comprises auditing the text data;
a text data forbidding module, configured to forbid modifying or publishing the text data successfully matched with the regular expression after the text data is matched by using the regular expression;
a text data allowing module for allowing the text data which fails to be matched with the regular expression to be modified or released;
the scale condition determining unit is used for determining that the matched text data meets a preset scale condition if the matching duration of the matched text data exceeds a preset duration and the total number of the matched text data exceeds a preset number, wherein the matching duration is calculated from the time point when the regular expression is put into use, and the total number of the matched text data is calculated from the time point when the regular expression is put into use;
the matching success rate determining unit is used for determining the matching success rate of the regular expression, taking the text data successfully matched with the regular expression as target text data, and taking the ratio of the number of the target text data to the total number of the text data as the matching success rate of the regular expression;
a proportion judging unit, configured to judge whether the matching success rate is higher than a preset proportion: if yes, determining that the regular expression is wrong; if not, determining that the regular expression is correct.
7. A regular expression detection device, comprising: a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the regular expression detection method of any of claims 1-5.
8. A storage medium containing computer-executable instructions, which when executed by a computer processor, operate to perform the regular expression detection method of any of claims 1-5.
CN201811594357.1A 2018-12-25 2018-12-25 Regular expression detection method, device, equipment and storage medium Active CN109726312B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811594357.1A CN109726312B (en) 2018-12-25 2018-12-25 Regular expression detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811594357.1A CN109726312B (en) 2018-12-25 2018-12-25 Regular expression detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109726312A CN109726312A (en) 2019-05-07
CN109726312B true CN109726312B (en) 2021-10-08

Family

ID=66296781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811594357.1A Active CN109726312B (en) 2018-12-25 2018-12-25 Regular expression detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109726312B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674247A (en) * 2019-09-23 2020-01-10 广州虎牙科技有限公司 Barrage information intercepting method and device, storage medium and equipment
CN110928793B (en) * 2019-11-28 2023-07-28 Oppo广东移动通信有限公司 Regular expression detection method and device and computer readable storage medium
CN110990352A (en) * 2019-12-09 2020-04-10 华青融天(北京)软件股份有限公司 Method and device for determining data extraction rule, computer equipment and medium
CN111026929B (en) * 2019-12-27 2023-07-21 咪咕文化科技有限公司 Text approval method, device and storage medium
CN112559817A (en) * 2020-11-13 2021-03-26 北京创业光荣信息科技有限责任公司 Report content checking method, system, computer equipment and storage medium
CN112565655A (en) * 2020-11-27 2021-03-26 Oppo广东移动通信有限公司 Video data yellow identification method and device, electronic equipment and storage medium
CN113296670A (en) * 2021-07-26 2021-08-24 富通云腾科技有限公司 Regularization expression method of editable parameters

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023984A (en) * 2009-09-10 2011-04-20 阿里巴巴集团控股有限公司 Method and system for screening duplicated entity data
CN102421074A (en) * 2011-07-26 2012-04-18 中兴通讯股份有限公司 Short message monitoring method and device
CN102982048A (en) * 2011-09-07 2013-03-20 百度在线网络技术(北京)有限公司 Method and device for assessing junk information mining rule
CN106411704A (en) * 2016-09-19 2017-02-15 南京邮电大学 Distributed junk short message recognition method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7308446B1 (en) * 2003-01-10 2007-12-11 Cisco Technology, Inc. Methods and apparatus for regular expression matching
CN105574032A (en) * 2014-10-15 2016-05-11 阿里巴巴集团控股有限公司 Rule matching operation method and device
CN107705828A (en) * 2017-09-20 2018-02-16 广西金域医学检验所有限公司 Prejudge detection and processing method and processing device, terminal device, the storage medium of rule
CN108304372B (en) * 2017-09-29 2021-08-03 腾讯科技(深圳)有限公司 Entity extraction method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN109726312A (en) 2019-05-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant