CN113779250A

CN113779250A - Standardized text data processing system

Info

Publication number: CN113779250A
Application number: CN202111047940.2A
Authority: CN
Inventors: 彭明齐; 耿峰; 周振泉
Original assignee: Shanghai Songxin Intelligent Technology Co ltd
Current assignee: Shanghai Songxin Intelligent Technology Co ltd
Priority date: 2021-09-08
Filing date: 2021-09-08
Publication date: 2021-12-10

Abstract

The application relates to a standardized text data processing system in the technical field of computers. The system comprises an acquisition module, a topic screening module, a demand screening module, a quality inspection module, an early warning module and an output module. The acquisition module acquires text information published on information sources on the Internet; the topic screening module takes text information that contains words related to the demand topic as target texts; the demand screening module determines the demand direction of each target text with respect to the demand topic from the emotional words in that target text; the quality inspection module performs quality inspection on the target texts; the early warning module issues an early warning for illegal texts detected by the quality inspection module, prompting the user that the corresponding texts carry risk; and the output module outputs the screened texts as feedback to the user. The system simplifies the data that a data processing and mining system needs to mine, reduces the time consumed, improves the user's search efficiency, and better meets user requirements.

Description

Standardized text data processing system

Technical Field

The application relates to the technical field of computers, in particular to a standardized text data processing system.

Background

At present, data analysis refers to analyzing a large amount of collected data with appropriate statistical and analytical methods, and summarizing, understanding and digesting the data so as to make the fullest use of it. Data analysis is the process of studying and summarizing data in detail in order to extract useful information and form conclusions. Data, also called observation values, are the results of experiments, measurements, observations, investigations and the like. The data processed in data analysis are divided into qualitative data and quantitative data. Data that can only be assigned to a category and cannot be measured numerically are called qualitative data. Qualitative data expressed as categories without any order are categorical data, such as gender and brand; qualitative data expressed as categories with an order are ordinal data, such as education level and quality grade of goods.

In the related art, collected text data often contains many uncertain factors in both format and content, so it usually needs some processing before it is passed to subsequent processes. Compared with text, multimedia helps people acquire information more easily and quickly. For example, a multimedia clip of a few seconds or a few tens of seconds may let people understand the general condition of an object, such as the performance of a product, the content of a news item or the historic sights of a place. When multimedia is generated for a plurality of objects from collected text data, the collected text data often needs to be processed to a certain extent first, and the processed text data is then used to generate the multimedia for those objects.

With respect to the related art, the inventors consider that the data an existing data processing and mining system needs to mine is voluminous and complicated, so mining takes a long time, is inefficient, and cannot satisfy users well.

Disclosure of Invention

In order to solve the problem that data processing and mining take long time, the application provides a standardized text data processing system.

The standardized text data processing system provided by the application adopts the following technical scheme:

A standardized text data processing system comprises an acquisition module, a topic screening module, a demand screening module, a quality inspection module, an early warning module and an output module. The acquisition module is used for acquiring at least one piece of text information published on at least one information source on the Internet; the topic screening module is used for taking the text information containing words related to the demand topic as a target text; the demand screening module is used for determining the demand direction of each target text with respect to the demand topic from the emotional words in the target text; the quality inspection module is used for performing quality inspection on the screened target text and determining the illegal words contained in the target text; the early warning module is used for issuing an early warning for the illegal texts detected by the quality inspection module to prompt a user that the corresponding texts carry risk; and the output module is used for outputting the screened text and feeding it back to the user.

Optionally, a memory module is connected in the quality inspection module, and before use, a designer can input some illegal words in the memory module to serve as a quality inspection basis of the quality inspection module.

Optionally, the output module is connected with a feedback module, and the feedback module is connected with the memory module; the feedback module can record illegal words reported by a user into the memory module, and the illegal words recorded through the feedback module can also serve as the quality inspection basis of the quality inspection module.

Optionally, a selection preference module is connected to the output module and is further connected to the acquisition module; after the output module outputs a plurality of texts, the selection preference module memorizes the user's selection and sends it to the acquisition module, and the acquisition module performs targeted acquisition according to the user's selection.

Optionally, a segmentation module is connected between the demand screening module and the quality inspection module, the segmentation module segments the text screened by the demand screening module into a plurality of entries, and the quality inspection module performs quality inspection on the entries segmented by the segmentation module.

Optionally, a conversion module is connected between the early warning module and the output module, the conversion module converts the screened text into a standardized text, and the output module outputs the standardized text converted by the conversion module.

In summary, the present application includes at least one of the following beneficial technical effects of a standardized text data processing system:

In application, a user inputs a search keyword, and the acquisition module acquires a plurality of pieces of text information published on a plurality of information sources on the Internet. The topic screening module takes text information containing words related to the demand topic as target texts; the demand screening module determines the demand direction of each target text with respect to the demand topic from the emotional words in that target text; the quality inspection module performs quality inspection on the screened target texts and determines the illegal words they contain; the early warning module issues an early warning for the illegal texts detected by the quality inspection module, prompting the user that the corresponding texts carry risk; and the output module outputs the screened texts to the user. This helps simplify the data that the data processing and mining system needs to mine, reduces the time consumed, improves the user's search efficiency, and better meets user requirements.

Drawings

FIG. 1 is a flow chart of a standardized text data processing system according to the present embodiment.

Reference numerals: 1. an acquisition module; 2. a topic screening module; 21. a matrix building module; 22. a feature word acquisition module; 23. a first matching module; 24. a topic screening submodule; 3. a demand screening module; 31. a second matching module; 32. a demand screening submodule; 33. a type identification module; 4. a segmentation module; 5. a quality inspection module; 51. a memory module; 6. an early warning module; 7. a conversion module; 8. an output module; 81. a feedback module; 82. an audit module; 83. a selection preference module.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings of the embodiments of the present application. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the application without any inventive step, are within the scope of protection of the application.

Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The use of the terms "a" or "an" and the like in the description and in the claims of the present application do not denote a limitation of quantity, but rather denote the presence of at least one.

In the description of the present specification and claims, the terms "upper", "lower", "horizontal", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of describing the present application and simplifying the description, but do not indicate or imply that the referred device or unit must have a specific direction, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present application.

The present application is described in further detail below with reference to fig. 1.

The embodiment of the application discloses a standardized text data processing system.

Referring to fig. 1, a standardized text data processing system includes an acquisition module 1, a topic screening module 2, a demand screening module 3, a quality inspection module 5, an early warning module 6 and an output module 8. In use, a user inputs a plurality of keywords, and the acquisition module 1 acquires a plurality of pieces of text information published on a plurality of information sources on the Internet. The topic screening module 2 then takes the text information containing words related to the demand topic as target texts. After the target texts are screened out, the demand screening module 3 determines the demand direction of each target text with respect to the demand topic from the emotional words in that target text. The quality inspection module 5 then performs quality inspection on the screened target texts to determine the illegal words they contain. After the quality inspection module 5 detects illegal words in a target text, the early warning module 6 issues an early warning for the text containing the illegal words, prompting the user that the corresponding text carries risk. After the user removes the risky texts, the output module 8 outputs the screened texts to the user. This simplifies the data that the data processing and mining system needs to mine, reduces the time consumed, and improves the user's search efficiency.

The topic screening module 2 comprises a matrix building module 21, a feature word acquisition module 22, a first matching module 23 and a topic screening submodule 24. In use, the matrix building module 21 finds a plurality of keywords in each piece of text information and constructs a distribution matrix of the keywords of the text information. The feature word acquisition module 22 determines the feature words among the keywords of each piece of text information through a chi-square test to obtain a feature word set for each piece of text information. The first matching module 23 matches the words related to the demand topic against the feature words in the feature word set of each piece of text information, and the topic screening submodule 24 takes the text information whose feature words are successfully matched with the words related to the demand topic as the target text.
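The topic screening steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of the application: the source does not say how the distribution matrix is laid out or how the chi-square test is applied, so the sketch assumes one keyword-count row per text, a 2x2 contingency table per keyword, and the standard critical value 3.841 (1 degree of freedom, p = 0.05). All function names are hypothetical.

```python
from collections import Counter

# Chi-square critical value for 1 degree of freedom at p = 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841


def build_distribution_matrix(texts_keywords):
    """Return one Counter per text: keyword -> occurrence count (assumed layout)."""
    return [Counter(keywords) for keywords in texts_keywords]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def feature_words(matrix, index, threshold=CHI2_CRITICAL_DF1_P05):
    """Keywords that are significantly over-represented in text `index`."""
    doc = matrix[index]
    doc_total = sum(doc.values())
    corpus = Counter()
    for row in matrix:
        corpus.update(row)
    corpus_total = sum(corpus.values())
    selected = set()
    for word, a in doc.items():
        b = doc_total - a                 # other keywords in this text
        c = corpus[word] - a              # this keyword in the other texts
        d = corpus_total - doc_total - c  # other keywords in the other texts
        # Keep only words that occur more often here than elsewhere.
        if a * d > b * c and chi_square_2x2(a, b, c, d) >= threshold:
            selected.add(word)
    return selected


def screen_by_topic(matrix, related_words, threshold=CHI2_CRITICAL_DF1_P05):
    """Indices of texts whose feature word set shares a word with `related_words`."""
    related = set(related_words)
    return [
        i for i in range(len(matrix))
        if feature_words(matrix, i, threshold) & related
    ]
```

With very small keyword counts the chi-square approximation is unreliable, so a real system would likely need a minimum-count rule or a different threshold.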

The demand screening module 3 comprises a second matching module 31, a demand screening submodule 32 and a type identification module 33.

The second matching module 31 matches each demand word in the demand word bank against the adjective keywords of each target text in the distribution matrix. After a match succeeds, the demand screening submodule 32 takes the demand direction and demand degree corresponding to the matched demand word as the demand direction of the target text with respect to the demand topic. The type identification module 33 determines whether the keywords of the target text in the distribution matrix include a description reflecting the demand type.
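As a rough illustration of the matching step, the following Python sketch looks up adjective keywords in a demand word bank. The word bank contents, the `Demand` record, the `"ADJ"` tag name and the rule of keeping the strongest match are all assumptions made for this example; the source only states that a matched demand word supplies a demand direction and a demand degree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Demand:
    direction: str  # e.g. "positive" or "negative" (illustrative values)
    degree: int     # larger means a stronger demand (assumed scale)


# Hypothetical demand word bank; the entries are illustrative only.
DEMAND_WORD_BANK = {
    "excellent": Demand("positive", 2),
    "good": Demand("positive", 1),
    "poor": Demand("negative", 1),
    "terrible": Demand("negative", 2),
}


def match_demand(tagged_keywords, word_bank=DEMAND_WORD_BANK):
    """Return the demands of all adjective keywords found in the word bank.

    `tagged_keywords` is an iterable of (keyword, part_of_speech) pairs.
    """
    return [
        word_bank[word]
        for word, pos in tagged_keywords
        if pos == "ADJ" and word in word_bank
    ]


def demand_direction(tagged_keywords, word_bank=DEMAND_WORD_BANK):
    """Pick the strongest matched demand, or None when nothing matches."""
    matches = match_demand(tagged_keywords, word_bank)
    if not matches:
        return None
    return max(matches, key=lambda demand: demand.degree)
```

Only keywords tagged as adjectives are considered, so a bank word that appears with another part of speech is ignored.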

After the demand screening module 3 has selected texts, some of them may contain words that violate social values. Some teenagers may search merely out of curiosity, and exposure to words and videos that violate social values can easily distort their psychology. The quality inspection module 5 performs quality inspection on the texts screened by the demand screening module 3, which helps protect teenagers' mental health.

A memory module 51 is connected in the quality inspection module 5. Before use, a designer can enter some illegal words and websites in the memory module 51 as the quality inspection basis of the quality inspection module 5. When a text screened by the demand screening module 3 contains words or websites stored in the memory module 51, the quality inspection module 5 further filters the screened text. The memory module 51 can store a large number of illegal words and websites, and the safety of use is further enhanced as time in use and designer input accumulate.
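A minimal Python sketch of this check follows. It assumes the memory module is an in-memory set of words and websites and that inspection is a case-insensitive substring search over the whole text; the class and function names are hypothetical, and the source does not specify the storage or matching method.

```python
class MemoryModule:
    """Holds the illegal words and websites used as the inspection basis."""

    def __init__(self, words=(), websites=()):
        self.words = set(words)
        self.websites = set(websites)

    def entries(self):
        """All stored words and websites as one set."""
        return self.words | self.websites


def inspect_text(text, memory):
    """Return the sorted memory entries that occur in `text` (case-insensitive)."""
    lowered = text.lower()
    return sorted(entry for entry in memory.entries() if entry.lower() in lowered)
```

An empty result means the text passes inspection; a non-empty result lists what triggered the filter.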

In use, the quality inspection module 5 usually receives a whole text in which many words and phrases are repeated, which increases the quality inspection time and prolongs the waiting time. As an improvement, the demand screening module 3 is connected to the quality inspection module 5 through a segmentation module 4. The segmentation module 4 segments each text screened by the demand screening module 3 into a plurality of simple entries, which greatly shortens the quality inspection time of the quality inspection module 5.
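The benefit of segmenting first can be sketched as follows, assuming entries are deduplicated so that each distinct word is checked once. The regular-expression tokenizer is a stand-in that works for space-delimited languages; Chinese text would need a dedicated word segmenter, which is not shown. The function names are hypothetical.

```python
import re


def segment(text):
    """Split text into a set of unique lowercase entries (word-like tokens)."""
    return set(re.findall(r"\w+", text.lower()))


def inspect_entries(entries, illegal_words):
    """Return the sorted entries that also appear in the illegal word list."""
    return sorted(entries & set(illegal_words))
```

For a text such as `"spam spam spam offer offer today"`, the six tokens collapse to three entries, so the inspection step compares three items instead of six.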

After the quality inspection module 5 detects texts containing illegal words, the early warning module 6 issues an early warning for those illegal texts to alert the user to the risk of the texts and to reduce the user's searches for them. After the early warning module 6 has issued the early warning for illegal texts, the output module 8 outputs the texts.

Text styles on the Internet vary widely. Standardized texts of the same type usually share a format and differ only in content, but the quality inspection module 5 only screens out illegal texts and does not unify formats, so the output module 8 would output texts of the same type in different formats. Although the user can still understand such texts, the viewing experience suffers. As an improvement, a conversion module 7 is connected between the early warning module 6 and the output module 8; the conversion module 7 converts the texts that have passed quality inspection by the quality inspection module 5 into standardized texts, which greatly improves the user's viewing experience.
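The source does not define what a standardized text looks like, so the following Python sketch shows only one plausible reading: Unicode normalization, whitespace cleanup, and a fixed record layout. The field names `source`, `title` and `body`, and the rule that the first non-empty line is the title, are assumptions of this example.

```python
import re
import unicodedata


def to_standardized_text(raw, source):
    """Convert raw text into a uniform record (assumed layout)."""
    # NFKC folds full-width letters and ideographic spaces to plain forms.
    normalized = unicodedata.normalize("NFKC", raw)
    # Collapse runs of whitespace inside each line and trim the ends.
    lines = [re.sub(r"\s+", " ", line).strip() for line in normalized.splitlines()]
    lines = [line for line in lines if line]  # drop empty lines
    title = lines[0] if lines else ""
    body = "\n".join(lines[1:])
    return {"source": source, "title": title, "body": body}
```

Because every output record has the same fields, the output module can display texts from different sources in one format.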

The output module 8 is also connected with a feedback module 81. The illegal words in the memory module 51 are initially entered by designers before use, and as the time in use increases, the words and websites in the memory module 51 may no longer be sufficient for quality inspection of the texts screened by the demand screening module 3. After the output module 8 outputs a text and the user finds illegal words in it, the user can feed those words back to the memory module 51 through the feedback module 81, and the memory module 51 memorizes them. Words memorized through the feedback module 81 can also be used by the quality inspection module 5 for quality inspection. As the time in use increases, the memory module 51 holds more and more words, and the quality inspection function of the quality inspection module 5 becomes stronger.

In use, to prevent a user from mistakenly reporting legitimate words as illegal words, an audit module 82 is connected to the feedback module 81 and the memory module 51. The audit module 82 audits the words fed back through the feedback module 81, and only words approved by the audit module 82 are recorded by the memory module 51. This greatly improves the accuracy with which the quality inspection module 5 inspects the texts screened by the demand screening module 3.
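The feedback and audit loop can be sketched as a pending queue in front of the memory module. In this hypothetical Python example the audit decision is passed in as a callable `is_illegal`, which stands in for whatever review the audit module performs; the source does not describe that review, and the class name is an assumption.

```python
class AuditedMemory:
    """Memory module whose user feedback must pass an audit before it is stored."""

    def __init__(self, initial_words=()):
        self.illegal_words = set(initial_words)  # entered by designers before use
        self.pending = []                        # reported words awaiting audit

    def feed_back(self, word):
        """Feedback module: queue a user-reported word for audit."""
        if word not in self.illegal_words and word not in self.pending:
            self.pending.append(word)

    def audit(self, is_illegal):
        """Audit module: store only the pending words that `is_illegal` approves."""
        approved = [word for word in self.pending if is_illegal(word)]
        self.illegal_words.update(approved)
        self.pending.clear()
        return approved
```

Words that fail the audit are discarded, so a mistaken report never reaches the inspection basis.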

A selection preference module 83 is connected to the output module 8. After quality inspection by the quality inspection module 5, the output module 8 may output a plurality of standardized texts even when the user needs only one. The selection preference module 83 records the type of standardized text the user selects each time. The selection preference module 83 is also connected to the acquisition module 1 and feeds back the type selected by the user each time to the acquisition module 1, which then acquires text according to the corresponding words and types. This can greatly reduce the workload and working time of the subsequent modules and lets the user acquire the desired text quickly.
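One simple way to picture this loop is a counter of selected text types that the acquisition step consults. The sketch below is an assumption-laden illustration: the class name, the `type` field on candidate texts, and the choice to fall back to unfiltered acquisition when no preference exists are not taken from the source.

```python
from collections import Counter


class SelectionPreference:
    """Remembers which text types the user selects."""

    def __init__(self):
        self._counts = Counter()

    def record(self, text_type):
        """Note one user selection of the given text type."""
        self._counts[text_type] += 1

    def preferred_types(self, top_n=1):
        """Return the `top_n` most frequently selected types."""
        return [text_type for text_type, _ in self._counts.most_common(top_n)]


def acquire(candidates, preferred_types):
    """Keep candidates of a preferred type; with no preference, keep everything."""
    if not preferred_types:
        return list(candidates)
    wanted = set(preferred_types)
    return [candidate for candidate in candidates if candidate["type"] in wanted]
```

Filtering at acquisition time means fewer texts reach the screening and inspection modules, which is the workload reduction the paragraph describes.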

The implementation principle of the standardized text data processing system in the embodiment of the application is as follows. Before use, a designer enters some illegal words in the memory module 51 as the quality inspection basis of the quality inspection module 5. In use, the acquisition module 1 acquires a plurality of pieces of text information published on a plurality of information sources on the Internet. The topic screening module 2 then takes the text information containing words related to the demand topic as target texts, and the demand screening module 3 determines the demand direction of each target text with respect to the demand topic from the emotional words in that target text. The segmentation module 4 then divides each text into a plurality of words, and the quality inspection module 5 performs quality inspection on the divided words. When the divided words coincide with words in the memory module 51, the early warning module 6 indicates that the text carries risk. When the text contains no illegal words, the conversion module 7 converts the text into a standardized text, which is finally output through the output module 8. When a text output by the output module 8 still contains illegal words, the user feeds them back to the audit module 82 through the feedback module 81, and the memory module 51 memorizes the words after they are approved by the audit module 82. The selection preference module 83 records the texts the user prefers and feeds them back to the acquisition module 1, and the acquisition module 1 acquires information according to the user's preference.
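The order of the stages can be illustrated with a compact Python sketch. It is deliberately simplified and rests on several assumptions: texts are already acquired, topic screening is plain keyword overlap rather than a chi-square test, conversion is reduced to whitespace cleanup, and demand screening, feedback, audit and selection preference are left out. The function name and parameters are hypothetical.

```python
import re


def run_pipeline(texts, topic_words, illegal_words):
    """Return (outputs, warnings) for already-acquired raw texts."""
    topic = {word.lower() for word in topic_words}
    illegal = {word.lower() for word in illegal_words}
    outputs, warnings = [], []
    for text in texts:
        # Segmentation: unique lowercase entries, reused by both checks below.
        entries = set(re.findall(r"\w+", text.lower()))
        # Topic screening (simplified): keep texts that mention a topic word.
        if not entries & topic:
            continue
        # Quality inspection: compare entries with the memorized illegal words.
        hits = sorted(entries & illegal)
        if hits:
            # Early warning: report the risky text with the words that triggered it.
            warnings.append((text, hits))
            continue
        # Conversion (simplified) and output: normalize whitespace.
        outputs.append(" ".join(text.split()))
    return outputs, warnings
```

Texts that trigger a warning are withheld from the output here; whether a real system withholds them or shows them with a risk prompt is a design choice the sketch does not settle.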

The above embodiments are preferred embodiments of the present application, and the protection scope of the present application is not limited by the above embodiments, so: all equivalent changes made according to the structure, shape and principle of the present application shall be covered by the protection scope of the present application.

Claims

1. A standardized text data processing system, characterized by: the system comprises an acquisition module (1), a topic screening module (2), a demand screening module (3), a quality inspection module (5), an early warning module (6) and an output module (8);

the acquisition module (1) is used for acquiring at least one piece of text information published on at least one information source on the Internet;

the topic screening module (2) is used for taking the text information containing words related to the demand topic as a target text;

the demand screening module (3) is used for determining the demand direction of each target text with respect to the demand topic from the emotional words in the target text;

the quality inspection module (5) is used for performing quality inspection on the screened target text and determining illegal words contained in the target text;

the early warning module (6) is used for issuing an early warning for the illegal texts detected by the quality inspection module (5) to prompt a user that the corresponding texts carry risk;

and the output module (8) is used for outputting the screened texts and feeding them back to the user.

2. A standardized text data processing system according to claim 1, characterized in that: the quality inspection module (5) is internally connected with a memory module (51), and before use, a designer can record some illegal words in the memory module (51) as the quality inspection basis of the quality inspection module (5).

3. A standardized text data processing system according to claim 2, characterized in that: a feedback module (81) is connected to the output module (8), and the feedback module (81) is connected with the memory module (51); the feedback module (81) can record illegal words reported by a user into the memory module (51), and the illegal words recorded through the feedback module (81) can also serve as the quality inspection basis of the quality inspection module (5).

4. A standardized text data processing system according to claim 1, characterized in that: a selection preference module (83) is connected to the output module (8), and the selection preference module (83) is further connected with the acquisition module (1); after the output module (8) outputs a plurality of texts, the selection preference module (83) memorizes the selection of the user and sends it to the acquisition module (1), and the acquisition module (1) performs targeted acquisition according to the selection of the user.

5. A standardized text data processing system according to claim 1, characterized in that: a segmentation module (4) is connected between the demand screening module (3) and the quality inspection module (5), the segmentation module (4) segments the text screened by the demand screening module (3) into a plurality of entries, and the quality inspection module (5) performs quality inspection on the entries segmented by the segmentation module (4).

6. A standardized text data processing system according to claim 1, characterized in that: a conversion module (7) is connected between the early warning module (6) and the output module (8), the conversion module (7) converts the screened texts into standardized texts, and the output module (8) outputs the standardized texts converted by the conversion module (7).