CN113297488A - Data processing method and system based on big data and artificial intelligence - Google Patents

Data processing method and system based on big data and artificial intelligence

Info

Publication number
CN113297488A
Authority
CN
China
Prior art keywords
entries
entry
effective
content
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110587083.9A
Other languages
Chinese (zh)
Inventor
黄海
马洪伟
吴霖瑞
张居正
谢昊岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuchang University
Original Assignee
Xuchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuchang University filed Critical Xuchang University
Priority to CN202110587083.9A priority Critical patent/CN113297488A/en
Publication of CN113297488A publication Critical patent/CN113297488A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the technical field of data processing and discloses a data processing method and system based on big data and artificial intelligence. The system comprises an identity confirmation unit, an entry acquisition unit, an entry screening unit, a content identification unit and an entry updating unit. The entry screening unit acquires the expected number of entries, performs text pre-screening on the entries to be detected, and generates an effective entry table based on the expected number of entries and the screening result. The content identification unit accesses the corresponding websites based on the effective entry table and identifies their content. The method works on the results returned by a search engine: text pre-screening of the entries to be detected removes content the user does not want to see, content identification of the corresponding websites further filters out inappropriate content, and the most useful results are finally presented to the user, giving a good filtering effect.

Description

Data processing method and system based on big data and artificial intelligence
Technical Field
The invention relates to the technical field of data processing, in particular to a data processing method and system based on big data and artificial intelligence.
Background
With the development of science and technology and the progress of society, people need to keep learning in order to adapt to society and live better. Years ago, learning relied mainly on books; today it is largely inseparable from the internet. For up-to-date knowledge in particular, people usually turn to search engines. When the internet first became popular, search quality was high, and the desired information could be obtained simply by typing in keywords. With the arrival of the traffic era, however, search results have gradually become dominated by advertisements, which users strongly dislike.
Many search engine developers have noticed this situation and would like to suppress invalid information. Advertising revenue, however, is too large to give up, and some junk domain names manage to occupy the top of the results by improper means. The existing solution is to classify information, for example by attaching advertisement labels to advertising entries. In everyday use, most of these labels are inconspicuous, and even where a search engine displays an obvious label, the advertisement still occupies the position of useful information, which users find annoying.
It is therefore worthwhile to design a system that processes these search results to improve search efficiency.
Disclosure of Invention
The present invention is directed to a data processing method and system based on big data and artificial intelligence, so as to solve the problems in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
a data processing method based on big data and artificial intelligence, the method comprising:
receiving a user query request, verifying the identity of a user, opening a content input port based on a verification result, acquiring query content, and establishing a connection channel with a search engine;
sending the query content to a search engine, receiving entry information returned by the search engine, and generating a to-be-detected entry library based on the default sequence of the search engine;
acquiring the number of expected entries, performing text pre-screening on the entries to be detected, and generating an effective entry table based on the number of the expected entries and a screening result;
accessing the corresponding website based on the effective entry table, and identifying the content of the corresponding website;
and updating the effective entry list based on the content identification result and displaying the effective entry list to the user.
As a further limitation of the technical scheme of the invention: the steps of receiving a user query request, verifying the identity of a user, opening a content input port based on a verification result, acquiring the query content, and establishing a connection channel with a search engine include:
acquiring account information of a user, and judging whether the account information is correct or incorrect;
if the account information is correct, opening a content input port;
if the account information is wrong, confirming a threshold value, incrementing the error count by one, and comparing the error count with the threshold value; if the error count is less than the threshold value, acquiring the account information of the user again, and if the error count is greater than the threshold value, acquiring a face image of the user and performing face recognition on the face image.
As a further limitation of the technical scheme of the invention: the step of acquiring a face image of the user and performing face recognition on the face image if the error count is greater than the threshold value includes:
detecting a human face and capturing a human face image;
cutting a face area in the face image;
establishing a face model according to local textures and features in the face image;
and reading the face information in a face information database according to the face model and comparing the face information.
As a further limitation of the technical scheme of the invention: the steps of obtaining the expected number of the terms, pre-screening the text of the terms to be detected and generating an effective term list based on the expected number of the terms and the screening result comprise:
confirming the sensitive characters and generating a sensitive character library;
confirming the number of the predicted entries, establishing a connection channel with a to-be-detected entry library, and reading the to-be-detected entries based on the number of the predicted entries;
connecting the entries to be detected, and converting the connected entries to be detected into text information to generate a text file;
sequentially reading the sensitive characters, traversing the text file based on the sensitive characters, and acquiring a sensitive position;
and positioning the to-be-detected entry library based on the sensitive position, deleting the corresponding to-be-detected entry, and generating an effective entry table.
As a further limitation of the technical scheme of the invention: the steps of obtaining the expected number of terms, pre-screening the text of the terms to be detected and generating the effective term list based on the expected number of terms and the screening result further comprise:
reading a to-be-detected entry library, and randomly reading at least two to-be-detected entries;
performing text pre-screening on the at least two entries to be detected, performing content identification based on the two entries to be detected, and calculating the sub-duration;
and reading the estimated number of the entries, generating estimated duration based on the sub-duration and the estimated number of the entries, and displaying.
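The duration estimate described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `estimate_duration`, the injected `process_entry` callable (standing in for pre-screening plus content identification of one entry) and the linear scaling from the sample average to the expected number of entries are all assumptions, since the text does not state the exact formula.

```python
import random
import time
from typing import Callable, Optional, Sequence


def estimate_duration(
    entries: Sequence[str],
    expected_count: int,
    process_entry: Callable[[str], object],
    sample_size: int = 2,
    rng: Optional[random.Random] = None,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Estimate the total processing time (seconds) from a small random sample."""
    if sample_size < 2:
        raise ValueError("the method samples at least two entries")
    if len(entries) < sample_size:
        raise ValueError("not enough entries to sample")
    rng = rng or random.Random()
    # Randomly read the sample entries from the to-be-detected entry library.
    sample = rng.sample(list(entries), sample_size)
    # Time the sample run; this is the "sub-duration".
    start = clock()
    for entry in sample:
        process_entry(entry)
    sub_duration = clock() - start
    # Assumption: scale the average per-entry time linearly to the expected count.
    return sub_duration / sample_size * expected_count
```

The `clock` parameter exists only so that the timing source can be replaced in tests; in normal use the default monotonic timer is sufficient.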
As a further limitation of the technical scheme of the invention: the step of accessing the corresponding website based on the effective entry list and identifying the content of the corresponding website comprises the following steps:
sequentially reading the effective entries in the effective entry table, acquiring the websites of the effective entries, and inquiring the access numbers of the corresponding websites;
performing descending arrangement on the effective entries in the effective entry table based on the access number;
sequentially accessing the websites where the effective entries are located, and acquiring corresponding website contents;
splitting the website content into an image file and a text file based on a file suffix name, and identifying the content of the image file and the text file.
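A minimal sketch of the ordering and splitting steps above, under stated assumptions: entries are represented as dictionaries with a `url` key, access numbers are supplied as a plain mapping, and the suffix sets `IMAGE_SUFFIXES` and `TEXT_SUFFIXES` are illustrative choices. None of these names come from the source.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical suffix sets; the source only says the split uses file suffix names.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}
TEXT_SUFFIXES = {".txt", ".html", ".htm", ".md"}


def sort_by_access_count(entries, access_counts):
    """Return the entries in descending order of their website's access number."""
    return sorted(
        entries,
        key=lambda entry: access_counts.get(entry["url"], 0),
        reverse=True,
    )


def split_by_suffix(resource_urls):
    """Split resource URLs into (image files, text files) by file suffix."""
    images, texts = [], []
    for url in resource_urls:
        # Use only the path component so query strings do not hide the suffix.
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            images.append(url)
        elif suffix in TEXT_SUFFIXES:
            texts.append(url)
        # Other resource types are ignored in this sketch.
    return images, texts
```

Because `sorted` is stable, entries with equal access numbers keep the search engine's default order relative to each other.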
As a further limitation of the technical scheme of the invention: the step of updating the valid entry list based on the content recognition result and displaying the updated valid entry list to the user includes:
acquiring a content identification result of the effective entry, and generating a violation level;
confirming a first violation threshold and a second violation threshold, and judging the violation level based on the two violation thresholds, wherein the second violation threshold is larger than the first violation threshold;
if the violation level is greater than a second violation threshold, deleting the corresponding valid entry from the valid entry table;
and if the violation level is greater than the first violation threshold and less than the second violation threshold, encrypting the corresponding valid entry in the valid entry table, and decrypting based on the unlocking instruction.
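The two-threshold decision above might be sketched as follows. The names `Action`, `decide_action` and `update_table` are hypothetical. The source does not say what happens when the violation level exactly equals a threshold; this sketch assumes a level equal to the second threshold is locked rather than deleted, and a level equal to the first threshold is kept. Encryption itself is not implemented; an entry is only flagged as locked.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    ENCRYPT = "encrypt"
    DELETE = "delete"


def decide_action(level, first_threshold, second_threshold):
    """Map a violation level to an action using the two violation thresholds."""
    if second_threshold <= first_threshold:
        raise ValueError("the second threshold must exceed the first")
    if level > second_threshold:
        return Action.DELETE
    if level > first_threshold:
        # Assumption: level == second_threshold also falls into this branch.
        return Action.ENCRYPT
    return Action.KEEP


def update_table(entries, levels, first_threshold, second_threshold):
    """Return (entry, locked) pairs; deleted entries are dropped from the table."""
    updated = []
    for entry, level in zip(entries, levels):
        action = decide_action(level, first_threshold, second_threshold)
        if action is Action.DELETE:
            continue
        updated.append((entry, action is Action.ENCRYPT))
    return updated
```

A locked entry would stay hidden until the user supplies the unlocking instruction; how that instruction is verified is outside this sketch.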
A big data and artificial intelligence based data processing system, the system comprising:
the identity confirmation unit is used for receiving a user query request, verifying the identity of a user, opening a content input port based on a verification result, acquiring query content and establishing a connection channel with a search engine;
the entry acquisition unit is used for sending the query content to a search engine, receiving entry information returned by the search engine and generating a to-be-detected entry library based on the default sequence of the search engine;
the vocabulary entry screening unit is used for acquiring the expected number of vocabulary entries, performing text pre-screening on the vocabulary entries to be detected and generating an effective vocabulary entry table based on the expected number of vocabulary entries and a screening result;
the content identification unit is used for accessing the corresponding website based on the effective entry list and identifying the content of the corresponding website;
and the entry updating unit is used for updating the effective entry list based on the content identification result and displaying the effective entry list to the user.
As a further limitation of the technical scheme of the invention: the entry screening unit includes:
the character confirmation module is used for confirming the sensitive characters and generating a sensitive character library;
the reading module is used for confirming the expected number of the entries, establishing a connection channel with a to-be-detected entry library and reading the to-be-detected entries based on the expected number of the entries;
the connecting module is used for connecting the entries to be detected and converting the connected entries to be detected into text information to generate a text file;
the position acquisition module is used for sequentially reading the sensitive characters, traversing the text file based on the sensitive characters and acquiring sensitive positions;
and the deleting module is used for positioning the to-be-detected entry library based on the sensitive position, deleting the corresponding to-be-detected entry and generating an effective entry table.
As a further limitation of the technical scheme of the invention: the content recognition unit includes:
the number query module is used for sequentially reading the effective entries in the effective entry table, acquiring the websites of the effective entries and querying the access numbers of the corresponding websites;
the arrangement module is used for carrying out descending arrangement on the effective entries in the effective entry table based on the access number;
the content acquisition module is used for sequentially accessing the websites where the effective entries are located and acquiring corresponding website contents;
and the execution module is used for splitting the website content into an image file and a text file based on a file suffix name and identifying the content of the image file and the text file.
Compared with the prior art, the invention has the following beneficial effects: the invention receives a user query request, verifies the user identity, opens a content input port based on the verification result, acquires the query content and establishes a connection channel with a search engine; sends the query content to the search engine, receives the entry information returned by the search engine, and generates a to-be-detected entry library based on the default order of the search engine; acquires the expected number of entries, performs text pre-screening on the entries to be detected, and generates an effective entry table based on the expected number of entries and the screening result; accesses the corresponding websites based on the effective entry table and identifies their content; and updates the effective entry table based on the content identification result and displays it to the user. The method works on the results returned by a search engine: text pre-screening of the entries to be detected removes content the user does not want to see, content identification of the corresponding websites further filters out inappropriate content, and the most useful results are finally presented to the user, giving a good filtering effect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
FIG. 1 is a flow chart diagram of a data processing method based on big data and artificial intelligence.
FIG. 2 is a first sub-flow diagram of a big data and artificial intelligence based data processing method.
FIG. 3 is a second sub-flow diagram of a big data and artificial intelligence based data processing method.
FIG. 4 is a third sub-flow diagram of a data processing method based on big data and artificial intelligence.
FIG. 5 is a fourth sub-flow diagram of a data processing method based on big data and artificial intelligence.
FIG. 6 is a block diagram of a fifth sub-flow of a big data and artificial intelligence based data processing method.
FIG. 7 is a sixth sub-flow block diagram of a big data and artificial intelligence based data processing method.
FIG. 8 is a block diagram of a big data and artificial intelligence based data processing system.
FIG. 9 is a block diagram of an entry screening unit in a big data and artificial intelligence based data processing system.
FIG. 10 is a block diagram of a content recognition unit in a big data and artificial intelligence based data processing system.
FIG. 11 is a system architecture diagram of a big data and artificial intelligence based data processing system.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used to describe various data in embodiments of the present invention, these data should not be limited by these terms. These terms are only used to distinguish the same type of data from each other. For example, a first violation threshold may also be referred to as a second violation threshold without departing from the scope of embodiments of the present invention, which does not necessarily require or imply any such actual relationship or order between such entities or operations. Similarly, the second violation threshold may also be referred to as the first violation threshold. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 11 shows an architecture diagram of an intelligent data management and risk control system, which mainly includes a system 100 and a search engine. The system 100 and the search engine are generally integrated on the same hardware terminal device, but they may also be installed on different devices; in that case the two devices are connected by a network, which may include various connection types but in the present invention is mainly a wireless communication link.
It should be understood that the number of systems 100 and search engines in FIG. 11 is merely illustrative. There may be any number of systems 100 and search engines, as desired for implementation.
Example 1
Fig. 1 shows a flow chart of a data processing method based on big data and artificial intelligence, and in an embodiment of the present invention, a data processing method based on big data and artificial intelligence is provided, where the method includes:
Step S1: receiving a user query request, verifying the identity of a user, opening a content input port based on a verification result, acquiring query content, and establishing a connection channel with a search engine;
The purpose of step S1 is to receive the user query request and then the content the user wants to query. During this process the user's identity is verified; that is, the user must first have permission to use the product. The permission exists to facilitate charging, which is a necessary function of any mature product. The content input port does not acquire content in only one way; the way depends on the specific input. Account information is entered through an ordinary keyboard, a voice instruction must be recorded by a recording device, and face information is generally not used as the primary means of verification.
Step S2: sending the query content to a search engine, receiving entry information returned by the search engine, and generating a to-be-detected entry library based on the default sequence of the search engine;
As its workflow shows, step S2 is a substitution step: it replaces the user's own query with a query issued by the computer, and then generates a to-be-detected entry library from the entry information returned by the search engine.
Step S3: acquiring the number of expected entries, performing text pre-screening on the entries to be detected, and generating an effective entry table based on the number of the expected entries and a screening result;
Step S3 performs text pre-screening on the entries to be detected. First, the expected number of entries is obtained; it may be set by the user or generated by default. The entries to be detected are read from the to-be-detected entry library based on that number and then pre-screened. This step can be understood as title detection: if the entry text itself already contains prohibited content, its reference value is very likely zero. Finally, an effective entry table is generated, whose length is the expected number of entries.
Step S4: accessing the corresponding website based on the effective entry table, and identifying the content of the corresponding website;
The corresponding websites are accessed in sequence based on the effective entry table, their content is acquired, and the content is then identified.
Step S5: updating the effective entry list based on the content identification result and displaying the effective entry list to the user;
If the content contains illegal or invalid content, the entry corresponding to that content is deleted. This is the updating operation: updating the effective entry table means deleting the entries with inappropriate content from the original effective entry table.
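Steps S2 to S5 can be summarized in a short sketch. It assumes identity verification (step S1) has already succeeded, and every callable passed in (`search`, `prescreen`, `identify`, `is_acceptable`) is a hypothetical stand-in for a component the text describes only in prose.

```python
from typing import Callable, Iterable, List


def run_pipeline(
    query: str,
    search: Callable[[str], Iterable[str]],
    prescreen: Callable[[str], bool],
    identify: Callable[[str], object],
    is_acceptable: Callable[[object], bool],
    expected_count: int,
) -> List[str]:
    """Return the entries that survive pre-screening and content identification."""
    # S2: query the search engine and keep its default order.
    pending = list(search(query))
    # S3: read the expected number of entries and pre-screen their text.
    candidates = pending[:expected_count]
    effective = [entry for entry in candidates if prescreen(entry)]
    # S4 and S5: identify each website's content and drop unsuitable entries.
    results = []
    for entry in effective:
        report = identify(entry)
        if is_acceptable(report):
            results.append(entry)
    return results
```

In this sketch the table shown to the user can be shorter than the expected number of entries, because deleted entries are not replaced by further entries from the library.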
Fig. 2 shows a first sub-flow diagram of a data processing method based on big data and artificial intelligence, which details step S1, where the steps of receiving a user query request, verifying the identity of a user, opening a content input port based on the verification result, obtaining the query content, and establishing a connection channel with a search engine include:
Step S11: acquiring account information of a user, and judging whether the account information is correct or incorrect;
First, account information of the user is obtained and judged to be correct or incorrect. This step compares the account information against a database. Account information is routine in existing application programs, and the account is generally linked to a mobile phone number.
Step S12: if the account information is correct, opening a content input port;
If the account information is correct, a content reading operation is performed to read the content the user wants to query, that is, the content the user wants to enter into the search engine.
Step S13: if the account information is wrong, confirming a threshold value, incrementing the error count by one, and comparing the error count with the threshold value; if the error count is less than the threshold value, acquiring account information of the user again, and if the error count is greater than the threshold value, acquiring a face image of the user and carrying out face recognition on the face image;
Step S13 is the identity verification step: if the user repeatedly enters wrong account information, the user's identity needs to be confirmed further, which is done by face recognition.
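The retry logic of steps S11 to S13 might look like the sketch below. The function and parameter names are hypothetical. The text specifies the behaviour for an error count below and above the threshold but not for equality; this sketch assumes that a count equal to the threshold still allows a retry.

```python
from typing import Callable


def verify_user(
    read_credentials: Callable[[], object],
    check_credentials: Callable[[object], bool],
    recognize_face: Callable[[], bool],
    threshold: int,
) -> bool:
    """Return True if the content input port may be opened for this user."""
    error_count = 0
    while True:
        # S11: acquire the account information and check it.
        if check_credentials(read_credentials()):
            return True  # S12: correct, open the content input port
        # S13: wrong account information, count the error.
        error_count += 1
        if error_count > threshold:
            # Too many errors: fall back to face recognition.
            return recognize_face()
        # Otherwise (count <= threshold, an assumption for equality): retry.
```

With `threshold=2`, for example, the third consecutive wrong entry triggers face recognition.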
Fig. 3 shows a second sub-flow diagram of a data processing method based on big data and artificial intelligence, which details step S13, where the step of acquiring a face image of a user and performing face recognition on the face image includes:
Step S131: detecting a human face and capturing a human face image;
First, capturing relies on a camera: the camera continuously takes pictures to capture the face, and when the angle is suitable and the image is clear, the frame is frozen and the image of the face is captured.
Step S132: cutting a face area in the face image;
The face area in the face image is the target information to be analyzed. The size of the face is normalized, and the face area is then cropped out of the face image so that the information is segmented, which facilitates subsequent processing.
Step S133: establishing a face model according to local textures and features in the face image;
From the segmented face image, local textures and features can be obtained, comprising 26 regions and more than 2000 features. A face model is then built whose data structure is the same as that of the face information in the database.
Step S134: reading face information in a face information database according to the face model and comparing the face information with the face information;
Because the data structure of the face model is the same as that of the face information in the database, the basic operations defined on that data structure, such as inserting, reading, querying and comparing, are easy to perform.
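To illustrate why a shared data structure makes comparison simple, the sketch below treats both the face model and the stored face information as equal-length feature vectors and compares them by cosine similarity. The vector representation, the similarity measure, the `min_similarity` value and the function names are all assumptions; the source does not state how the comparison is carried out.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors with the same layout."""
    if len(a) != len(b):
        raise ValueError("face model and stored record must share one structure")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    model: Sequence[float],
    database: Dict[str, Sequence[float]],
    min_similarity: float = 0.9,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the id of the most similar stored face, or None if none is close enough."""
    best_id, best_score = None, min_similarity
    for user_id, stored in database.items():
        score = cosine_similarity(model, stored)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id
```

Since both sides use one layout, no conversion step is needed before comparison, which is the point the paragraph above makes.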
Fig. 4 shows a third sub-flow diagram of a data processing method based on big data and artificial intelligence, which details step S3, where the step of obtaining the number of predicted entries, performing text pre-screening on the entries to be checked, and the step of generating an effective entry table based on the number of predicted entries and the screening result includes:
Step S31: confirming the sensitive characters and generating a sensitive character library;
Step S31 is the sensitive character confirmation process. What counts as a sensitive character is a very uncertain concept. Besides characters that obviously violate regulations, many sensitive words depend on the individual. For example, children are generally not expected to see the word "corpse", but medical students cannot avoid it. The sensitive character library therefore contains a set of generally recognized prohibited words and also provides an input port that receives input from different users; a user who does not want to see certain content can add it to the sensitive character library through the input port, thereby blocking it.
Step S32: confirming the number of the predicted entries, establishing a connection channel with a to-be-detected entry library, and reading the to-be-detected entries based on the number of the predicted entries;
the purpose of step S32 is to read a certain number of entries to be checked based on the expected number of entries, and then perform the following operations.
Step S33: connecting the entries to be detected, and converting the connected entries to be detected into text information to generate a text file;
When text detection is performed on the entries to be detected, each entry is very short and the sensitive characters are likewise many short words. Following a normal detection process would therefore be a many-to-many comparison, which necessarily involves a large number of address-related operations and is troublesome and error-prone. In step S33 the entries to be detected are connected into a single text before comparison, which turns this into a many-to-one comparison that is clearer and easier to program.
Step S34: sequentially reading the sensitive characters, traversing the text file based on the sensitive characters, and acquiring a sensitive position;
Determining where each character is located in a text file is easy to implement. The purpose of step S34 is to use the sensitive position to identify the entry in which the sensitive character appears.
Step S35: positioning a to-be-detected entry library based on the sensitive position, deleting the corresponding to-be-detected entry, and generating an effective entry table;
Step S35 is performed after positioning; deleting the corresponding entries to be detected yields the effective entry table.
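Steps S33 to S35 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `filter_entries`, the newline separator, and the use of a sorted list of start offsets with binary search are all assumptions made for the example. It also assumes that no sensitive word contains the separator, so a match can never span two entries.

```python
from bisect import bisect_right


def filter_entries(entries, sensitive_words, sep="\n"):
    """Return the entries that contain none of the sensitive words."""
    # S33: join the entries into one text and remember where each one starts.
    starts, pos = [], 0
    for entry in entries:
        starts.append(pos)
        pos += len(entry) + len(sep)
    text = sep.join(entries)

    # S34: scan the joined text for each sensitive word and collect the hits.
    flagged = set()
    for word in sensitive_words:
        if not word:
            continue  # an empty word would match everywhere
        hit = text.find(word)
        while hit != -1:
            # S35: map the character offset back to the index of its entry.
            flagged.add(bisect_right(starts, hit) - 1)
            hit = text.find(word, hit + 1)

    # S35: the effective entry table is whatever was not flagged.
    return [e for i, e in enumerate(entries) if i not in flagged]
```

Keeping the start offsets in a sorted list means each sensitive position is resolved to its entry with one binary search, which is the "many-to-one" lookup the description refers to.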
Fig. 5 shows a fourth sub-flow diagram of the data processing method based on big data and artificial intelligence, which complements step S3, where the step of obtaining the number of predicted entries, performing text pre-screening on the entries to be checked, and generating an effective entry table based on the number of predicted entries and the screening result further includes:
step S36: reading a to-be-detected entry library, and randomly reading at least two to-be-detected entries;
Steps S36 to S38 are supplementary steps that give the user an approximate waiting time. Because the present invention includes a detection process, it is certainly much slower than a traditional search engine, so providing an estimated duration is meaningful. Step S36 first reads at least two entries to be detected so that an average can be calculated, reducing the uncertainty of a single sample.
Step S37: performing text pre-screening on the at least two entries to be detected, performing content identification based on the two entries to be detected, and calculating the sub-duration;
Step S37 is in effect a small-scale run of the overall workflow of the present invention; the only difference is that the number of samples is small and the run is used to calculate the duration.
Step S38: reading the estimated number of entries, generating estimated duration based on the sub-duration and the estimated number of entries, and displaying;
step S38 is used to generate the estimated duration according to the sub-duration and the estimated number of entries, and display it to the user.
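The estimate in steps S36 to S38 can be illustrated with a short sketch. The source does not define the sub-duration precisely; the example assumes it is the average processing time per sampled entry and that the estimate is this average multiplied by the estimated number of entries. The names `estimate_duration`, `process`, and `timer` are hypothetical.

```python
import random
import time


def estimate_duration(entries, expected_count, process,
                      sample_size=2, timer=time.perf_counter):
    """Estimate the total processing time from a small random sample."""
    if sample_size < 2 or len(entries) < sample_size:
        raise ValueError("need at least two entries to sample")

    # S36: randomly read at least two entries to be detected.
    sample = random.sample(entries, sample_size)

    # S37: run the normal screening and identification on the sample only.
    start = timer()
    for entry in sample:
        process(entry)
    sub_duration = (timer() - start) / sample_size  # assumed: average per entry

    # S38: scale the per-entry time to the estimated number of entries.
    return sub_duration * expected_count
```

Passing the timer as a parameter is only a convenience for the example; it lets the arithmetic be checked with a fake clock.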
Fig. 6 shows a fifth sub-flow diagram of a data processing method based on big data and artificial intelligence, which details step S4, where the step of accessing the corresponding website based on the valid entry table and identifying the content of the corresponding website includes:
step S41: sequentially reading the effective entries in the effective entry table, acquiring the websites of the effective entries, and inquiring the access numbers of the corresponding websites;
Querying the number of visits to a web address in step S41 is prior art, and many existing tools, such as the Alexa tool, can perform this function. For some very small websites no visit count may be available; in that case the entry is by default not regarded as a valid entry.
Step S42: performing descending arrangement on the effective entries in the effective entry table based on the access number;
Step S42 is a sorting step that arranges the valid entries in descending order of access number; entries with more accesses are regarded as more likely to be useful.
Step S43: sequentially accessing the websites where the effective entries are located, and acquiring corresponding website contents;
step S44: splitting the website content into an image file and a text file based on a file suffix name, and identifying the content of the image file and the content of the text file;
Steps S43 and S44 are the access and identification steps. The prior art offers many identification methods of varying capability, and this difference can serve as a charging standard, for example by providing VIP users with more complete identification capability. In short, the specific identification method can be implemented with the prior art, and the technical solution of the present invention does not limit it.
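The ranking in steps S41 and S42 and the suffix-based split in step S44 can be sketched as below. The data shapes are assumptions for illustration: each entry is a dictionary with a `url` key, visit counts arrive as a dictionary from some external lookup, and every resource whose suffix is not in the hypothetical `IMAGE_SUFFIXES` set is treated as text. Fetching the pages (step S43) and the identification itself are left out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed list of image suffixes; the source does not enumerate them.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}


def rank_entries(entries, visit_counts):
    """S41-S42: drop entries with no known visit count, sort the rest descending."""
    known = [e for e in entries if visit_counts.get(e["url"]) is not None]
    return sorted(known, key=lambda e: visit_counts[e["url"]], reverse=True)


def split_by_suffix(resource_urls):
    """S44: separate resources into image files and text files by file suffix."""
    images, texts = [], []
    for url in resource_urls:
        # Use only the path part so query strings do not hide the suffix.
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        (images if suffix in IMAGE_SUFFIXES else texts).append(url)
    return images, texts
```

Entries without a visit count are dropped before sorting, matching the default stated for step S41.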
Fig. 7 shows a sixth sub-flow diagram of a data processing method based on big data and artificial intelligence, which details step S5, wherein the step of updating the valid entry list based on the content recognition result and displaying to the user comprises:
step S51: acquiring a content identification result of the effective entry, and generating a violation level;
Steps S51 to S54 serve the final display. The present invention grades different content: for content that obviously violates the rules, the corresponding entry is deleted directly; for content such as advertisements, which the user may still want to see, the display process cannot take a one-size-fits-all approach and delete everything.
Step S52: confirming a first violation threshold and a second violation threshold, and judging the violation level based on the two violation thresholds, wherein the second violation threshold is larger than the first violation threshold;
Step S52 is a threshold determination step. The thresholds are adjustable and, after the system is put into use, can be modified according to actual conditions.
Step S53: if the violation level is greater than a second violation threshold, deleting the corresponding valid entry from the valid entry table;
step S53 is a handling approach for an obvious violation;
step S54: if the violation level is greater than the first violation threshold and less than the second violation threshold, encrypting the corresponding valid entry in the valid entry table, and decrypting based on the unlocking instruction;
Step S54 is a masking approach. Many kinds of unlocking instruction are possible, but the unlocking instruction must carry a position. In the simplest case, a user on a mobile phone double-taps the masked content, and the double tap is regarded as the unlocking instruction.
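The two-threshold decision in steps S52 to S54 can be sketched as follows. This is an illustration under stated assumptions: violation levels are plain numbers supplied in a dictionary, a boolean `masked` flag stands in for the encryption described in step S54, and a level exactly equal to the second threshold is treated as masked because the source does not specify that case. The function name `apply_violation_policy` is hypothetical.

```python
def apply_violation_policy(entries, levels, first_threshold, second_threshold):
    """Return (entry, masked) pairs for the entries that remain displayable."""
    # S52: the second threshold must be larger than the first.
    if not first_threshold < second_threshold:
        raise ValueError("second threshold must exceed first threshold")

    result = []
    for entry in entries:
        level = levels[entry]
        if level > second_threshold:
            continue  # S53: obvious violation, remove from the table
        # S54: between the thresholds the entry is kept but masked
        # (a flag here; the source describes encrypting the entry).
        masked = level > first_threshold
        result.append((entry, masked))
    return result
```

A display layer could then show masked entries as hidden until it receives an unlocking instruction for that entry's position.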
Example 2
Fig. 8 is a block diagram illustrating a composition of a big data and artificial intelligence based data processing system, and in an embodiment of the present invention, a big data and artificial intelligence based data processing system is provided, where the system includes:
an identity confirmation unit 101, configured to receive a user query request, verify a user identity, open a content input port based on a verification result, acquire query content, and establish a connection channel with a search engine;
the identity confirmation unit 101 is configured to complete step S1;
the entry obtaining unit 102 is configured to send the query content to a search engine, receive entry information returned by the search engine, and generate a to-be-detected entry library based on a default order of the search engine;
the entry obtaining unit 102 is configured to complete step S2;
the entry screening unit 103 is configured to obtain an expected number of entries, perform text pre-screening on the entry to be detected, and generate an effective entry table based on the expected number of entries and a screening result;
the entry filtering unit 103 is configured to complete step S3;
a content identification unit 104, configured to access a corresponding website based on the valid entry table, and perform content identification on corresponding website content;
the content identification unit 104 is configured to complete step S4;
an entry updating unit 105 for updating the valid entry table based on the content recognition result and displaying it to the user;
the entry updating unit 105 is configured to complete step S5.
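Because units 101 to 105 map one-to-one onto steps S1 to S5, the system can be pictured as a chain in which each unit consumes the output of the previous one. The sketch below shows only this chaining; `run_units` and the stand-in lambdas in `demo_units` are hypothetical placeholders, not the behavior of the real units.

```python
from typing import Any, Callable, Sequence

Unit = Callable[[Any], Any]


def run_units(request: Any, units: Sequence[Unit]) -> Any:
    """Pass the request through the units in order; each output feeds the next."""
    data = request
    for unit in units:
        data = unit(data)
    return data


# Hypothetical stand-ins for units 101 to 105 (steps S1 to S5).
demo_units = [
    lambda request: request["query"],                          # 101: obtain query content
    lambda query: [f"{query} result {i}" for i in range(3)],   # 102: build entry library
    lambda entries: [e for e in entries if "1" not in e],      # 103: text pre-screening
    lambda entries: [(e, 0.0) for e in entries],               # 104: content identification
    lambda scored: [e for e, level in scored if level < 0.5],  # 105: update and display
]
```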
Fig. 9 is a block diagram illustrating a configuration of an entry filtering unit in a big data and artificial intelligence based data processing system, where the entry filtering unit 103 includes:
a character confirmation module 1031, configured to confirm the sensitive character and generate a sensitive character library;
the character confirmation module 1031 is configured to complete step S31;
the reading module 1032 is used for confirming the expected number of the entries, establishing a connection channel with a to-be-detected entry library, and reading the to-be-detected entries based on the expected number of the entries;
the reading module 1032 is configured to complete step S32;
a connection module 1033, configured to connect the to-be-detected entry, and convert the connected to-be-detected entry into text information to generate a text file;
the connection module 1033 is configured to complete step S33;
a position obtaining module 1034, configured to sequentially read the sensitive characters, traverse the text file based on the sensitive characters, and obtain a sensitive position;
the position acquisition module 1034 is configured to complete step S34;
a deleting module 1035, configured to locate a to-be-detected entry library based on the sensitive position, delete the corresponding to-be-detected entry, and generate an effective entry table;
the deletion module 1035 is for completing step S35.
Fig. 10 is a block diagram illustrating a content recognition unit in a big data and artificial intelligence based data processing system, wherein the content recognition unit 104 comprises:
the number query module 1041 is configured to sequentially read the effective entries in the effective entry table, obtain websites of the effective entries, and query the number of accesses to the corresponding websites;
the number query module 1041 is configured to complete step S41;
the arranging module 1042 is configured to perform descending order arrangement on the effective entries in the effective entry table based on the access number;
the arrangement module 1042 is configured to complete step S42;
a content obtaining module 1043, configured to sequentially access the websites where the valid entries are located, and obtain corresponding website content;
the content obtaining module 1043 is configured to complete step S43;
the execution module 1044 is configured to split the website content into an image file and a text file based on a file suffix name, and perform content identification on the image file and the text file;
the execution module 1044 is configured to complete step S44.
The functions that can be realized by the big data and artificial intelligence based data processing system are all completed by computer equipment, the computer equipment comprises one or more processors and one or more memories, at least one program code is stored in the one or more memories, and the program code is loaded and executed by the one or more processors to realize the functions of the big data and artificial intelligence based data processing system.
The processor fetches instructions from the memory and analyzes them one by one, then completes the corresponding operations as the instructions require and generates a series of control commands, so that all parts of the computer act automatically, continuously and in coordination as an organic whole, realizing program input, data input, computation and result output; the arithmetic and logic operations arising in this process are completed by the arithmetic unit. The memory comprises a Read-Only Memory (ROM) for storing a computer program, and a protection device is arranged outside the memory.
Illustratively, a computer program can be partitioned into one or more modules, which are stored in memory and executed by a processor to implement the present invention. One or more of the modules may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the terminal device.
Those skilled in the art will appreciate that the above description of the service device is merely exemplary and does not limit the terminal device, which may include more or fewer components than those described, combine certain components, or use different components; for example, it may include input/output devices, network access devices, buses, etc.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor; it is the control center of the terminal device and connects the various parts of the entire user terminal using various interfaces and lines.
The memory may be used to store computer programs and/or modules, and the processor implements the various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory mainly comprises a program storage area and a data storage area: the program storage area can store an operating system and the application programs required by at least one function (such as an information acquisition template display function, a product information publishing function and the like); the data storage area can store data created through use of the system (e.g., product information acquisition templates corresponding to different product types, product information that needs to be issued by different product providers, etc.). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The modules/units integrated in the terminal device, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the modules/units in the system according to the above embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the functions of the embodiments of the system. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution media, and the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A data processing method based on big data and artificial intelligence is characterized by comprising the following steps:
receiving a user query request, verifying the identity of a user, opening a content input port based on a verification result, acquiring query content, and establishing a connection channel with a search engine;
sending the query content to a search engine, receiving entry information returned by the search engine, and generating a to-be-detected entry library based on the default sequence of the search engine;
acquiring the number of expected entries, performing text pre-screening on the entries to be detected, and generating an effective entry table based on the number of the expected entries and a screening result;
accessing the corresponding website based on the effective entry table, and identifying the content of the corresponding website;
and updating the effective entry list based on the content identification result and displaying the effective entry list to the user.
2. The big data and artificial intelligence based data processing method as claimed in claim 1, wherein the steps of receiving a user query request, verifying a user identity, opening a content input port based on a verification result, obtaining the query content, and establishing a connection channel with a search engine comprise:
acquiring account information of a user, and judging whether the account information is correct or incorrect;
if the account information is correct, opening a content input port;
if the account information is wrong, confirming a threshold value, adding one to the number of times of the mistake, and judging the number of times of the mistake and the size of the threshold value; if the error times are less than the threshold value, the account information of the user is obtained again, and if the error times are more than the threshold value, the face image of the user is obtained, and face recognition is carried out on the face image.
3. The data processing method based on big data and artificial intelligence of claim 1, wherein if the number of errors is greater than the threshold, the step of obtaining the face image of the user and performing face recognition on the face image comprises:
detecting a human face and capturing a human face image;
cutting a face area in the face image;
establishing a face model according to local textures and features in the face image;
and reading the face information in a face information database according to the face model and comparing the face information.
4. The big data and artificial intelligence based data processing method and system as claimed in claim 1, wherein the step of obtaining the expected number of entries, pre-screening the text of the entries to be examined, and generating the effective entry list based on the expected number of entries and the screening result comprises:
confirming the sensitive characters and generating a sensitive character library;
confirming the number of the predicted entries, establishing a connection channel with a to-be-detected entry library, and reading the to-be-detected entries based on the number of the predicted entries;
connecting the entries to be detected, and converting the connected entries to be detected into text information to generate a text file;
sequentially reading the sensitive characters, traversing the text file based on the sensitive characters, and acquiring a sensitive position;
and positioning the to-be-detected entry library based on the sensitive position, deleting the corresponding to-be-detected entry, and generating an effective entry table.
5. The big data and artificial intelligence based data processing method and system as claimed in claim 1, wherein said step of obtaining the expected number of entries, pre-screening the text of the entry to be examined, and generating the effective entry table based on the expected number of entries and the screening result further comprises:
reading a to-be-detected entry library, and randomly reading at least two to-be-detected entries;
performing text pre-screening on the at least two entries to be detected, performing content identification based on the two entries to be detected, and calculating the sub-duration;
and reading the estimated number of the entries, generating estimated duration based on the sub-duration and the estimated number of the entries, and displaying.
6. The method and system for data processing based on big data and artificial intelligence according to claim 1, wherein the step of accessing the corresponding website based on the valid entry list and performing content identification on the corresponding website content comprises:
sequentially reading the effective entries in the effective entry table, acquiring the websites of the effective entries, and inquiring the access numbers of the corresponding websites;
performing descending arrangement on the effective entries in the effective entry table based on the access number;
sequentially accessing the websites where the effective entries are located, and acquiring corresponding website contents;
splitting the website content into an image file and a text file based on a file suffix name, and identifying the content of the image file and the text file.
7. The big data and artificial intelligence based data processing method and system as claimed in claim 6, wherein said step of updating the valid entry list based on the content recognition result and displaying to the user comprises:
acquiring a content identification result of the effective entry, and generating a violation level;
confirming a first violation threshold and a second violation threshold, and judging the violation level based on the two violation thresholds, wherein the second violation threshold is larger than the first violation threshold;
if the violation level is greater than a second violation threshold, deleting the corresponding valid entry from the valid entry table;
and if the violation level is greater than the first violation threshold and less than the second violation threshold, encrypting the corresponding valid entry in the valid entry table, and decrypting based on the unlocking instruction.
8. A big data and artificial intelligence based data processing system, the system comprising:
the identity confirmation unit is used for receiving a user query request, verifying the identity of a user, opening a content input port based on a verification result, acquiring query content and establishing a connection channel with a search engine;
the entry acquisition unit is used for sending the query content to a search engine, receiving entry information returned by the search engine and generating a to-be-detected entry library based on the default sequence of the search engine;
the vocabulary entry screening unit is used for acquiring the expected number of vocabulary entries, performing text pre-screening on the vocabulary entries to be detected and generating an effective vocabulary entry table based on the expected number of vocabulary entries and a screening result;
the content identification unit is used for accessing the corresponding website based on the effective entry list and identifying the content of the corresponding website;
and the entry updating unit is used for updating the effective entry list based on the content identification result and displaying the effective entry list to the user.
9. The big data and artificial intelligence based data processing system of claim 8, wherein the entry filtering unit comprises:
the character confirmation module is used for confirming the sensitive characters and generating a sensitive character library;
the reading module is used for confirming the expected number of the entries, establishing a connection channel with a to-be-detected entry library and reading the to-be-detected entries based on the expected number of the entries;
the connecting module is used for connecting the entries to be detected and converting the connected entries to be detected into text information to generate a text file;
the position acquisition module is used for sequentially reading the sensitive characters, traversing the text file based on the sensitive characters and acquiring sensitive positions;
and the deleting module is used for positioning the to-be-detected entry library based on the sensitive position, deleting the corresponding to-be-detected entry and generating an effective entry table.
10. The big data and artificial intelligence based data processing system of claim 8, wherein the content identification unit comprises:
the number query module is used for sequentially reading the effective entries in the effective entry table, acquiring the websites of the effective entries and querying the access numbers of the corresponding websites;
the arrangement module is used for carrying out descending arrangement on the effective entries in the effective entry table based on the access number;
the content acquisition module is used for sequentially accessing the websites where the effective entries are located and acquiring corresponding website contents;
and the execution module is used for splitting the website content into an image file and a text file based on a file suffix name and identifying the content of the image file and the text file.
CN202110587083.9A 2021-05-27 2021-05-27 Data processing method and system based on big data and artificial intelligence Pending CN113297488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110587083.9A CN113297488A (en) 2021-05-27 2021-05-27 Data processing method and system based on big data and artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110587083.9A CN113297488A (en) 2021-05-27 2021-05-27 Data processing method and system based on big data and artificial intelligence

Publications (1)

Publication Number Publication Date
CN113297488A true CN113297488A (en) 2021-08-24

Family

ID=77325690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110587083.9A Pending CN113297488A (en) 2021-05-27 2021-05-27 Data processing method and system based on big data and artificial intelligence

Country Status (1)

Country Link
CN (1) CN113297488A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116521776A (en) * 2023-07-03 2023-08-01 陕西省君凯电子科技有限公司 Quick information query system
CN116521776B (en) * 2023-07-03 2023-09-05 陕西省君凯电子科技有限公司 Quick information query system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination