CN117372164A - Data risk detection method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117372164A
Authority
CN
China
Prior art keywords
data
detection
content
compliance
product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311311873.XA
Other languages
Chinese (zh)
Inventor
满园园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202311311873.XA priority Critical patent/CN117372164A/en
Publication of CN117372164A publication Critical patent/CN117372164A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present application provide a risk detection method and device for data, an electronic device and a storage medium, and belong to the field of financial technology. The method comprises: acquiring recommendation data, wherein the recommendation data comprises a product image and product description data; performing content analysis based on the product image and the product description data to obtain semantic content data; performing content detection on the semantic content data to obtain content detection data, wherein the content detection includes at least one of: legal word detection, labeling omission detection and accurate word detection; performing compliance evaluation on the content detection data to obtain compliance problem data; and generating optimization suggestion data based on the compliance problem data. The embodiments of the present application can realize intelligent compliance detection of recommendation data and improve both the accuracy and the efficiency of data compliance detection.

Description

Data risk detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of financial technology, and in particular to a risk detection method and device for data, an electronic device and a storage medium.
Background
To strengthen compliance risk management across the whole process for financial products, financial institutions often need to perform stricter compliance detection on the various financial product data provided by agents.
At present, most financial institutions adopt an offline detection mode in which relevant personnel perform compliance detection on the financial product data provided by agents. This process requires multiple detection links and the participation of many people, which results in high labor and time costs. In addition, manual detection often depends on the working experience and subjective judgment of the personnel involved, which leads to low accuracy of compliance detection.
Disclosure of Invention
The main purpose of the embodiments of the present application is to provide a risk detection method and device for data, an electronic device and a storage medium, aiming to realize intelligent compliance detection of recommendation data and to improve the accuracy and efficiency of data compliance detection.
To achieve the above object, a first aspect of an embodiment of the present application provides a risk detection method for data, including:
acquiring recommendation data, wherein the recommendation data comprises a product image and product description data;
performing content analysis based on the product image and the product description data to obtain semantic content data;
performing content detection on the semantic content data to obtain content detection data; wherein the content detection includes at least one of: legal word detection, labeling omission detection and accurate word detection;
performing compliance evaluation on the content detection data to obtain compliance problem data;
generating optimization suggestion data based on the compliance problem data.
In some embodiments, the performing content detection on the semantic content data to obtain content detection data includes:
performing word detection on the semantic content data based on a preset target word to obtain a first word consistent with the target word in the semantic content data, wherein the target word is an illegal word;
performing product content detection on the semantic content data based on preset candidate product content to obtain a first difference part between the semantic content data and the candidate product content, wherein the candidate product content is legal product content;
performing sentence detection on the semantic content data based on a preset target description sentence to obtain a first sentence in the semantic content data that is consistent with the target description sentence, wherein the target description sentence is an illegal description sentence;
and obtaining the content detection data based on the first word, the first difference part and the first sentence, wherein the content detection data is used for indicating illegal words existing in the semantic content data.
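The three checks of legal word detection can be illustrated with a minimal Python sketch. The word list, sentence list, approved product content and all function names here are hypothetical and are not taken from the application; plain substring matching and a token-level diff stand in for whatever matching the embodiments actually use.

```python
import difflib
import re

# Hypothetical rule data; a real system would load these from a maintained compliance lexicon.
BANNED_WORDS = {"guaranteed", "risk-free"}
BANNED_SENTENCES = {"this product never loses money."}
APPROVED_CONTENT = "annual premium 1000 yuan, coverage period 20 years"


def detect_banned_words(text, banned_words):
    """Return the preset illegal words that occur in the text (the 'first word' results)."""
    lowered = text.lower()
    return sorted(word for word in banned_words if word in lowered)


def diff_against_approved(product_statement, approved):
    """Return token runs that deviate from the approved content (the 'first difference part')."""
    a, b = approved.split(), product_statement.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        " ".join(b[j1:j2])
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("replace", "insert")
    ]


def detect_banned_sentences(text, banned_sentences):
    """Return sentences that match a preset illegal description sentence (the 'first sentence' results)."""
    sentences = [s.strip().lower() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if s in banned_sentences]


def detect_legal_words(text, product_statement):
    """Combine the three results into one content detection record."""
    return {
        "first_words": detect_banned_words(text, BANNED_WORDS),
        "first_difference": diff_against_approved(product_statement, APPROVED_CONTENT),
        "first_sentences": detect_banned_sentences(text, BANNED_SENTENCES),
    }
```

In this sketch the product statement is passed in separately; how it is located inside the semantic content data is left open.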
In some embodiments, the performing content detection on the semantic content data to obtain content detection data includes:
performing risk word detection on the semantic content data to obtain risk words;
performing subject detection on the semantic content data to obtain a second word for describing a product subject, a third word for describing a service subject and a fourth word for describing an institution subject;
performing data source detection on the semantic content data to obtain a fifth word for describing the source of the recommendation data;
and performing labeling detection on the risk words, the second word, the third word, the fourth word and the fifth word to obtain the content detection data, wherein the content detection data is used for indicating those of the risk words, the second word, the third word, the fourth word and the fifth word that are not labeled.
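A minimal sketch of labeling omission detection follows. It assumes, purely for illustration, that each category of word is found with a small lexicon and that the set of phrases already carrying a label (for example a footnote or disclosure mark) is known; the `Term` type, the lexicon entries and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str
    kind: str  # "risk", "product", "service", "institution" or "source"


# Hypothetical lexicon: one phrase list per word category named in the method.
LEXICON = {
    "risk": ["high yield"],
    "product": ["term life plan"],
    "service": ["claims hotline"],
    "institution": ["example insurer"],
    "source": ["internal survey"],
}


def find_terms(text, lexicon):
    """Collect risk, subject and source words that appear in the text."""
    lowered = text.lower()
    return [
        Term(phrase, kind)
        for kind, phrases in lexicon.items()
        for phrase in phrases
        if phrase in lowered
    ]


def find_unlabeled(terms, labeled_phrases):
    """Keep only the detected words that carry no label."""
    return [term for term in terms if term.text not in labeled_phrases]
```

The returned list plays the role of the content detection data: it names every detected word that still needs a label.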
In some embodiments, the performing content detection on the semantic content data to obtain content detection data includes:
carrying out grammar detection on the semantic content data based on a preset grammar rule to obtain a grammar detection result, wherein the grammar detection result is used for indicating a content part with at least one of logic errors and grammar errors in the semantic content data;
extracting a product description sentence in the semantic content data;
comparing the product description sentence with a reference product description of a reference product in a preset product library to obtain a second difference part of the product description sentence and the reference product description;
and obtaining the content detection data based on the grammar detection result and the second difference part.
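Accurate word detection can be sketched as a rule-based check plus a comparison with a reference description. The two regular-expression rules, the product library entry and the function names below are hypothetical stand-ins; the application does not specify the grammar rules or the comparison algorithm, and a token-level diff is used here only as one possible choice.

```python
import difflib
import re

# Hypothetical preset grammar rules, expressed as regular expressions.
GRAMMAR_RULES = {
    "repeated word": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
    "double punctuation": re.compile(r"[,.;:]{2,}"),
}

# Hypothetical product library mapping a product name to its reference description.
PRODUCT_LIBRARY = {"SafeLife": "SafeLife covers critical illness for 20 years"}


def check_grammar(text):
    """Return (rule name, offending span) pairs for every rule that fires."""
    return [
        (name, match.group(0))
        for name, rule in GRAMMAR_RULES.items()
        for match in rule.finditer(text)
    ]


def extract_product_sentence(text, product_name):
    """Return the first sentence mentioning the product, without its final punctuation."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if product_name.lower() in sentence.lower():
            return sentence.strip().rstrip(".!?")
    return None


def diff_with_reference(sentence, reference):
    """Return (reference part, sentence part) pairs where the two descriptions differ."""
    a, b = reference.split(), sentence.split()
    opcodes = difflib.SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in opcodes
        if tag != "equal"
    ]
```

The grammar findings and the difference pairs together correspond to the grammar detection result and the second difference part.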
In some embodiments, the generating optimization suggestion data based on the compliance issue data includes:
extracting a target compliance problem in the compliance problem data;
screening candidate strategies in a preset mapping table based on the target compliance problem, wherein the preset mapping table is used for representing the corresponding relation between each candidate compliance problem and the candidate strategy;
performing problem similarity scoring on the target compliance problem and the candidate compliance problem to obtain problem scoring data;
selecting, as a target strategy, the candidate strategy corresponding to the candidate compliance problem whose problem scoring data is larger than a preset threshold;
and generating the optimization suggestion data based on the target strategy.
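The strategy selection above can be sketched as a lookup over a mapping table with a similarity threshold. The table contents, the threshold value and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions made for illustration; the application does not name a scoring method.

```python
from difflib import SequenceMatcher

# Hypothetical mapping table: candidate compliance problem -> candidate strategy.
STRATEGY_TABLE = {
    "uses absolute wording about returns": "Replace absolute wording with a neutral description of expected returns.",
    "missing risk disclosure label": "Add the required risk disclosure label next to the risk term.",
}


def similarity(a, b):
    """Score two problem descriptions in [0, 1]; a character-level ratio is used as a stand-in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_strategies(target_problem, table, threshold=0.6):
    """Return strategies whose candidate problem scores above the threshold, best match first."""
    scored = [
        (similarity(target_problem, candidate), strategy)
        for candidate, strategy in table.items()
    ]
    return [strategy for score, strategy in sorted(scored, reverse=True) if score > threshold]
```

The selected strategies would then be turned into the optimization suggestion data, for instance by filling in the offending text.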
In some embodiments, after the generating optimization suggestion data based on the compliance issue data, the method further comprises:
sending the optimization suggestion data to a target object, wherein the target object is the object that provided the recommendation data;
receiving update data fed back by the target object, wherein the update data is data obtained by the target object optimizing the recommendation data according to the optimization suggestion data;
performing compliance detection on the update data to obtain a compliance detection result, wherein the compliance detection result is used for representing whether the update data has a compliance problem;
and if the compliance detection result indicates that the update data has no compliance problem, sending the update data to an auditing end for auditing.
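One round of this feedback loop can be sketched as follows. The three callables are hypothetical placeholders for the messaging and detection components, and the behavior when problems remain (returning them to the caller) is an assumption, since the steps above only cover the case with no compliance problem.

```python
def review_round(suggestions, send_to_agent, check_compliance, send_to_auditor):
    """Run one suggest -> update -> re-check round.

    send_to_agent:    delivers the suggestions and returns the agent's update data
    check_compliance: returns a list of compliance problems found in the update data
    send_to_auditor:  forwards compliant update data to the auditing end
    """
    update_data = send_to_agent(suggestions)
    problems = check_compliance(update_data)
    if not problems:
        # No compliance problem remains, so the update data goes on to auditing.
        send_to_auditor(update_data)
        return True, problems
    # Assumption: remaining problems are handed back to the caller for another round.
    return False, problems
```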
In some embodiments, the content analysis based on the product image and the product description data, to obtain semantic content data, includes:
performing text detection on the product image to obtain first text data;
performing content detection on the product description data to obtain second text data;
and obtaining the semantic content data based on the first text data and the second text data.
To achieve the above object, a second aspect of the embodiments of the present application proposes a risk detection device for data, the device including:
the data acquisition module is used for acquiring recommended data, wherein the recommended data comprises a product image and product description data;
the content analysis module is used for carrying out content analysis based on the product image and the product description data to obtain semantic content data;
the content detection module is used for performing content detection on the semantic content data to obtain content detection data; wherein the content detection includes at least one of: legal word detection, labeling omission detection and accurate word detection;
the compliance evaluation module is used for carrying out compliance evaluation on the content detection data to obtain compliance problem data;
and the suggestion generation module is used for generating optimization suggestion data based on the compliance problem data.
To achieve the above object, a third aspect of the embodiments of the present application proposes an electronic device, which includes a memory, a processor, where the memory stores a computer program, and the processor implements the method described in the first aspect when executing the computer program.
To achieve the above object, a fourth aspect of the embodiments of the present application proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method of the first aspect.
According to the risk detection method and device for data, the electronic device and the storage medium provided by the present application, recommendation data is acquired, wherein the recommendation data comprises a product image and product description data; content analysis is performed based on the product image and the product description data to obtain semantic content data, so that semantic understanding of the recommendation data is achieved by combining image and text data. Further, content detection is performed on the semantic content data to obtain content detection data, wherein the content detection includes at least one of legal word detection, labeling omission detection and accurate word detection; this enables multi-dimensional content detection of the semantic content data and improves the comprehensiveness and diversity of content detection. Further, compliance evaluation is performed on the content detection data to obtain compliance problem data, and optimization suggestion data is generated based on the compliance problem data; this improves the accuracy of the compliance evaluation and of the generated optimization suggestion data, realizes intelligent compliance detection of the recommendation data, and improves both the accuracy and the efficiency of data compliance detection.
Drawings
FIG. 1 is a flow chart of a risk detection method for data provided by an embodiment of the present application;
FIG. 2 is a flowchart of step S102 in FIG. 1;
FIG. 3 is a flowchart of step S103 in FIG. 1;
FIG. 4 is another flowchart of step S103 in FIG. 1;
FIG. 5 is another flowchart of step S103 in FIG. 1;
FIG. 6 is a flowchart of step S105 in FIG. 1;
FIG. 7 is another flow chart of a risk detection method for data provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a risk detection device for data provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
First, several terms used in this application are explained:
Artificial intelligence (AI): a new technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence; research in this field includes robotics, language recognition, image recognition, natural language processing and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence is also the theory, method, technology and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Natural language processing (NLP): NLP is a branch of artificial intelligence and an interdisciplinary field of computer science and linguistics, often referred to as computational linguistics; it processes, understands and applies human languages (e.g., Chinese, English). Natural language processing includes syntactic parsing, semantic analysis, discourse understanding, and the like. It is commonly used in technical fields such as machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, intent recognition, information extraction and filtering, text classification and clustering, and public opinion analysis and opinion mining, and it involves data mining, machine learning, knowledge acquisition, knowledge engineering, artificial intelligence research, and linguistic research related to language computing.
Information extraction (IE): a text processing technology that extracts factual information of specified types, such as entities, relations and events, from natural language text and outputs it as structured data. Information extraction is a technique for extracting specific information from text data. Text data is made up of specific units such as sentences, paragraphs and chapters, and text information is made up of smaller units such as characters, words, phrases, sentences and paragraphs, or combinations of these units. Extracting noun phrases, person names, place names and the like from text data is text information extraction, and the information extracted by this technology can of course be of various types.
To strengthen compliance risk management across the whole process for financial products, financial institutions often need to perform stricter compliance detection on the various financial product data provided by agents.
At present, most financial institutions adopt an offline detection mode in which relevant personnel perform compliance detection on the financial product data provided by agents. This process requires multiple detection links and the participation of many people, which results in high labor and time costs. In addition, manual detection often depends on the working experience and subjective judgment of the personnel involved, which leads to low accuracy of compliance detection.
Based on the above, the embodiments of the present application provide a risk detection method and device for data, an electronic device and a storage medium, aiming to realize intelligent compliance detection of recommendation data and to improve the accuracy and efficiency of data compliance detection.
The method and apparatus for risk detection of data, electronic device and storage medium provided in the embodiments of the present application are specifically described through the following embodiments, and the method for risk detection of data in the embodiments of the present application is described first.
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The embodiments of the present application provide a risk detection method for data, which relates to the technical field of financial technology. The risk detection method for data provided by the embodiments of the present application can be applied to a terminal, to a server side, or to software running in the terminal or the server side. In some embodiments, the terminal may be a smart phone, a tablet, a notebook computer, a desktop computer, or the like; the server side may be configured as an independent physical server, as a server cluster or distributed system formed by a plurality of physical servers, or as a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software may be an application that implements the risk detection method for data, but is not limited to the above forms.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In the various embodiments of the present application, when related processing is required according to data related to identity or characteristics of an object, such as object information, object behavior data, object history data, and object position information, permission or consent of the object is obtained first, and related laws and regulations and standards are complied with for collection, use, processing, and the like of the data. In addition, when the personal information of the object needs to be acquired in the embodiment of the application, the independent permission or independent consent of the object is acquired through a popup window or a jump to a confirmation page or the like, and after the independent permission or independent consent of the object is explicitly acquired, the necessary object related data for enabling the embodiment of the application to normally operate is acquired.
Fig. 1 is an optional flowchart of a method for risk detection of data provided in an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S101 to S105.
Step S101, acquiring recommendation data, wherein the recommendation data comprises a product image and product description data;
Step S102, performing content analysis based on the product image and the product description data to obtain semantic content data;
Step S103, performing content detection on the semantic content data to obtain content detection data; wherein the content detection includes at least one of: legal word detection, labeling omission detection and accurate word detection;
Step S104, performing compliance evaluation on the content detection data to obtain compliance problem data;
Step S105, generating optimization suggestion data based on the compliance problem data.
In steps S101 to S105 illustrated in the embodiments of the present application, recommendation data is acquired, wherein the recommendation data comprises a product image and product description data; content analysis is performed based on the product image and the product description data to obtain semantic content data, so that semantic understanding of the recommendation data is achieved by combining image and text data. Further, content detection is performed on the semantic content data to obtain content detection data, wherein the content detection includes at least one of legal word detection, labeling omission detection and accurate word detection; this enables multi-dimensional content detection of the semantic content data and improves the comprehensiveness and diversity of content detection. Further, compliance evaluation is performed on the content detection data to obtain compliance problem data, and optimization suggestion data is generated based on the compliance problem data; this improves the accuracy of the compliance evaluation and of the generated optimization suggestion data, realizes intelligent compliance detection of the recommendation data, and improves both the accuracy and the efficiency of data compliance detection.
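The flow of steps S101 to S105 can be sketched as a simple pipeline in Python. The `RecommendationData` type, the function name and the four stage callables are hypothetical; each stage is left pluggable because the application describes several alternative embodiments for it.

```python
from dataclasses import dataclass


@dataclass
class RecommendationData:
    """Input acquired in step S101: a product image plus product description data."""
    product_image: bytes
    product_description: str


def run_risk_detection(data, analyze_content, detect_content,
                       evaluate_compliance, generate_suggestions):
    """Chain the stages of steps S102 to S105 over one piece of recommendation data."""
    semantic_content = analyze_content(data.product_image, data.product_description)  # S102
    detection_data = detect_content(semantic_content)                                 # S103
    problem_data = evaluate_compliance(detection_data)                                # S104
    return generate_suggestions(problem_data)                                         # S105
```

Passing the stages as arguments keeps the sketch independent of any particular OCR, NLP or rule engine.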
In step S101 of some embodiments, the recommendation data refers to image-and-text material produced by an agent in the financial field to recommend financial products, and includes a product image and product description data. After the agent finishes producing the recommendation data, the agent uploads it to the financial platform, so that the financial platform server can perform compliance detection on it and publish the recommendation data that has no compliance problem. The published recommendation data is displayed on the financial platform, where network users can view it to learn about the related financial products. The server can therefore directly acquire the uploaded recommendation data.
Further, after the agent completes the production of the recommendation data, the recommendation data may be stored in a local terminal device such as the target terminal. The server may also invoke the recommendation data stored in the local terminal device after permission of the local terminal device by sending a data acquisition request to the local terminal device.
In some embodiments, the recommendation data may be presented as an image containing text, or as a PPT combining images and text, without limitation.
Referring to fig. 2, in some embodiments, step S102 may include, but is not limited to, steps S201 to S203:
Step S201, performing text detection on the product image to obtain first text data;
Step S202, performing content detection on the product description data to obtain second text data;
Step S203, obtaining the semantic content data based on the first text data and the second text data.
The following describes step S201 to step S203 in detail.
In step S201 of some embodiments, text recognition may be performed on the product image by using an optical character recognition technique, so as to obtain first text data. Specifically, the character shape in the product image is determined through an optical character recognition technology, and then the character shape is translated into computer characters according to a character recognition method commonly used in the optical character recognition technology, so that first text data corresponding to the product image is obtained.
In step S202 of some embodiments, word segmentation is first performed on the product description data to obtain product description words. Part-of-speech tagging is then performed on the product description words to obtain their part-of-speech types. Next, lemmatization is performed on the product description words according to the part-of-speech types to obtain lemmatized description words. Finally, semantic recognition is performed on the lemmatized description words based on a named entity recognition model to obtain the semantic content data of each word, and all of this semantic content data is integrated to obtain the second text data corresponding to the product description data. The named entity recognition model is built on a long short-term memory (LSTM) algorithm.
In step S203 of some embodiments, the first text data and the second text data are integrated to obtain semantic content data. Specifically, the text data representing the same semantic content in the first text data and the second text data are combined first to realize duplication elimination of the text data. And integrating the text data representing different semantic contents in the de-duplicated text data, the first text data and the second text data to obtain complete text data, and taking the complete text data as semantic content data.
The first text data representing the product information and the image text information can be conveniently extracted from the product image through the steps S201 to S203, content recognition can be conveniently carried out on the product description data, semantic understanding of the product description data is achieved, second text data is obtained, further, the first text data and the second text data are integrated into semantic content data, and comprehensiveness and accuracy of analysis of description information in recommendation data can be improved.
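The integration in step S203 can be illustrated with a minimal sketch. It assumes that "the same semantic content" can be approximated by equality after whitespace and case normalization, which is a simplification and not the method stated in the embodiments; the function names and sample fragments are hypothetical.

```python
import re


def normalize(fragment: str) -> str:
    """Collapse whitespace and case so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", fragment).strip().lower()


def merge_text_data(first_text, second_text):
    """Merge OCR fragments and description fragments, keeping one copy of each."""
    merged = []
    seen = set()
    for fragment in [*first_text, *second_text]:
        key = normalize(fragment)
        if key and key not in seen:  # skip empty fragments and duplicates
            seen.add(key)
            merged.append(fragment.strip())
    return merged


# Hypothetical first text data (from the image) and second text data (from the description).
first_text_data = ["Annual premium: 1,000", "Covers critical illness"]
second_text_data = ["covers  critical illness", "Waiting period: 90 days"]
semantic_content_data = merge_text_data(first_text_data, second_text_data)
print(semantic_content_data)
```

A real system would likely replace `normalize` with a semantic comparison, since two fragments can express the same content in different words.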
Referring to fig. 3, in some embodiments, the process of performing legal word detection on semantic content data to obtain content detection data may include, but is not limited to, steps S301 to S304:
Step S301, performing word detection on the semantic content data based on a preset target word to obtain a first word in the semantic content data that is consistent with the target word, wherein the target word is an illegal word;
Step S302, performing product content detection on the semantic content data based on preset candidate product content to obtain a first difference part between the semantic content data and the candidate product content, wherein the candidate product content is legal product content;
Step S303, performing sentence detection on the semantic content data based on a preset target description sentence to obtain a first sentence in the semantic content data that is consistent with the target description sentence, wherein the target description sentence is an illegal description sentence;
Step S304, obtaining content detection data based on the first word, the first difference part and the first sentence, wherein the content detection data is used for indicating illegal words in the semantic content data.
The following describes step S301 to step S304 in detail.
In step S301 of some embodiments, the target word is an illegal word, that is, a word that should not appear, such as an exaggerated word; for example, the target words include "highest", "first", "most", and the like. Specifically, word segmentation is first performed on the semantic content data to obtain a plurality of semantic terms. Each semantic term is then compared with the preset target words, and the semantic similarity between the semantic term and a target word is calculated by a similarity algorithm; if the semantic similarity is higher than a preset first threshold, the semantic term is determined to be consistent with the target word and is taken as a first word. The first word is a word that needs to be deleted or adjusted in the semantic content data.
In step S302 of some embodiments, comparison-type content in the semantic content data requires peer comparison detection, that is, detecting whether the comparison-type content differs substantially from the related content of the applicable scenario, whether product content has been confused, and so on. Specifically, the related product content of the scenario to which the product in the recommendation data applies is first acquired and taken as the candidate product content. The semantic content data is then compared with the candidate product content, and a first difference part between them is screened out, where the first difference part is content in the semantic content data that differs substantially from the related content of the applicable scenario and in which product content may have been confused. For example, the first difference part may be a benefit value that is stated incorrectly in a benefit comparison.
In step S303 of some embodiments, the target description sentence is an illegal description sentence, including content such as illegal promises and functional descriptions. Specifically, the semantic content data is first split according to punctuation marks and common grammar conventions to obtain a plurality of semantic sentences. Each semantic sentence is then compared with the preset target description sentences, and the semantic similarity between the semantic sentence and a target description sentence is calculated by a similarity algorithm; if the semantic similarity is higher than a preset second threshold, the semantic sentence is determined to be consistent with the target description sentence and is taken as a first sentence. The first sentence is a sentence that needs to be deleted or adjusted in the semantic content data.
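The threshold comparison of step S301 might look like the following sketch. The embodiments do not name a specific similarity algorithm, so the character-level ratio from Python's `difflib` is used here purely as a stand-in for semantic similarity; the threshold value and the whitespace-based segmentation are also assumptions.

```python
from difflib import SequenceMatcher

TARGET_WORDS = ["highest", "first", "most"]  # example target words given in the text
FIRST_THRESHOLD = 0.8  # hypothetical value for the preset first threshold


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def detect_first_words(semantic_terms, target_words=TARGET_WORDS, threshold=FIRST_THRESHOLD):
    """Return the semantic terms whose similarity to any target word exceeds the threshold."""
    first_words = []
    for term in semantic_terms:
        if any(similarity(term, target) > threshold for target in target_words):
            first_words.append(term)
    return first_words


# Whitespace splitting stands in for word segmentation.
terms = "this plan offers the Highest return".split()
first_words = detect_first_words(terms)
print(first_words)
```

An embedding-based similarity would catch synonyms that a character-level measure misses, at the cost of needing a trained model.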
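For step S303, the sentence splitting and comparison can be sketched as follows. Token-set Jaccard overlap is used as a simple stand-in for the unspecified similarity algorithm; the target sentence, the threshold, and the punctuation set are hypothetical.

```python
import re

TARGET_SENTENCES = ["this product guarantees a profit"]  # hypothetical illegal description sentence
SECOND_THRESHOLD = 0.6  # hypothetical value for the preset second threshold


def split_sentences(text):
    """Split on common Western and Chinese sentence-ending punctuation.

    Note: this naive split would also break decimals such as "3.5".
    """
    parts = re.split(r"[.!?;。！？；]+", text)
    return [part.strip() for part in parts if part.strip()]


def token_jaccard(a, b):
    """Overlap of lowercase token sets, in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def detect_first_sentences(text, targets=TARGET_SENTENCES, threshold=SECOND_THRESHOLD):
    """Return sentences whose similarity to any target sentence exceeds the threshold."""
    return [
        sentence
        for sentence in split_sentences(text)
        if any(token_jaccard(sentence, target) > threshold for target in targets)
    ]


text = "This product guarantees a profit! The waiting period is 90 days."
first_sentences = detect_first_sentences(text)
print(first_sentences)
```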
In step S304 of some embodiments, the first word, the first difference portion, and the first sentence are integrated into the same set, and all the words and sentences in the set are used as content detection data, where the content detection data is used to indicate illegal words existing in the semantic content data.
Through the steps S301 to S304, illegal words and sentences in the semantic content data can be detected more conveniently, legal word detection of recommended data is realized, and accuracy and comprehensiveness of content detection are improved.
Referring to fig. 4, in some embodiments, the process of performing label omission detection on semantic content data to obtain content detection data may include, but is not limited to, steps S401 to S404:
Step S401, performing risk word detection on the semantic content data to obtain risk words;
Step S402, performing subject detection on the semantic content data to obtain a second word describing a product subject, a third word describing a service subject and a fourth word describing a mechanism subject;
Step S403, performing data source detection on the semantic content data to obtain a fifth word describing the source of the recommendation data;
Step S404, performing label detection on the risk words, the second word, the third word, the fourth word and the fifth word to obtain content detection data.
The following describes step S401 to step S404 in detail.
In step S401 of some embodiments, a risk dictionary is preset. The risk dictionary includes a plurality of reference risk words, which may be collected from all detection runs performed before the current detection. Specifically, the risk dictionary is traversed and the words in the semantic content data are compared with the reference risk words in the risk dictionary; any word in the semantic content data that is identical to a reference risk word is taken as a risk word.
In the insurance field, the recommended data of each insurance product can be subjected to risk word detection, or the recommended data of the insurance product with higher risk level can be used as the data required to be subjected to risk word detection according to the risk level of the insurance product, without limitation.
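The dictionary comparison of step S401 is an exact-match lookup, which can be sketched as below. The dictionary entries are hypothetical, and lowercasing before comparison is an added assumption rather than something the text specifies.

```python
RISK_DICTIONARY = {"guaranteed", "risk-free", "principal-protected"}  # hypothetical reference risk words


def detect_risk_words(words, risk_dictionary=RISK_DICTIONARY):
    """Return words identical to a reference risk word, in order of first appearance."""
    found = []
    for word in words:
        key = word.lower()  # assumption: matching is case-insensitive
        if key in risk_dictionary and key not in found:
            found.append(key)
    return found


words = ["A", "guaranteed", "and", "Risk-Free", "return", "guaranteed"]
risk_words = detect_risk_words(words)
print(risk_words)
```

Storing the dictionary as a set makes each lookup constant time, which gives the same result as traversing the dictionary for every word.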
In step S402 of some embodiments, entity feature extraction is performed on the semantic content data through a named entity recognition model to obtain the semantic entity feature corresponding to each semantic term in the semantic content data. A preset classification function is then used to compute, for each semantic entity feature, a first probability that it belongs to the product subject category, a second probability that it belongs to the service subject category, and a third probability that it belongs to the mechanism subject category. If the first probability is greater than a preset first probability threshold, the semantic term corresponding to the semantic entity feature is determined to describe the product subject and is taken as a second word. If the second probability is greater than a preset second probability threshold, the semantic term is determined to describe the service subject and is taken as a third word. If the third probability is greater than a preset third probability threshold, the semantic term is determined to describe the mechanism subject and is taken as a fourth word.
It should be noted that the preset classification function may be a softmax function, etc., without limitation.
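A minimal sketch of the softmax-and-threshold decision in step S402 follows. The per-term scores are hypothetical stand-ins for the output of a named entity recognition model, and the threshold values are assumptions.

```python
import math

CATEGORIES = ("product", "service", "mechanism")
# Hypothetical values for the preset first, second and third probability thresholds.
THRESHOLDS = {"product": 0.5, "service": 0.5, "mechanism": 0.5}


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def classify_subject(scores):
    """Return the subject categories whose probability exceeds that category's threshold."""
    probabilities = dict(zip(CATEGORIES, softmax(scores)))
    return [name for name in CATEGORIES if probabilities[name] > THRESHOLDS[name]]


# Hypothetical raw scores per semantic term, one score per category.
term_scores = {"AnnuityPlus": [3.0, 0.2, 0.1], "today": [0.1, 0.1, 0.1]}
subjects = {term: classify_subject(scores) for term, scores in term_scores.items()}
print(subjects)
```

Because softmax probabilities sum to one, a term with no clear category (such as "today" here) ends up near one third for each category and falls below every threshold.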
In step S403 of some embodiments, data source detection is performed on the semantic content data using a preset semantic recognition model: words describing the data source are detected at the word level and taken as fifth words. The preset semantic recognition model may be a model built on the BERT model or on a Transformer encoder.
In step S404 of some embodiments, the risk words, second words, third words, fourth words and fifth words may be labeled through a common data labeling platform, and each of them is checked for whether it has been labeled, so as to obtain content detection data, where the content detection data is used to indicate the risk words, second words, third words, fourth words and fifth words that have not been labeled.
Through the steps S401 to S404, the words in the semantic content data that need to be labeled can be detected conveniently, together with whether each of them has actually been labeled, so that label omission detection of the recommendation data is realized and the accuracy and comprehensiveness of content detection are improved.
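The final check in step S404 reduces to finding which detected words have no label. The sketch below assumes the labeled words are available as a plain list; how labels are actually stored on a labeling platform is not described in the text, and all names and sample values are hypothetical.

```python
def find_unlabeled(words_requiring_labels, labeled_words):
    """For each word kind, list the detected words that have not been labeled."""
    labeled = {word.lower() for word in labeled_words}
    return {
        kind: [word for word in words if word.lower() not in labeled]
        for kind, words in words_requiring_labels.items()
    }


# Hypothetical outputs of steps S401 to S403.
detected = {
    "risk_word": ["guaranteed"],
    "product_subject": ["AnnuityPlus"],
    "data_source": ["2022 annual report"],
}
already_labeled = ["AnnuityPlus"]
content_detection_data = find_unlabeled(detected, already_labeled)
print(content_detection_data)
```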
Referring to fig. 5, in some embodiments, the process of performing accurate word detection on semantic content data to obtain content detection data may include, but is not limited to, steps S501 to S504:
Step S501, performing grammar detection on the semantic content data based on a preset grammar rule to obtain a grammar detection result, wherein the grammar detection result is used for indicating a content part of the semantic content data that has at least one of a logic error and a grammar error;
Step S502, extracting a product description sentence from the semantic content data;
Step S503, comparing the product description sentence with the reference product description of a reference product in a preset product library to obtain a second difference part between the product description sentence and the reference product description;
Step S504, obtaining content detection data based on the grammar detection result and the second difference part.
Step S501 to step S504 are described in detail below.
In step S501 of some embodiments, the preset grammar rule may be expressed as different forms of regular expressions, and the preset regular expressions are used to perform grammar detection on the semantic content data, and determine whether sentences and words in the semantic content data conform to the logic and grammar specifications represented by the regular expressions, so as to obtain a grammar detection result. If sentences and words in the semantic content data accord with the logic and grammar specifications represented by the regular expression, the grammar detection result is that no logic errors and grammar errors exist in the semantic content data. If sentences and words in the semantic content data do not conform to the logic and/or grammar specifications represented by the regular expression, the grammar detection result indicates that logic errors or grammar errors exist.
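As an illustration of step S501, grammar rules can be held as named regular expressions. The sketch below assumes each pattern describes a violation (a match means the sentence breaks the rule), which inverts the "conforms to the expression" framing of the text but yields the same detection result; both rules are hypothetical examples.

```python
import re

# Hypothetical grammar rules; a match indicates a violation.
GRAMMAR_RULES = {
    "repeated word": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
    "space before punctuation": re.compile(r"\s+[,.;!?]"),
}


def grammar_check(sentences):
    """Return (sentence, rule name) pairs for every rule a sentence violates."""
    results = []
    for sentence in sentences:
        for name, pattern in GRAMMAR_RULES.items():
            if pattern.search(sentence):
                results.append((sentence, name))
    return results


sentences = ["The plan pays pays a bonus.", "Coverage starts on day one."]
grammar_result = grammar_check(sentences)
print(grammar_result)
```

An empty result corresponds to the case where no logic errors or grammar errors are found in the semantic content data.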
In step S502 of some embodiments, sentence extraction is performed on the semantic content data by a named entity recognition algorithm, so that a product description sentence in the semantic content data is extracted.
In step S503 of some embodiments, the reference products in the preset product library include the product described in the recommendation data, and the reference product description explains the reference product. In actual use, the product description in the recommendation data should be consistent with the reference product description. Accordingly, a reference product consistent with the product to be recommended is first found in the preset product library according to the product to be recommended in the recommendation data. The product description sentence is then compared with the reference product description of that reference product, and the second difference part between them is screened out, where the second difference part is the part in which the product description sentence differs substantially from the reference product description and may therefore contain a product description error.
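The comparison in step S503 can be sketched with a token-level diff. The text does not specify how the second difference part is computed, so `difflib.SequenceMatcher` is used here as one possible approach; the product library contents and product name are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical preset product library: product name -> reference product description.
PRODUCT_LIBRARY = {
    "AnnuityPlus": "AnnuityPlus pays 3% annually after a 90 day waiting period",
}


def second_difference(product, description):
    """Return (reference text, described text) pairs for every span that differs."""
    reference = PRODUCT_LIBRARY[product].split()
    candidate = description.split()
    matcher = SequenceMatcher(None, reference, candidate, autojunk=False)
    return [
        (" ".join(reference[i1:i2]), " ".join(candidate[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


difference = second_difference(
    "AnnuityPlus", "AnnuityPlus pays 8% annually after a 90 day waiting period"
)
print(difference)
```

A literal diff flags any rewording, so a production system would probably need a tolerance for paraphrases that preserve the facts.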
In step S504 of some embodiments, the grammar detection result and the second difference portion are integrated into the same set, and all words and sentences of the set are used as content detection data, where the content detection data is used to indicate a content portion in which at least one of a logic error or a grammar error exists in the semantic content data, and a portion inconsistent with the reference product description of the reference product.
Through the steps S501 to S504, sentences with logic errors or grammar errors in the semantic content data, as well as sentences inconsistent with the reference product description, can be conveniently detected, so that accurate word detection of the recommendation data is realized and the accuracy and comprehensiveness of content detection are improved.
In step S104 of some embodiments, when compliance evaluation is performed on the content detection data to obtain compliance problem data, the words and sentences obtained from legal word detection, label omission detection and accurate word detection are grouped by detection process into a first data set corresponding to legal word detection, a second data set corresponding to label omission detection, and a third data set corresponding to accurate word detection. Compliance problems are then labeled for the first data set, the second data set and the third data set respectively to obtain the compliance problem data.
For example, the compliance problem corresponding to a first word in the first data set is labeled as "xxx is an illegal word", the compliance problem corresponding to a first difference part in the first data set is labeled as "xxx is not consistent with the candidate product content description", and the compliance problem corresponding to a first sentence in the first data set is labeled as "xxx is an illegal sentence".
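The labeling of the first data set can be sketched with the three message templates quoted in the example above. The dictionary layout and sample items are hypothetical; only the template wording comes from the text.

```python
# Templates taken from the example labels in the text; "xxx" becomes {item}.
TEMPLATES = {
    "first_word": "{item} is an illegal word",
    "first_difference": "{item} is not consistent with the candidate product content description",
    "first_sentence": "{item} is an illegal sentence",
}


def label_compliance_problems(first_data_set):
    """Turn each detected item in the first data set into a compliance problem label."""
    return [
        TEMPLATES[kind].format(item=item)
        for kind, items in first_data_set.items()
        for item in items
    ]


# Hypothetical first data set produced by legal word detection.
first_data_set = {
    "first_word": ["highest"],
    "first_difference": ["8% annual return"],
    "first_sentence": ["This product guarantees a profit"],
}
compliance_problem_data = label_compliance_problems(first_data_set)
print(compliance_problem_data)
```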
Referring to fig. 6, in some embodiments, step S105 includes, but is not limited to, steps S601 to S605:
Step S601, extracting a target compliance problem from the compliance problem data;
Step S602, screening candidate strategies in a preset mapping table based on the target compliance problem, wherein the preset mapping table is used for representing the correspondence between each candidate compliance problem and its candidate strategy;
Step S603, scoring the problem similarity between the target compliance problem and the candidate compliance problem to obtain problem scoring data;
Step S604, selecting, as a target strategy, the candidate strategy corresponding to a candidate compliance problem whose problem scoring data is greater than a preset threshold;
Step S605, generating optimization suggestion data based on the target strategy.
Step S601 to step S605 are described in detail below.
In step S601 of some embodiments, the compliance question of the compliance question data may be directly extracted, and the extracted compliance question may be used as the target compliance question. For example, target compliance question a is "xxx is an unlabeled term," target compliance question B is "xxx is an unlabeled risk term," and so on.
In step S602 of some embodiments, the preset mapping table includes a plurality of candidate compliance questions and candidate policies corresponding to each candidate compliance question, and the preset mapping table is used to characterize a correspondence between each candidate compliance question and the candidate policies. For example, the candidate compliance question C in the preset mapping table is "there is an unlabeled risk word", and the corresponding candidate policy C is "the unlabeled risk word is labeled with a red font". The candidate compliance question D is "there are illegal words", and the corresponding candidate policy D is "delete illegal words".
Specifically, when screening candidate strategies in a preset mapping table based on target compliance problems, comparing the target compliance problems with the candidate compliance problems, and screening candidate strategies matched with the target compliance problems according to comparison conditions.
In step S603 of some embodiments, a preset similarity algorithm is used to score the similarity between the target compliance problem and the candidate compliance problem, and the calculated similarity is used as the problem scoring data. The similarity algorithm includes, but is not limited to, a cosine similarity algorithm, Euclidean distance, and the like.
In step S604 of some embodiments, the greater the question score data, the closer the content of the questions characterized by the candidate compliance questions and the target compliance questions are, and therefore, candidate strategies corresponding to the candidate compliance questions with the question score data greater than the preset threshold are selected as the target strategies.
In step S605 of some embodiments, the target policy and the target compliance problem are integrated into one complete piece of text data, which is used as the optimization suggestion data. For example, the optimization suggestion data includes the target compliance problem "xxx is an illegal word" and the corresponding target policy "delete xxx".
Through the steps S601 to S605, the target compliance problems of the recommendation data and the corresponding target strategies can be determined conveniently, so that the target compliance problems can be resolved according to the target strategies. Generating the optimization suggestion data from the target strategy provides an effective solution for optimizing the recommendation data and improves the efficiency and accuracy of that optimization.
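Steps S602 to S605 can be sketched as a lookup over the mapping table with a similarity score and a threshold. The two table entries are the examples given for step S602; the bag-of-words cosine similarity is one of the algorithms the text permits, while the threshold value and function names are assumptions.

```python
import math
from collections import Counter

# Candidate compliance problem -> candidate policy (examples from step S602).
MAPPING_TABLE = {
    "there is an unlabeled risk word": "the unlabeled risk word is labeled with a red font",
    "there are illegal words": "delete illegal words",
}
SCORE_THRESHOLD = 0.4  # hypothetical preset threshold


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors of two texts."""
    vec_a, vec_b = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(vec_a[token] * vec_b[token] for token in vec_a)
    norm = math.sqrt(sum(v * v for v in vec_a.values())) * math.sqrt(
        sum(v * v for v in vec_b.values())
    )
    return dot / norm if norm else 0.0


def generate_suggestions(target_problem):
    """Pair the target problem with every policy whose candidate problem scores above the threshold."""
    suggestions = []
    for candidate_problem, policy in MAPPING_TABLE.items():
        score = cosine_similarity(target_problem, candidate_problem)
        if score > SCORE_THRESHOLD:
            suggestions.append(
                {"problem": target_problem, "policy": policy, "score": round(score, 3)}
            )
    return suggestions


suggestions = generate_suggestions("xxx is an unlabeled risk word")
print(suggestions)
```

Word-count vectors treat "word" and "words" as unrelated tokens, so a real mapping table would benefit from lemmatization or embedding-based similarity.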
Referring to fig. 7, after step S105 of some embodiments, the risk detection method of the data may include, but is not limited to, steps S701 to S704:
Step S701, transmitting the optimization suggestion data to a target object, wherein the target object is the object that provides the recommendation data;
Step S702, receiving update data fed back by the target object, wherein the update data is obtained by the target object optimizing the recommendation data according to the optimization suggestion data;
Step S703, performing compliance detection on the update data to obtain a compliance detection result, wherein the compliance detection result is used to characterize whether the update data has a compliance problem;
Step S704, if the compliance detection result indicates that the update data has no compliance problem, sending the update data to an auditing end for auditing.
The following describes the above steps S701 to S704 in detail.
In step S701 of some embodiments, the optimization suggestion data may be transmitted to the target object by means of wired or wireless communication. Specifically, the optimization suggestion data is sent to the target object in the form of a notification message, a popup prompt, a mail, or the like. The target object is an object for providing recommendation data, and the object may be an agent in each service area, for example, an insurance agent, etc., without limitation.
In step S702 of some embodiments, when the target object receives the optimization suggestion data, it may optimize the recommendation data accordingly, where the optimization includes deleting content, adding content, labeling content, adjusting format, adjusting grammar, and the like, so as to obtain the update data. Once the target object has finished optimizing the recommendation data and obtained the update data, it uploads the update data so that the system can receive the update data fed back by the target object.
In step S703 of some embodiments, the implementation process of performing compliance detection on the update data to obtain a compliance detection result is similar to that of steps S102 to S104 described above and, for brevity, is not repeated here. Specifically, if compliance detection on the update data yields no compliance problem data, the compliance detection result is that the update data has no compliance problem. If compliance detection on the update data yields compliance problem data, the compliance detection result is that the update data has a compliance problem.
In step S704 of some embodiments, if the compliance detection result indicates that the update data has no compliance problem, the update data is sent to an auditing end for auditing, where an auditor or an auditing system reviews the update data. If the review passes, the update data is published to the disclosure platform. If the review fails, review opinions are generated and fed back to the target object so that the target object can modify the update data.
In addition, if the compliance detection result indicates that the update data has a compliance problem, step S105 is executed again, followed by steps S701 to S704; this process repeats until the review at the auditing end passes, at which point the final data is published to the disclosure platform.
Through the steps S701 to S704, the target object can optimize the recommendation data according to the optimization suggestion data, so that the update data obtained after optimization meets the compliance requirement. When no compliance problem is detected in the update data, the update data is sent to the auditing end for review. Setting up multiple auditing stages to check whether the update data is compliant improves the accuracy of data risk detection, ensures that the data finally published to the disclosure platform meets the compliance requirement, and improves the security and compliance of the data.
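The detect, revise and audit cycle described above can be sketched as a loop. The three callables are toy stand-ins for the real detection process, the target object's revision, and the auditing end; the cap on the number of rounds is an added safeguard that the text does not mention.

```python
def run_review_cycle(data, detect_problems, revise, audit, max_rounds=5):
    """Repeat detection and revision until the data passes detection and audit.

    Returns the final data and the round in which it passed.
    """
    for round_number in range(1, max_rounds + 1):
        problems = detect_problems(data)
        if not problems:
            if audit(data):
                return data, round_number
            # Audit failed: feed a review opinion back instead of detection results.
            problems = ["review opinion from the auditing end"]
        data = revise(data, problems)
    raise RuntimeError("no compliant version produced within max_rounds")


# Toy stand-ins (all hypothetical).
BANNED = ("highest", "guaranteed")


def detect_problems(text):
    return [word for word in BANNED if word in text.lower()]


def revise(text, problems):
    for word in problems:
        text = text.replace(word, "").replace(word.capitalize(), "")
    return " ".join(text.split())


def audit(text):
    return True  # assume the auditing end accepts anything that passed detection


final_text, rounds = run_review_cycle(
    "Guaranteed highest return plan", detect_problems, revise, audit
)
print(final_text, rounds)
```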
According to the data risk detection method of the embodiments of the present application, recommendation data is acquired, wherein the recommendation data comprises a product image and product description data. Content analysis is performed based on the product image and the product description data to obtain semantic content data, so that semantic understanding of the recommendation data is achieved by combining image and text data. Further, content detection is performed on the semantic content data to obtain content detection data, wherein the content detection includes at least one of legal word detection, label omission detection and accurate word detection; this enables multi-dimensional content detection of the semantic content data and improves the comprehensiveness and diversity of content detection. Further, compliance evaluation is performed on the content detection data to obtain compliance problem data, and optimization suggestion data is generated based on the compliance problem data. This improves the accuracy of the compliance evaluation and of the generated optimization suggestion data, realizes intelligent compliance detection of the recommendation data, and improves both the accuracy and the efficiency of data compliance detection.
Referring to fig. 8, an embodiment of the present application further provides a risk detection device for data, which may implement a risk detection method for the data, where the device includes:
A data acquisition module 801, configured to acquire recommendation data, where the recommendation data includes a product image and product description data;
a content analysis module 802, configured to perform content analysis based on the product image and the product description data, to obtain semantic content data;
a content detection module 803, configured to perform content detection on the semantic content data to obtain content detection data; wherein the content detection includes at least one of: legal word detection, label omission detection and accurate word detection;
the compliance evaluation module 804 is configured to perform compliance evaluation on the content detection data to obtain compliance problem data;
the advice generation module 805 is configured to generate optimization advice data based on the compliance problem data.
The specific implementation manner of the risk detection device of the data is basically the same as the specific embodiment of the risk detection method of the data, and is not described herein again.
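The five modules of the device can be pictured as stages of a pipeline. The sketch below only shows how modules 801 to 805 chain together; every callable is a toy placeholder, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RiskDetectionDevice:
    acquire: Callable[[], dict]       # data acquisition module 801
    analyze: Callable[[dict], list]   # content analysis module 802
    detect: Callable[[list], list]    # content detection module 803
    evaluate: Callable[[list], list]  # compliance evaluation module 804
    suggest: Callable[[list], list]   # advice generation module 805

    def run(self):
        """Pass the output of each module to the next one."""
        recommendation = self.acquire()
        semantic_content = self.analyze(recommendation)
        detected = self.detect(semantic_content)
        problems = self.evaluate(detected)
        return self.suggest(problems)


device = RiskDetectionDevice(
    acquire=lambda: {"product_image": "poster.png", "product_description": "the highest return"},
    # Placeholder analysis: uses only the description text and ignores the image.
    analyze=lambda rec: rec["product_description"].split(),
    detect=lambda terms: [t for t in terms if t in {"highest", "first", "most"}],
    evaluate=lambda words: [f"{w} is an illegal word" for w in words],
    suggest=lambda problems: [f"{p}; delete it" for p in problems],
)
optimization_suggestions = device.run()
print(optimization_suggestions)
```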
The embodiment of the application also provides electronic equipment, which comprises: a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory, wherein the program, when executed by the processor, implements the data risk detection method described above. The electronic equipment may be any intelligent terminal, including a tablet computer, a vehicle-mounted computer and the like.
Referring to fig. 9, fig. 9 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
the processor 901 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the embodiments of the present application;
the memory 902 may be implemented in the form of read-only memory (ROM), static storage, dynamic storage, or random access memory (RAM). The memory 902 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present application are implemented by software or firmware, the relevant program code is stored in the memory 902 and is invoked by the processor 901 to execute the data risk detection method of the embodiments of the present application;
an input/output interface 903 for inputting and outputting information;
the communication interface 904 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.);
A bus 905 that transfers information between the various components of the device (e.g., the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);
wherein the processor 901, the memory 902, the input/output interface 903 and the communication interface 904 are communicatively coupled to each other within the device via a bus 905.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to realize the risk detection method of the data.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
According to the data risk detection method, the data risk detection device, the electronic equipment and the computer readable storage medium provided by the embodiments of the application, recommendation data is acquired, wherein the recommendation data comprises a product image and product description data. Content analysis is performed based on the product image and the product description data to obtain semantic content data, so that semantic understanding of the recommendation data is achieved by combining image and text data. Further, content detection is performed on the semantic content data to obtain content detection data, wherein the content detection includes at least one of legal word detection, label omission detection and accurate word detection; this enables multi-dimensional content detection of the semantic content data and improves the comprehensiveness and diversity of content detection. Further, compliance evaluation is performed on the content detection data to obtain compliance problem data, and optimization suggestion data is generated based on the compliance problem data. This improves the accuracy of the compliance evaluation and of the generated optimization suggestion data, realizes intelligent compliance detection of the recommendation data, and improves both the accuracy and the efficiency of data compliance detection.
The embodiments described in the present application are intended to explain the technical solutions of the embodiments more clearly and do not limit the technical solutions provided by the embodiments of the present application. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the solutions shown in fig. 1-7 do not limit the embodiments of the present application, which may include more or fewer steps than shown, may combine certain steps, or may use different steps.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices disclosed above, may be implemented as software, firmware, hardware, or suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent three cases: only A is present, only B is present, or both A and B are present, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a; b; c; "a and b"; "a and c"; "b and c"; or "a and b and c", where a, b, and c may each be single or plural.
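As a small worked example of the "at least one of a, b, or c" definition, the following Python sketch enumerates every non-empty combination of the three items. The function name `at_least_one_of` is hypothetical and is used only for illustration.

```python
from itertools import combinations


def at_least_one_of(items):
    """Return every non-empty combination of the given items, smallest first."""
    result = []
    for size in range(1, len(items) + 1):
        result.extend(combinations(items, size))
    return result


combos = at_least_one_of(["a", "b", "c"])
for combo in combos:
    # Prints one line per combination, e.g. "a and b".
    print(" and ".join(combo))
```

For three items this produces the seven combinations listed in the text: three single items, three pairs, and the full set.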
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in an actual implementation. For instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in whole or in part, may be embodied in the form of a software product that is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium includes any medium capable of storing a program, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings; this description does not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A data risk detection method, the method comprising:
acquiring recommendation data, wherein the recommendation data comprises a product image and product description data;
performing content analysis based on the product image and the product description data to obtain semantic content data;
performing content detection on the semantic content data to obtain content detection data, wherein the content detection includes at least one of: legal word detection, labeling omission detection, and accurate word detection;
performing compliance evaluation on the content detection data to obtain compliance problem data; and
generating optimization suggestion data based on the compliance problem data.
2. The risk detection method according to claim 1, wherein the performing content detection on the semantic content data to obtain content detection data includes:
performing word detection on the semantic content data based on a preset target word to obtain a first word consistent with the target word in the semantic content data, wherein the target word is an illegal word;
performing product content detection on the semantic content data based on preset candidate product content to obtain a first difference part between the semantic content data and the candidate product content, wherein the candidate product content is legal product content;
performing sentence detection on the semantic content data based on a preset target description sentence to obtain a first sentence consistent with the target description sentence in the semantic content data, wherein the target description sentence is an illegal description sentence;
and obtaining the content detection data based on the first word, the first difference part and the first sentence, wherein the content detection data is used for indicating illegal words existing in the semantic content data.
3. The risk detection method according to claim 1, wherein the performing content detection on the semantic content data to obtain content detection data includes:
performing risk word detection on the semantic content data to obtain risk words;
performing subject detection on the semantic content data to obtain a second word for describing a product subject, a third word for describing a service subject, and a fourth word for describing an institution subject;
performing data source detection on the semantic content data to obtain a fifth word for describing the source of the recommendation data;
and performing labeling detection on the risk words, the second word, the third word, the fourth word, and the fifth word to obtain the content detection data, wherein the content detection data indicates which of the risk words, the second word, the third word, the fourth word, and the fifth word are not labeled.
4. The risk detection method according to claim 1, wherein the performing content detection on the semantic content data to obtain content detection data includes:
carrying out grammar detection on the semantic content data based on a preset grammar rule to obtain a grammar detection result, wherein the grammar detection result is used for indicating a content part with at least one of logic errors and grammar errors in the semantic content data;
extracting a product description sentence in the semantic content data;
comparing the product description sentence with a reference product description of a reference product in a preset product library to obtain a second difference part of the product description sentence and the reference product description;
and obtaining the content detection data based on the grammar detection result and the second difference part.
5. The risk detection method of claim 1, wherein the generating optimization suggestion data based on the compliance issue data comprises:
extracting a target compliance problem in the compliance problem data;
screening candidate strategies in a preset mapping table based on the target compliance problem, wherein the preset mapping table is used for representing the corresponding relation between each candidate compliance problem and the candidate strategy;
performing problem similarity scoring on the target compliance problem and the candidate compliance problem to obtain problem scoring data;
selecting, as a target strategy, the candidate strategy corresponding to a candidate compliance problem whose problem scoring data is greater than a preset threshold;
and generating the optimization suggestion data based on the target strategy.
6. The risk detection method of claim 1, wherein after the generating optimization suggestion data based on the compliance issue data, the method further comprises:
transmitting the optimization suggestion data to a target object, wherein the target object is an object that provides the recommendation data;
receiving updated data fed back by the target object, wherein the updated data is data obtained by the target object optimizing the recommendation data according to the optimization suggestion data;
performing compliance detection on the updated data to obtain a compliance detection result, wherein the compliance detection result indicates whether the updated data has a compliance problem;
and if the compliance detection result indicates that the updated data has no compliance problem, sending the updated data to an auditing end for auditing.
7. The risk detection method according to any one of claims 1 to 6, wherein the content analysis based on the product image and the product description data, to obtain semantic content data, includes:
performing text detection on the product image to obtain first text data;
performing content detection on the product description data to obtain second text data;
and obtaining the semantic content data based on the first text data and the second text data.
8. A risk detection apparatus for data, the apparatus comprising:
the data acquisition module is used for acquiring recommendation data, wherein the recommendation data comprises a product image and product description data;
the content analysis module is used for carrying out content analysis based on the product image and the product description data to obtain semantic content data;
the content detection module is used for performing content detection on the semantic content data to obtain content detection data, wherein the content detection includes at least one of: legal word detection, labeling omission detection, and accurate word detection;
the compliance evaluation module is used for carrying out compliance evaluation on the content detection data to obtain compliance problem data;
and the suggestion generation module is used for generating optimization suggestion data based on the compliance problem data.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the data risk detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the data risk detection method according to any one of claims 1 to 7.
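The strategy selection recited in claim 5 (a preset mapping table, problem similarity scoring, and a preset threshold) can be illustrated with the short Python sketch below. It is a sketch under stated assumptions rather than the claimed implementation: the contents of `STRATEGY_TABLE`, the function names, and the threshold value of 0.6 are hypothetical, and the character-based `difflib.SequenceMatcher` ratio from the Python standard library is swapped in as a stand-in for whatever similarity scoring the application actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical mapping table: candidate compliance problem -> candidate strategy.
STRATEGY_TABLE = {
    "illegal word used in product description":
        "Replace the flagged word with approved wording.",
    "risk word is not labeled":
        "Add the required risk label next to the word.",
    "product description differs from product library":
        "Align the description with the reference product entry.",
}


def similarity(a, b):
    """Stand-in similarity score in [0, 1] based on character matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_strategies(target_problem, threshold=0.6):
    """Return strategies whose candidate problem scores above the threshold,
    ordered from highest to lowest score."""
    scored = [
        (similarity(target_problem, candidate), strategy)
        for candidate, strategy in STRATEGY_TABLE.items()
    ]
    return [
        strategy
        for score, strategy in sorted(scored, reverse=True)
        if score > threshold
    ]
```

A target problem that closely matches one candidate, such as `select_strategies("Risk word is not labeled")`, returns that candidate's strategy first, and a target problem unrelated to every candidate returns an empty list. A production system would more plausibly score similarity semantically than by characters.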
CN202311311873.XA 2023-10-11 2023-10-11 Data risk detection method and device, electronic equipment and storage medium Pending CN117372164A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311311873.XA CN117372164A (en) 2023-10-11 2023-10-11 Data risk detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117372164A true CN117372164A (en) 2024-01-09

Family

ID=89399666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311311873.XA Pending CN117372164A (en) 2023-10-11 2023-10-11 Data risk detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117372164A (en)

Similar Documents

Publication Publication Date Title
CN113704428A (en) Intelligent inquiry method, device, electronic equipment and storage medium
CN115394393A (en) Intelligent diagnosis and treatment data processing method and device, electronic equipment and storage medium
CN116050352A (en) Text encoding method and device, computer equipment and storage medium
CN116701604A (en) Question and answer corpus construction method and device, question and answer method, equipment and medium
CN116844731A (en) Disease classification method, disease classification device, electronic device, and storage medium
CN117033796A (en) Intelligent reply method, device, equipment and medium based on user expression preference
CN116956925A (en) Electronic medical record named entity identification method and device, electronic equipment and storage medium
CN116469546A (en) Disease auxiliary identification method, device, equipment and medium based on attention mechanism
CN114398903B (en) Intention recognition method, device, electronic equipment and storage medium
CN116719683A (en) Abnormality detection method, abnormality detection device, electronic apparatus, and storage medium
CN115795007A (en) Intelligent question-answering method, intelligent question-answering device, electronic equipment and storage medium
CN115270746A (en) Question sample generation method and device, electronic equipment and storage medium
CN115062705A (en) Data checking method, data checking device, electronic equipment and storage medium
CN117372164A (en) Data risk detection method and device, electronic equipment and storage medium
CN117592456A (en) Text quality detection method and device, electronic equipment and storage medium
CN116934502A (en) Intelligent verification method, intelligent verification device, electronic equipment and storage medium
CN116702782A (en) Text processing method, text processing device, electronic equipment and storage medium
CN118132691A (en) Text construction method and device, electronic equipment and storage medium
CN116702743A (en) Text similarity detection method and device, electronic equipment and storage medium
CN116757177A (en) Text similarity detection method and device, electronic equipment and storage medium
CN116467455A (en) Emotion recognition method, emotion recognition device, electronic device, and storage medium
CN117034940A (en) Nuclear questionnaire generation method and device, electronic equipment and storage medium
CN116595998A (en) Corpus updating method and device, electronic equipment and storage medium
CN117390147A (en) Intelligent reply method and device, electronic equipment and storage medium
CN117435106A (en) Page generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination