CN113822521B - Method, device and storage medium for detecting quality of question library questions

Publication number
CN113822521B
Authority
CN
China
Prior art keywords
detected
determining
watermark
question
questions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110663603.XA
Other languages
Chinese (zh)
Other versions
CN113822521A (en)
Inventor
朱群
马景林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Cloud Computing Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Cloud Computing Beijing Co Ltd
Priority to CN202110663603.XA
Publication of CN113822521A
Application granted
Publication of CN113822521B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395: Quality analysis or management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Mathematical Physics (AREA)
  • Operations Research (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Fuzzy Systems (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method, a device and a storage medium for detecting the quality of questions in a question bank. The method comprises the following steps: acquiring a question to be detected from a target question bank, and determining whether a target value exists in an object to be detected in the question to be detected and/or whether the question type structure of the question to be detected belongs to a preset question type structure, so as to obtain first detection data of the question to be detected; determining the watermark confidence of an image carried in the question to be detected, and determining second detection data of the question to be detected according to the watermark confidence and a preset watermark threshold range of the question to be detected; and determining a quality score of the question to be detected based on the first detection data and the second detection data, and determining the question quality of the question to be detected based on the quality score. The application can improve the efficiency of quality detection, and is simple to operate and highly reliable.

Description

Method, device and storage medium for detecting quality of question library questions
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for detecting quality of questions in a question bank, and a storage medium.
Background
With the rapid development of the mobile internet, intelligent education is gradually coming to prominence. Intelligent education is an education method that adaptively provides learning content to users based on user data and a large amount of question bank information. Because a large number of questions underlie intelligent education, a method for detecting the quality of education question banks is needed while those question banks are being built, so that question bank quality is not degraded by a large number of unqualified questions.
In the course of research and practice, the inventors of the application found that current intelligent education question banks have no automatic method for detecting the quality of the questions they contain. As the number of questions grows rapidly, question quality can only be checked by spot checks, but spot checks may miss some problematic questions, so detection is incomplete. Conventional question quality detection for question banks therefore requires a large amount of human intervention, and its outcome depends on the breadth of knowledge of the people involved, so detection efficiency and reliability are both low.
Disclosure of Invention
The embodiment of the application provides a quality detection method, a device and a storage medium for question library questions, which can improve the detection efficiency of quality detection, and are simple to operate and high in reliability.
In one aspect, the embodiment of the application provides a method for detecting the quality of questions in a question bank, which comprises the following steps:
acquiring a question to be detected from a target question bank;
determining whether a target value exists in an object to be detected in the question to be detected and/or whether the question type structure of the question to be detected belongs to a preset question type structure, so as to obtain first detection data of the question to be detected, wherein the object to be detected comprises at least one of a question stem, an answer, an option, a formula and a character;
determining the watermark confidence of an image carried in the question to be detected, and determining second detection data of the question to be detected according to the watermark confidence and a preset watermark threshold range of the question to be detected;
and determining a quality score of the question to be detected based on the first detection data and the second detection data, and determining the question quality of the question to be detected based on the quality score.
With reference to the first aspect, in one possible implementation manner, the object to be detected includes a question stem, an answer and/or an option, and the target value includes a null value and/or a repeated value; determining whether a target value exists in the object to be detected in the question to be detected to obtain the first detection data of the question to be detected includes:
determining detection rule parameters of the object to be detected in the question to be detected, wherein the detection rule parameters comprise a length threshold of the object to be detected and/or a quantity threshold of the object to be detected;
traversing the question stem, the answer and the options in the question to be detected;
if the length of at least one object to be detected among the question stem, the answer and the options in the question to be detected is less than or equal to the length threshold, determining that the first detection data is that a null value exists in the object to be detected; and/or
if the number of the at least one object to be detected is greater than or equal to the quantity threshold, determining that the first detection data is that a repeated value exists in the object to be detected.
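The null-value and repeated-value rules above can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `detect_null_and_repeat`, the default thresholds, and the reading of "number" as the count of identical option texts are all assumptions made for the example.

```python
from collections import Counter

def detect_null_and_repeat(stem, answer, options,
                           length_threshold=0, count_threshold=2):
    """Return the set of target values found: 'null' and/or 'repeat'.

    Hypothetical helper; thresholds are illustrative defaults.
    """
    found = set()
    # Traverse the stem, the answer and every option.
    objects = [stem, answer, *options]
    # Null value: some object's length is at or below the length threshold.
    if any(len(obj.strip()) <= length_threshold for obj in objects):
        found.add("null")
    # Repeated value (assumed reading): the same option text occurs
    # at least count_threshold times.
    counts = Counter(opt.strip() for opt in options)
    if any(n >= count_threshold for n in counts.values()):
        found.add("repeat")
    return found
```

With these assumed defaults, a question whose options contain the same text twice is reported as having a repeated value, and a question with a blank stem is reported as having a null value.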
With reference to the first aspect, in one possible implementation manner, the object to be detected includes a formula, and the target value is a missing value; determining whether a target value exists in the object to be detected in the question to be detected to obtain the first detection data of the question to be detected includes:
detecting target symbols of the formula in the question to be detected, and if the target symbols are not paired, determining that the first detection data is that a missing value exists in the object to be detected; or
rendering the formula in the question to be detected, and if the formula fails to render, determining that the first detection data is that a missing value exists in the object to be detected.
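The symbol-pairing check can be illustrated with a stack-based bracket matcher. This is a sketch under assumptions: the text does not say which target symbols are checked, so the example assumes LaTeX-like formula strings and treats only `()`, `[]` and `{}` as target symbols; `symbols_are_paired` is a hypothetical name. The rendering-based check is not shown.

```python
# Closing symbol -> expected opening symbol (assumed set of target symbols).
PAIRS = {")": "(", "]": "[", "}": "{"}

def symbols_are_paired(formula):
    """Return True if every bracket in the formula is correctly paired."""
    stack = []
    for ch in formula:
        if ch in PAIRS.values():
            stack.append(ch)          # remember each opening symbol
        elif ch in PAIRS:
            # A closing symbol must match the most recent opening symbol.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack                  # leftover openers mean a missing closer
```

A formula for which this function returns False would be recorded as containing a missing value.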
With reference to the first aspect, in one possible implementation manner, the object to be detected includes a character, and the target value is a garbled value; determining whether a target value exists in the object to be detected in the question to be detected to obtain the first detection data of the question to be detected includes:
determining the encoding range of the characters in the question to be detected, and if the encoding range of the characters belongs to a designated encoding range, determining that the first detection data is that a garbled value exists in the object to be detected.
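A check of this kind might look like the sketch below. The designated encoding ranges are not given in the text, so the ranges used here (the Unicode replacement character, the Private Use Area, and a few C0 control characters) are assumptions chosen only for illustration, as is the name `has_garbled_characters`.

```python
# Assumed "designated encoding ranges" as inclusive code-point intervals.
GARBLED_RANGES = [
    (0xFFFD, 0xFFFD),  # U+FFFD REPLACEMENT CHARACTER
    (0xE000, 0xF8FF),  # Private Use Area
    (0x0000, 0x0008),  # C0 control characters before tab
]

def has_garbled_characters(text, ranges=GARBLED_RANGES):
    """Return True if any character's code point falls in a designated range."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges)
```

Ordinary text, including CJK characters, falls outside these assumed ranges and is not flagged.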
With reference to the first aspect, in one possible implementation manner, determining whether the question type structure of the question to be detected belongs to a preset question type structure to obtain the first detection data of the question to be detected includes:
determining the feature code of the question type structure of the question to be detected;
if the feature code of the question type structure of the question to be detected is different from the feature code of the preset question type structure, determining that the first detection data is that the question type structure of the question to be detected does not belong to the preset question type structure.
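One way to picture the feature-code comparison is shown below. The actual feature codes are not given in this section, so the encoding here (one letter per component present in the question) and the preset code set are purely hypothetical, as are the function names and dictionary keys.

```python
def structure_feature_code(question):
    """Build a hypothetical feature code from the components a question has."""
    parts = []
    if question.get("stem"):
        parts.append("S")
    if question.get("options"):
        parts.append("O")
    if question.get("answer"):
        parts.append("A")
    if question.get("sub_questions"):
        parts.append("Q")
    return "".join(parts)

# Assumed preset structures: choice question, fill-in question, compound question.
PRESET_CODES = {"SOA", "SA", "SQ"}

def belongs_to_preset_structure(question, preset=PRESET_CODES):
    """Return True if the question's feature code equals a preset feature code."""
    return structure_feature_code(question) in preset
```

A question with options and an answer but no stem yields the code "OA", which differs from every assumed preset code, so it would be recorded as not belonging to a preset question type structure.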
With reference to the first aspect, in one possible implementation manner, determining the watermark confidence of the image carried in the question to be detected includes:
acquiring the image carried in the question to be detected;
if the acquired image carried in the question to be detected is empty, determining that the watermark confidence of the image carried in the question to be detected is 0;
if the acquired image carried in the question to be detected is not empty, determining the image type of a target image carried in the question to be detected, and determining the watermark confidence of the target image according to the image type of the target image to obtain the watermark confidence of the image carried in the question to be detected, wherein the image type of the target image comprises a static picture and/or a video frame picture.
With reference to the first aspect, in one possible implementation manner, determining the watermark confidence of the target image according to the image type of the target image includes:
if the target image carried in the question to be detected is a static picture, determining the color layers of the static picture;
if the color layers of the static picture comprise a white layer and a black layer and any color layer exists between the white layer and the black layer, determining that the watermark confidence of the image carried in the question to be detected is 1;
if the color layers of the static picture comprise a white layer and a black layer and no color layer exists between the white layer and the black layer, performing edge detection on the target image to obtain a first edge feature of the target image, and determining the watermark confidence of the image carried in the question to be detected based on the first edge feature of the target image.
With reference to the first aspect, in one possible implementation manner, determining the watermark confidence of the image carried in the question to be detected based on the first edge feature of the target image includes:
acquiring a second edge feature of a watermark template, and determining a first matching degree threshold for determining the watermark confidence of the image;
if the matching degree between the first edge feature and the second edge feature is greater than or equal to the first matching degree threshold, determining that the watermark confidence of the image carried in the question to be detected is 1;
if the matching degree between the first edge feature and the second edge feature is less than the first matching degree threshold, determining that the watermark confidence of the image carried in the question to be detected is 0.
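The thresholding step above can be sketched as follows. The example assumes that edge detection has already been performed elsewhere and that each edge feature is represented as a set of (row, column) edge-pixel coordinates; that representation, the overlap-based matching degree, the default threshold and the name `edge_match_confidence` are all assumptions for illustration.

```python
def edge_match_confidence(image_edges, template_edges, match_threshold=0.75):
    """Map an edge-feature matching degree to a watermark confidence of 0 or 1.

    image_edges / template_edges: sets of (row, col) edge-pixel coordinates
    (hypothetical representation of the first and second edge features).
    """
    if not template_edges:
        return 0
    # Assumed matching degree: share of template edge pixels found in the image.
    matching_degree = len(image_edges & template_edges) / len(template_edges)
    return 1 if matching_degree >= match_threshold else 0
```

Because the comparison is "greater than or equal to", a matching degree exactly at the threshold yields a confidence of 1, consistent with the rule stated above.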
With reference to the first aspect, in one possible implementation manner, determining the watermark confidence of the target image according to the image type of the target image includes:
if the target image carried in the question to be detected is a video frame picture, determining a target detection area of the video frame picture in the video to which the target image belongs;
determining a partial image corresponding to the target detection area from the target image, inputting the partial image into a target watermark recognition model, and outputting the watermark confidence of the partial image based on the target watermark recognition model to obtain the watermark confidence of the target image.
With reference to the first aspect, in one possible implementation manner, determining the second detection data of the question to be detected according to the watermark confidence and the preset watermark threshold range of the question to be detected includes:
if the watermark confidence is greater than or equal to the maximum threshold of the preset watermark threshold range, determining that the second detection data is that the target image carries a watermark;
if the watermark confidence is greater than or equal to the minimum threshold of the preset watermark threshold range and less than the maximum threshold of the preset watermark threshold range, determining whether a watermark keyword for detecting whether the target image carries a watermark exists, and determining the second detection data based on the matching degree between the watermark keyword and text data extracted from the target image;
if the watermark confidence is less than the minimum threshold of the preset watermark threshold range, determining that the second detection data is that the target image does not carry a watermark.
With reference to the first aspect, in one possible implementation manner, determining the second detection data based on the matching degree between the watermark keyword and the text data extracted from the target image includes:
extracting text data from the target image;
if the matching degree between the text data and the watermark keyword is greater than or equal to a matching degree threshold, determining that the second detection data is that the target image carries a watermark;
if the matching degree between the text data and the watermark keyword is less than the matching degree threshold, determining that the second detection data is that the target image does not carry a watermark.
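The three-way threshold decision and the keyword fallback can be sketched together. This is only an illustration: text extraction from the image is assumed to have happened already, the matching degree is assumed to be the share of keywords found in the extracted text, the case with no keywords is assumed to mean "no watermark", and all names and default thresholds are hypothetical.

```python
def second_detection_data(confidence, extracted_text, keywords,
                          min_threshold=0.3, max_threshold=0.7,
                          match_threshold=0.5):
    """Return True if the target image is judged to carry a watermark."""
    if confidence >= max_threshold:
        return True                      # at or above the maximum threshold
    if confidence < min_threshold:
        return False                     # below the minimum threshold
    # Middle band: fall back to watermark keywords, if any exist.
    if not keywords:
        return False                     # assumption: no keywords, no watermark
    text = extracted_text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords) >= match_threshold
```

Only confidences inside the preset range reach the keyword comparison; values outside it are decided by the thresholds alone.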
With reference to the first aspect, in one possible implementation manner, determining the quality score of the question to be detected based on the first detection data and the second detection data includes:
determining an initial quality score of the question to be detected;
determining a first quality score of the question to be detected according to the initial quality score and according to whether, in the first detection data, a target value exists in the object to be detected and/or the question type structure of the question to be detected belongs to the preset question type structure;
determining a second quality score of the question to be detected according to the first quality score and according to whether, in the second detection data, the target image carries a watermark, and determining the quality score of the question to be detected based on the second quality score.
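The staged scoring above can be pictured as successive deductions from an initial score. The text does not specify how scores are adjusted, so the deduction scheme, the penalty sizes, the pass mark and the function names below are assumptions made for this sketch.

```python
def quality_score(first_issues, has_watermark,
                  initial_score=100, issue_penalty=20, watermark_penalty=30):
    """Compute a quality score from first and second detection data.

    first_issues: collection of problems found in the first detection data,
    e.g. {"null", "repeat"}; has_watermark: second detection data.
    """
    # First quality score: initial score adjusted by the first detection data.
    first_score = initial_score - issue_penalty * len(first_issues)
    # Second quality score: first score adjusted by the watermark result.
    second_score = first_score - (watermark_penalty if has_watermark else 0)
    return max(second_score, 0)

def question_quality(score, pass_score=60):
    """Map a quality score to a coarse quality label (assumed pass mark)."""
    return "qualified" if score >= pass_score else "unqualified"
```

Under these assumed penalties, a question with two detected problems and a watermarked image scores 30 and is labeled unqualified.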
In one aspect, the present application provides a device for detecting the quality of questions in a question bank, where the device includes:
a first acquisition module, configured to acquire a question to be detected from a target question bank;
a first determining module, configured to determine whether a target value exists in an object to be detected in the question to be detected and/or whether the question type structure of the question to be detected belongs to a preset question type structure, so as to obtain first detection data of the question to be detected, wherein the object to be detected comprises at least one of a question stem, an answer, an option, a formula and a character;
a second determining module, configured to determine the watermark confidence of an image carried in the question to be detected, and determine second detection data of the question to be detected according to the watermark confidence and a preset watermark threshold range of the question to be detected;
and a third determining module, configured to determine a quality score of the question to be detected based on the first detection data and the second detection data, and determine the question quality of the question to be detected based on the quality score.
With reference to the second aspect, in one possible implementation manner, the object to be detected includes a question stem, an answer and/or an option, and the target value includes a null value and/or a repeated value; the first determining module includes:
a first determining unit, configured to determine detection rule parameters of the object to be detected in the question to be detected, wherein the detection rule parameters comprise a length threshold of the object to be detected and/or a quantity threshold of the object to be detected;
a traversing unit, configured to traverse the question stem, the answer and the options in the question to be detected;
a second determining unit, configured to determine that the first detection data is that a null value exists in the object to be detected if the length of at least one object to be detected among the question stem, the answer and the options in the question to be detected is less than or equal to the length threshold; and/or
a third determining unit, configured to determine that the first detection data is that a repeated value exists in the object to be detected if the number of the at least one object to be detected is greater than or equal to the quantity threshold.
With reference to the second aspect, in one possible implementation manner, the object to be detected includes a formula, and the target value is a missing value; the first determining module includes:
a detection pairing unit, configured to detect target symbols of the formula in the question to be detected, and if the target symbols are not paired, determine that the first detection data is that a missing value exists in the object to be detected; or
a rendering determining unit, configured to render the formula in the question to be detected, and if the formula fails to render, determine that the first detection data is that a missing value exists in the object to be detected.
With reference to the second aspect, in one possible implementation manner, the object to be detected includes a character, and the target value is a garbled value; the first determining module includes:
a garbled value determining unit, configured to determine the encoding range of the characters in the question to be detected, and if the encoding range of the characters belongs to a designated encoding range, determine that the first detection data is that a garbled value exists in the object to be detected.
With reference to the second aspect, in one possible implementation manner, the first determining module includes:
a feature code determining unit, configured to determine the feature code of the question type structure of the question to be detected;
a question type structure determining unit, configured to determine that the first detection data is that the question type structure of the question to be detected does not belong to the preset question type structure if the feature code of the question type structure of the question to be detected is different from the feature code of the preset question type structure.
With reference to the second aspect, in one possible implementation manner, the second determining module includes:
a first acquisition unit, configured to acquire the image carried in the question to be detected;
a fourth determining unit, configured to determine that the watermark confidence of the image carried in the question to be detected is 0 if the acquired image carried in the question to be detected is empty;
and a fifth determining unit, configured to determine the image type of a target image carried in the question to be detected if the acquired image carried in the question to be detected is not empty, and determine the watermark confidence of the target image according to the image type of the target image, so as to obtain the watermark confidence of the image carried in the question to be detected, where the image type of the target image includes a static picture and/or a video frame picture.
With reference to the second aspect, in one possible implementation manner, the second determining module includes:
a sixth determining unit, configured to determine the color layers of the static picture if the target image carried in the question to be detected is a static picture;
a seventh determining unit, configured to determine that the watermark confidence of the image carried in the question to be detected is 1 if the color layers of the static picture include a white layer and a black layer and any color layer exists between the white layer and the black layer;
and an eighth determining unit, configured to, if the color layers of the static picture include a white layer and a black layer and no color layer exists between the white layer and the black layer, perform edge detection on the target image to obtain a first edge feature of the target image, and determine the watermark confidence of the image carried in the question to be detected based on the first edge feature of the target image.
With reference to the second aspect, in one possible implementation manner, the eighth determining unit includes:
a first acquisition subunit, configured to acquire a second edge feature of a watermark template and determine a first matching degree threshold for determining the watermark confidence of the image;
a first determining subunit, configured to determine that the watermark confidence of the image carried in the question to be detected is 1 if the matching degree between the first edge feature and the second edge feature is greater than or equal to the first matching degree threshold;
and a second determining subunit, configured to determine that the watermark confidence of the image carried in the question to be detected is 0 if the matching degree between the first edge feature and the second edge feature is less than the first matching degree threshold.
With reference to the second aspect, in a possible implementation manner, the fifth determining unit includes:
a first acquisition subunit, configured to acquire a second edge feature of a watermark template and determine a first matching degree threshold for determining the watermark confidence of the image;
a third determining subunit, configured to determine that the watermark confidence of the image carried in the question to be detected is 1 if the matching degree between the first edge feature and the second edge feature is greater than or equal to the first matching degree threshold;
and a fourth determining subunit, configured to determine that the watermark confidence of the image carried in the question to be detected is 0 if the matching degree between the first edge feature and the second edge feature is less than the first matching degree threshold.
With reference to the second aspect, in a possible implementation manner, the fifth determining unit further includes:
a fifth determining subunit, configured to determine a target detection area of the video frame picture in the video to which the target image belongs if the target image carried in the question to be detected is a video frame picture;
and a sixth determining subunit, configured to determine a partial image corresponding to the target detection area from the target image, input the partial image into a target watermark recognition model, and output the watermark confidence of the partial image based on the target watermark recognition model to obtain the watermark confidence of the target image.
With reference to the second aspect, in one possible implementation manner, the second determining module includes:
a ninth determining unit, configured to determine that the second detection data is that the target image carries a watermark if the watermark confidence is greater than or equal to the maximum threshold of the preset watermark threshold range;
a tenth determining unit, configured to, if the watermark confidence is greater than or equal to the minimum threshold of the preset watermark threshold range and less than the maximum threshold of the preset watermark threshold range, determine whether a watermark keyword for detecting whether the target image carries a watermark exists, and determine the second detection data based on the matching degree between the watermark keyword and text data extracted from the target image;
and an eleventh determining unit, configured to determine that the second detection data is that the target image does not carry a watermark if the watermark confidence is less than the minimum threshold of the preset watermark threshold range.
With reference to the second aspect, in one possible implementation manner, the tenth determining unit includes:
a first extraction subunit, configured to extract text data from the target image;
a seventh determining subunit, configured to determine that the second detection data is that the target image carries a watermark if the matching degree between the text data and the watermark keyword is greater than or equal to a matching degree threshold;
and an eighth determining subunit, configured to determine that the second detection data is that the target image does not carry a watermark if the matching degree between the text data and the watermark keyword is less than the matching degree threshold.
With reference to the second aspect, in one possible implementation manner, the third determining module includes:
a twelfth determining unit, configured to determine an initial quality score of the question to be detected;
a thirteenth determining unit, configured to determine a first quality score of the question to be detected according to the initial quality score and according to whether, in the first detection data, a target value exists in the object to be detected and/or the question type structure of the question to be detected belongs to the preset question type structure;
a fourteenth determining unit, configured to determine a second quality score of the question to be detected according to the first quality score and according to whether, in the second detection data, the target image carries a watermark;
and a fifteenth determining unit, configured to determine the quality score of the question to be detected based on the second quality score.
In one aspect, the application provides a computer device comprising: a processor, a memory and a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is used for providing data communication functions, the memory is used for storing a computer program, and the processor is used for invoking the computer program to perform the method according to the above aspect of the embodiments of the application.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method for detecting the quality of questions in a question bank provided in the various optional manners of the above aspect.
In the application, a question to be detected is first acquired from a target question bank, and whether a target value exists in an object to be detected in the question to be detected and/or whether the question type structure of the question to be detected belongs to a preset question type structure is determined, so as to obtain first detection data of the question to be detected, wherein the object to be detected comprises at least one of a question stem, an answer, an option, a formula and a character. Then the watermark confidence of an image carried in the question to be detected is determined, and second detection data of the question to be detected is determined according to the watermark confidence and a preset watermark threshold range of the question to be detected. Finally, a quality score of the question to be detected is determined based on the first detection data and the second detection data, and the question quality of the question to be detected is determined based on the quality score. This way of detecting the quality of question bank questions can greatly reduce manual effort and allows frequent quality detection on question banks with large volumes of data, thereby improving the efficiency and reliability of quality detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a scenario of a method for detecting quality of a question library according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for detecting quality of a question library according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a table of feature codes of a preset question type structure according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a user interface when traversing an object to be detected according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a detection framework for a target question bank according to an embodiment of the present application;
FIG. 7 is a schematic diagram of another detection framework for a target question bank according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a quality inspection rule of a subject to be inspected according to an embodiment of the present application;
FIG. 9 is a diagram of a user interface for a subject to be detected according to an embodiment of the present application;
FIG. 10 is a schematic diagram of another flow chart of a method for detecting quality of questions in a question bank according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a watermark judging flow of a video frame according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a device for detecting quality of a question library according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of protection of the application.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, obtain knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The application provides a method for detecting the quality of question bank questions, which belongs to computer vision (Computer Vision, CV) and machine learning (Machine Learning, ML) in the field of artificial intelligence. Computer vision is the science of how to make a machine "see"; more specifically, it uses cameras and computers in place of human eyes to identify, track and measure targets, and further performs graphics processing so that the computer produces images that are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies the related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (Optical Character Recognition, OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, three-dimensional (Three Dimensional, 3D) techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
Machine learning is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
FIG. 1 is a schematic diagram of a network architecture according to an embodiment of the present application. As shown in FIG. 1, the network architecture may include a service server 100 and a user terminal cluster, where the user terminal cluster may include a user terminal 10a, a user terminal 10b, …, and a user terminal 10n. Communication connections may exist between the user terminals in the cluster; for example, a communication connection exists between the user terminal 10a and the user terminal 10b, and a communication connection exists between the user terminal 10b and the user terminal 10n. Any user terminal in the user terminal cluster may also have a communication connection with the service server 100; for example, a communication connection exists between the user terminal 10a and the service server 100, and a communication connection exists between the user terminal 10b and the service server 100.
Each user terminal in the user terminal cluster (including the user terminal 10a, the user terminal 10b, and the user terminal 10n) may have a target application installed. Optionally, the target application may include an application capable of displaying data such as text, images, and video. For example, the target application may be a question quality detection application, which a user may use to upload a plurality of question banks (e.g., a preschool education question bank, an IELTS question bank, an advanced mathematics question bank, etc.) and to detect the question quality of the questions to be detected in a target question bank (e.g., the preschool education question bank). Alternatively, the target application may be an online education application, which may be used to acquire a target question bank that is about to go online and detect the question quality of the questions to be detected in that question bank: if the question quality in the target question bank is detected to be at a qualified level, the target question bank is put online for users; if the question quality is detected to be at an unqualified level, the online operation of the target question bank is suspended. The service server 100 in the present application may collect service data, such as images or videos, uploaded through these applications; optionally, the service data may include the questions to be detected in the target question bank uploaded by the user. For convenience of explanation, the questions to be detected in the target question bank are taken directly as the service data in the following description.
The service server 100 may determine whether a target value exists in an object to be detected in a question to be detected of the target question bank and/or whether the question type structure of the question to be detected belongs to a preset question type structure, so as to obtain first detection data of the question to be detected, where the object to be detected includes at least one of a question stem, an answer, an option, a formula and a character. The service server 100 then determines the watermark confidence of the image carried in the question to be detected, and determines second detection data of the question to be detected according to the watermark confidence and the preset watermark threshold range of the question to be detected. Finally, the service server 100 determines a quality score of the question to be detected based on the first detection data and the second detection data, determines the question quality of the question to be detected based on the quality score, and returns the question quality to the user terminal. Optionally, the user terminal may be any user terminal selected from the user terminal cluster in the embodiment corresponding to FIG. 1; for example, the user terminal may be the user terminal 10b, and the user may view the question quality of the questions to be detected in the target question bank on the display page of the user terminal 10b.
It will be appreciated that the method provided in the embodiment of the present application may be performed by a computer device, which includes but is not limited to a terminal or a server. The service server 100 in the embodiment of the present application may be a computer device, and the user terminals in the user terminal cluster may also be computer devices, which is not limited herein. The service server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), big data, and artificial intelligence platforms. The terminal may include, but is not limited to, smart terminals with image recognition functions such as smart phones, tablet computers, notebook computers, desktop computers, smart televisions, smart speakers, and smart watches. The user terminal and the service server may be connected directly or indirectly through wired or wireless communication, which is not limited herein.
Referring to FIG. 2, FIG. 2 is a schematic diagram of a scenario of a method for detecting the quality of question bank questions according to an embodiment of the present application. As shown in FIG. 2, when using a target application (e.g., a question quality detection application) on the user terminal, user A uploads a number of question banks (e.g., a preschool education question bank, an IELTS question bank, an advanced mathematics question bank, etc.) through the user terminal 10b, and the user terminal 10b then sends to the service server 100 a quality detection request for the questions to be detected in a target question bank (e.g., the advanced mathematics question bank). Specifically, the service server 100 may obtain a question to be detected from the target question bank uploaded by the user, and determine whether a target value exists in an object to be detected (i.e., text data) in the question to be detected and/or whether the question type structure of the question to be detected belongs to a preset question type structure, so as to obtain first detection data of the question to be detected, where the object to be detected includes at least one of a question stem, an answer, an option, a formula, and a character. The service server 100 then determines the watermark confidence of the image carried in the question to be detected, and determines second detection data of the question to be detected according to the watermark confidence and the preset watermark threshold range of the question to be detected. Finally, the quality score of the question to be detected is determined based on the first detection data and the second detection data, and the question quality of the question to be detected is determined based on the quality score. Optionally, the service server 100 may return the question quality detection result to the user terminal 10b, and user A may then view the question quality detection result of the target question bank (e.g., the advanced mathematics question bank) on the display page of the user terminal 10b.
Further, for ease of understanding, refer to FIG. 3, which is a flowchart of a method for detecting the quality of question bank questions according to an embodiment of the present application. The method may be performed by a user terminal (e.g., the user terminal shown in FIG. 1 or FIG. 2 above), or may be performed jointly by a user terminal and a service server (e.g., the service server 100 in the embodiment corresponding to FIG. 1 or FIG. 2 above). For ease of understanding, this embodiment is described by taking the method being performed by the above user terminal as an example. The method for detecting the quality of question bank questions includes at least the following steps S101 to S104:
S101, obtaining the questions to be detected in the target question bank.
In some possible implementations, the questions to be detected in the target question bank are obtained. Optionally, the questions to be detected may be provided by a third party (such as an education platform), may be obtained by parsing Word documents related to the target question bank, or may be obtained based on optical character recognition (Optical Character Recognition, OCR): the characters of printed matter such as books and manuscripts are converted into image information by scanning or another optical input method, and the image information is then converted into usable computer input by character recognition, so as to obtain the questions to be detected of the target question bank. The target question bank may be a preschool education question bank, an IELTS question bank, an advanced mathematics question bank, or the like, and a question to be detected may include objects to be detected that exist as text data as well as images carried in the question.
S102, determining whether a target value exists in an object to be detected in the questions to be detected and/or whether a question type structure of the questions to be detected belongs to a preset question type structure so as to obtain first detection data of the questions to be detected.
In some possible embodiments, the object to be detected may include at least one of a stem, an answer, an option, a character, and a formula, and the target value may include at least one of a null value, a repeat value, a missing value, and a messy code value. In the case where the object to be detected includes a stem, an answer and/or an option and the target value includes a null value and/or a repeat value, detection rule parameters of the object to be detected are determined, where the detection rule parameters include a length threshold of the object to be detected and/or a quantity threshold of the object to be detected, and the detection rules corresponding to the detection rule parameters may be rules such as "stem is empty", "answer is empty", and "option is repeated". After the detection rule parameters are determined, the stem, the answer and the options in the question to be detected can be traversed based on them. If the traversal finds that the length of at least one object to be detected among the stem, the answer and the options is less than or equal to the length threshold, it is determined that the first detection data is that a null value exists in the object to be detected (e.g., the stem is empty or an option is empty); and/or, if the number of at least one object to be detected among the stem, the answer and the options is greater than or equal to the quantity threshold, it is determined that the first detection data is that a repeat value exists in the object to be detected (e.g., an option is repeated).
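The null-value and repeat-value checks described above can be illustrated with a minimal sketch. The field names (`stem`, `answer`, `options`), the threshold values, and the reading of the quantity threshold as "number of occurrences of the same option text" are assumptions made for illustration; the application does not specify them.

```python
from collections import Counter

# Hypothetical rule parameters: the description names a length threshold and a
# quantity threshold but gives no concrete values.
LENGTH_THRESHOLD = 0  # a field whose stripped length is <= this counts as null
COUNT_THRESHOLD = 2   # an option text occurring >= this many times is a repeat


def detect_null_and_repeat(question: dict) -> list:
    """Return the names of the (hypothetical) detection rules hit by a question."""
    hits = []
    for field in ("stem", "answer"):
        text = (question.get(field) or "").strip()
        if len(text) <= LENGTH_THRESHOLD:
            hits.append(f"{field}_null")
    options = [(opt or "").strip() for opt in question.get("options", [])]
    if any(len(opt) <= LENGTH_THRESHOLD for opt in options):
        hits.append("option_null")
    # Count only non-empty option texts when looking for repeats.
    counts = Counter(opt for opt in options if opt)
    if any(n >= COUNT_THRESHOLD for n in counts.values()):
        hits.append("option_repeat")
    return hits


example = {"stem": "", "answer": "A", "options": ["1", "2", "2", " "]}
result = detect_null_and_repeat(example)
```

Here the rule names in `hits` stand in for the first detection data; a real deployment would presumably map them to the rule identifiers used by the quality inspection list.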
In some possible embodiments, in the case where the object to be detected includes a formula (for example, a LaTeX formula or another type of formula) and the target value is a missing value, the target symbol of the formula in the question to be detected may be detected first, and if the target symbols are not paired, it is determined that the first detection data is that a missing value exists in the object to be detected. For convenience of description, a LaTeX formula is taken as an example below. Since a LaTeX formula is generally written in the format "${Latex}$", in which the $ symbols appear in pairs, the target symbol $ of the LaTeX formula in the question to be detected can be detected; if the $ symbols are not paired, it is determined that the first detection data is that a missing value exists in the object to be detected, which may indicate that the LaTeX formula is faulty or incomplete. Optionally, the formula in the question to be detected may be rendered, and if rendering fails, it is determined that the first detection data is that a missing value exists in the object to be detected. Assuming that the character string of the LaTeX formula is "\sum_{k=1}^N k^2", the string can be rendered (i.e., parsed) by a LaTeX formula editor; if rendering fails (i.e., parsing fails, for example because the formula is incomplete), it is determined that the first detection data is that a missing value (e.g., the character "_") exists in the object to be detected. It can be understood that, when a formula is edited manually or cleaned before being stored, some characters (such as the character "_") may be lost from the formula string, leaving for example "\sum{k=1}^N k^2", which then cannot be rendered as intended.
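As a rough illustration of the pairing check on the target symbol $, the following sketch counts unescaped $ delimiters and flags an odd count. The function name is hypothetical, and the sketch covers only the pairing check, not the rendering check; it also does not handle an escaped backslash immediately before a $.

```python
import re


def has_unpaired_dollar(text: str) -> bool:
    """Return True if the number of unescaped '$' delimiters in text is odd."""
    # A '$' preceded by a backslash is a literal dollar sign, not a delimiter.
    delimiters = re.findall(r"(?<!\\)\$", text)
    return len(delimiters) % 2 == 1


complete = r"Compute $\sum_{k=1}^N k^2$ for N = 3."
truncated = r"Compute $\sum_{k=1}^N k^2 for N = 3."
flags = (has_unpaired_dollar(complete), has_unpaired_dollar(truncated))
```

A string flagged by this check would correspond to first detection data indicating a missing value in the formula.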
In some possible embodiments, when the object to be detected includes a character and the target value is a messy code value, the coding range of the character in the question to be detected may be determined according to the Unicode standard (a character encoding standard), and if the coding range of the character belongs to a specified coding range, it is determined that the first detection data is that a messy code value exists in the object to be detected. The coding ranges corresponding to messy code (garbled) characters are collectively referred to as the specified coding range. For example, refer to Table 1, which lists the coding ranges of characters according to an embodiment of the present application.
Table 1
Character type            Coding range
Chinese characters        4E00~9FA5
English letters           0041~005A
Digits                    0030~0039
Private area characters   E000~F8FF
Special characters        FFF0~FFFF
It will be appreciated that, as shown in Table 1, each type of character may correspond to a coding range: for example, 4E00~9FA5 for Chinese characters, 0041~005A for English letters, 0030~0039 for digits, …, E000~F8FF for private area characters, and FFF0~FFFF for special characters. The coding range corresponding to private area characters and the coding range corresponding to special characters belong to the specified coding range (which may also be referred to as the high-risk character type coding range). For example, when it is detected that a character in the question to be detected falls within the coding range corresponding to private area characters and/or the coding range corresponding to special characters, it may be determined that the first detection data is that a messy code value (such as a garbled character) exists in the object to be detected.
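The code-point check against the specified coding range can be sketched as follows. The two ranges are the private area and special character ranges from Table 1; the function name and the return format are illustrative assumptions.

```python
# High-risk ranges from Table 1, as inclusive Unicode code points.
HIGH_RISK_RANGES = [
    (0xE000, 0xF8FF),  # private area characters
    (0xFFF0, 0xFFFF),  # special characters
]


def find_garbled_chars(text: str) -> list:
    """Return (index, 'U+XXXX') for each character inside a high-risk range."""
    found = []
    for index, char in enumerate(text):
        code_point = ord(char)
        if any(low <= code_point <= high for low, high in HIGH_RISK_RANGES):
            found.append((index, f"U+{code_point:04X}"))
    return found


sample = "题干\ue000abc\ufffd"
garbled = find_garbled_chars(sample)
```

A non-empty result would correspond to first detection data indicating that a messy code value exists in the object to be detected.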
In some possible embodiments, in the process of detecting the question type structure of the question to be detected, the feature code of the question type structure of the question to be detected may be determined first. If the feature code of the question type structure of the question to be detected differs from the feature code of the preset question type structure, it is determined that the first detection data is that the question type structure of the question to be detected does not belong to the preset question type structure, that is, the question type structure of the question to be detected is faulty. The feature code of the preset question type structure may be a whitelist feature code set (or defined) by a user. For example, refer to FIG. 4, which is a table of feature codes of a preset question type structure according to an embodiment of the application. For convenience of description, a single-choice question is taken as an example. As shown in FIG. 4, the question type identifier (Identity document, ID) of the single-choice question is 1, and the target question bank may include N1 single-choice questions of a first type and N2 single-choice questions of a second type, where the first type accounts for a proportion M1 of all single-choice questions in the target question bank and the second type accounts for a proportion M2. It can be understood that the constituent elements of the first type of single-choice question may be determined first (such as [main] stem, [main] option(s), [main] answer and [main] analysis (optional)), where [main] can be understood as a main stem and its corresponding options, answer and analysis, so that the first type of single-choice question is a single-choice question with a one-question structure. Further, the constituent elements of the second type of single-choice question may be determined (such as [main] stem, [multi-sub] option(s), [multi-sub] answer, [multi-sub] analysis (optional) and [main] analysis (optional)), where [multi-sub] can be understood as a plurality of sub-stems and the options, answer and analysis corresponding to each sub-stem, so that the second type of single-choice question is a single-choice question with a multi-question structure.
Further, according to the constituent elements of the first type of single-choice question, the feature code of the one-question structure may be {51(2+)3(4*)}, and according to the constituent elements of the second type of single-choice question, the feature code of the multi-question structure may be {51(__{51(2+)3(4*)}+)(4*)}, where 5 represents a question, 1 represents a stem, 2 represents an option, 3 represents an answer, 4 represents analysis, + indicates that the number of options is 1 to n, * indicates that the number of analyses is 0 to n, and the string following __ represents a sub-question. The feature codes of the preset question type structure may then be determined from the feature code corresponding to the first type of question and the feature code corresponding to the second type of question, for example {51(2+)3(4*)} and {51(__{51(2+)3(4*)}+)(4*)}. After the feature codes of the preset question type structure are obtained, the feature code of the question type structure of the question to be detected (such as a single-choice question) can be determined. When the feature code of the question type structure of the question to be detected differs from both {51(2+)3(4*)} and {51(__{51(2+)3(4*)}+)(4*)}, it can be determined that the first detection data is that the question type structure of the question to be detected does not belong to the preset question type structure, that is, the question type structure of the question to be detected is faulty.
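One possible way to apply such feature codes is sketched below. It reads the two preset codes as {51(2+)3(4*)} and {51(__{51(2+)3(4*)}+)(4*)} and treats them as regular expressions matched against a flattened element-code string of the question. This interpretation, the dictionary layout (`elements`, `subs`, `main_analysis`) and all function names are assumptions; the application only states that the feature codes are compared.

```python
import re

# Element codes from the description: 5 question, 1 stem, 2 option,
# 3 answer, 4 analysis.
CODES = {"stem": "1", "option": "2", "answer": "3", "analysis": "4"}

# Whitelisted structures written as regular expressions (assumed reading of
# the preset feature codes; the braces are matched literally).
ONE_QUESTION = r"\{51(2+)3(4*)\}"
MULTI_QUESTION = r"\{51(__" + ONE_QUESTION + r")+(4*)\}"
WHITELIST = [re.compile(ONE_QUESTION), re.compile(MULTI_QUESTION)]


def serialize(question: dict) -> str:
    """Flatten a question (and its sub-questions) into an element-code string."""
    body = "5" + "".join(CODES[name] for name in question["elements"])
    subs = "".join("__" + serialize(sub) for sub in question.get("subs", []))
    tail = CODES["analysis"] * question.get("main_analysis", 0)
    return "{" + body + subs + tail + "}"


def matches_preset_structure(question: dict) -> bool:
    """Return True if the flattened code fully matches a whitelisted structure."""
    code = serialize(question)
    return any(pattern.fullmatch(code) for pattern in WHITELIST)


single = {"elements": ["stem", "option", "option", "answer", "analysis"]}
multi = {
    "elements": ["stem"],
    "subs": [
        {"elements": ["stem", "option", "option", "answer"]},
        {"elements": ["stem", "option", "answer", "analysis"]},
    ],
    "main_analysis": 1,
}
no_options = {"elements": ["stem", "answer"]}
```

Under this reading, `no_options` would be reported as not belonging to the preset question type structure, because a single-choice question without any option matches neither whitelisted code.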
Referring to FIG. 5, FIG. 5 is a schematic diagram of a user interaction interface when traversing an object to be detected according to an embodiment of the present application. As shown in FIG. 5, the quality inspection list in the figure contains the detection rules that an object to be detected may hit, such as: question stem is empty, answer is empty, options are repeated, option is empty, question stem formula is incomplete, option formula is incomplete, analysis formula is incomplete, question stem characters are garbled, and answer characters are garbled. The quality inspection list may also include: analysis is "omitted", analysis is empty, question information is lost, no answer analysis, answer formula is incomplete, question stem is invalid (empty), option characters are garbled, answer analysis characters are garbled, hypertext markup language (Hyper Text Markup Language, HTML) is garbled or incomplete, answer format error in correction questions, abnormal picture display, and the like. Optionally, when the selected detection rule is "question stem is empty", the modification state is "unrepaired", the review state covers both "unreviewed" and "reviewed", and the shelf state covers both "not on shelf" and "on shelf", all questions to be detected in the target question bank can be traversed to find the objects to be detected that satisfy all of these conditions, and the ID identifiers of those objects (used to mark different objects to be detected in different questions to be detected) are displayed in the user interaction interface. It can be understood that the modification state indicates whether an operator has repaired the unqualified questions after quality detection of the target question bank is completed, the review state indicates whether an operator has reviewed the detection result after quality detection of the target question bank is completed, and the shelf state indicates whether an operator has put the target question bank on the shelf (online) after it passed quality detection.
Specifically, referring to FIG. 6, FIG. 6 is a schematic diagram of a detection framework for a target question bank according to an embodiment of the present application. In an optional embodiment of the present application, in order to determine whether a target value appears in the object to be detected (such as a stem, an answer, an option, a formula, and/or a character) and whether the question type structure of the question to be detected belongs to a preset question type structure, the detection rules corresponding to the object to be detected need to be determined first. Optionally, as shown in FIG. 6, problem feedback from users of the target question bank may be obtained from each user side and/or platform side (e.g., a user side and/or platform side logged in to the online education application), and the reported problems are then aggregated and displayed. Optionally, a content management system (Content Management System, CMS) may be used to display the problems reported for the target question bank. An operator of the content management system may then correct the displayed problems (i.e., correct the questions), and after the correction, detection rules may be generated based on the problem feedback for the target question bank. The set of detection rules grows gradually as problems continue to be discovered and users continue to report them, so that the detection rules cover as many questions to be detected as possible. Once a detection rule passes review, it is associated with the computing platform, so that the computing platform can determine the first detection data of the questions to be detected based on the detection rule parameters corresponding to the detection rule. The detection rules may include rules such as stem is empty, answer is empty, option is repeated, formula is missing, characters are garbled, and question type structure is faulty.
Optionally, when a detection rule is associated with the computing platform, it may be configured in the computing platform as a custom rule (i.e., rule code corresponding to the detection rule is written in the computing platform). In addition, the detection rule may be written as a script (i.e., a combination of operations that determines a series of actions performed by the computer) and provided to the computing platform as a script plug-in (i.e., a program written against an application program interface of a given specification, which can run only on the platform specified for it). Optionally, a Spark SQL engine (i.e., the module of the Spark big data computing engine that processes structured data, which may also be understood as a distributed SQL query engine) may be built, and the detection rules may be configured in the Spark SQL engine as SQL statements (SQL configuration for short), so that the computing platform traverses the stems, answers, options, formulas, characters and question type structures of the target question bank based on the SQL statements and finds the detection rules that are hit (e.g., stem is empty, answer is empty, option is repeated, formula is missing, characters are garbled, question type structure is faulty). As shown in FIG. 6, the computing platform may perform rule detection on the objects to be detected in the questions to be detected and/or image detection on the images carried in the questions to be detected, store the corresponding detection data (the first detection data and the second detection data), then report the problems of the questions to be detected corresponding to the detection data (including the accuracy and coverage of the questions to be detected, etc.) in the content management system, and notify the relevant operators of the detection data by email, so that subsequent operations on the target question bank can be performed based on the detection data.
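The idea of configuring detection rules as SQL statements and running them over the question table can be sketched as follows. To stay self-contained, the sketch uses Python's built-in sqlite3 module as a stand-in for the Spark SQL engine named in the description; the table layout, column names and rule names are hypothetical.

```python
import sqlite3

# Hypothetical rule configuration: rule name -> SQL statement that selects the
# ids of the questions hitting that rule.
RULES = {
    "stem_null": (
        "SELECT id FROM questions "
        "WHERE stem IS NULL OR TRIM(stem) = '' ORDER BY id"
    ),
    "answer_null": (
        "SELECT id FROM questions "
        "WHERE answer IS NULL OR TRIM(answer) = '' ORDER BY id"
    ),
}


def run_rules(conn: sqlite3.Connection) -> dict:
    """Run every configured rule and collect the question ids it hits."""
    return {
        name: [row[0] for row in conn.execute(sql)]
        for name, sql in RULES.items()
    }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions (id INTEGER PRIMARY KEY, stem TEXT, answer TEXT)"
)
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [(1, "1 + 1 = ?", "2"), (2, "   ", "A"), (3, "Capital of France?", None)],
)
hits = run_rules(conn)
conn.close()
```

Keeping the rules as data rather than code means a newly reviewed rule can be added by inserting one more SQL statement, which is in line with the gradual growth of the rule set described above.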
Optionally, in some possible implementations, refer to FIG. 7, which is a schematic diagram of another detection framework for the target question bank according to an embodiment of the present application. When a Spark SQL engine is built and associated with the computing platform, so as to traverse the stems, answers, options, formulas, characters and question type structures of the questions to be detected in the target question bank based on SQL statements and find the objects to be detected that hit the detection rules, quality detection of the target question bank can be carried out through the detection framework shown in FIG. 7. As shown in FIG. 7, in the data entry stage, the questions to be detected of the target question bank stored in the first database are provided to the computing platform for data processing via an application program interface (Application Program Interface, API). Alternatively, after being structured, the questions to be detected stored in the first database can be transferred directly to the second database for storage, and the data can then be obtained directly from the second database during data processing. This is possible because the second database is Hive-based: Hive can map structured data files to database tables and provides full SQL query functionality. During data processing, data adaptation (i.e., data formatting) first needs to be performed on the questions to be detected obtained through the API, converting the data of the questions to be detected stored in the first database into a unified data format to facilitate subsequent operations.
Next, the service is configured: the detection rules are written as SQL statements and configured in the Spark SQL engine, and the quality of the questions to be detected is checked by the quality inspection service in the Spark SQL engine. With this configuration service, quality detection can be performed on the stock data already in the first and second databases and also on data stored in the databases in the future, so that stock data and real-time data share one set of detection rules and the same Spark SQL engine, which reduces the development and configuration cost of the SQL. When data is input through the API, if the input volume at a given moment exceeds the data processing capacity, the pending data is placed in a message queue (such as Kafka, a high-throughput distributed publish-subscribe messaging system) to wait for processing. When the data related to the questions to be detected, obtained from the first or second database, enters the data processing stage, rule calculation can be performed through Flink SQL (the SQL interface of the Flink stream processing framework); that is, the SQL statements written from the detection rules are evaluated against that data. In addition, the data produced by data adaptation, the configuration service, the quality inspection service, SQL statement writing and message queue processing, together with the data computed by Flink SQL and the data processed by the Spark SQL engine, are all stored in a third database (such as MySQL). The statistics service can then obtain all intermediate data of the data processing stage (including the first detection data, the second detection data and so on) from the third database, and this data is stored and displayed in the data output stage. As shown in FIG. 7, in the data processing stage, the objects to be detected in a question and the question-type structure of the question may be checked using data adaptation, the configuration service, the quality inspection service, the SQL statements, the message queue, rule calculation and the Spark SQL engine, and whether a watermark exists in an image carried in the question may be judged using these functions together with the algorithm service.
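The idea of writing detection rules as SQL statements can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the Spark SQL engine and the Hive table, and the table name, column names and the three rules are hypothetical and do not come from the application.

```python
import sqlite3

# Hypothetical detection rules: rule name -> SQL selecting the ids of questions that hit the rule.
DETECTION_RULES = {
    "empty_stem": "SELECT id FROM questions WHERE stem IS NULL OR TRIM(stem) = '' ORDER BY id",
    "empty_answer": "SELECT id FROM questions WHERE answer IS NULL OR TRIM(answer) = '' ORDER BY id",
    "too_few_options": "SELECT id FROM questions WHERE option_count < 2 ORDER BY id",
}


def build_demo_db():
    """Create an in-memory table with a few illustrative questions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE questions ("
        "id INTEGER PRIMARY KEY, stem TEXT, answer TEXT, option_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO questions VALUES (?, ?, ?, ?)",
        [
            (1, "2 + 2 = ?", "B", 4),         # passes every rule
            (2, "", "A", 4),                  # empty stem
            (3, "Which is prime?", None, 4),  # missing answer
            (4, "Pick one", "A", 1),          # only one option
        ],
    )
    return conn


def run_rules(conn, rules):
    """Run every rule and return {rule name: [ids of questions that hit it]}."""
    return {name: [row[0] for row in conn.execute(sql)] for name, sql in rules.items()}


if __name__ == "__main__":
    print(run_rules(build_demo_db(), DETECTION_RULES))
```

Because the rules are plain data (a name plus a SQL string), the same rule set can be applied to stock data and to newly arriving data without code changes, which is the sharing effect the paragraph above describes.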
Alternatively, in some possible embodiments, quality detection may be performed on the objects to be detected and the question-type structure of the questions to be detected by different rule engines (for example, the Grule rule engine, cloud functions, or the Drools rule engine). Because the target question bank contains many question types, the corresponding quality inspection rules are numerous and fine-grained and need to be added and modified frequently. The rule engine therefore has to make new and modified rules take effect quickly and achieve a low-code effect, so that a new quality inspection requirement does not need newly developed code to be released. The low-code effect here can be understood as follows: for different subjects and question types, a question quality inspection rule can be added simply by configuring the rule or dragging rule components. To meet these requirements, the Grule rule engine, cloud functions and the Drools rule engine may be compared, with the following findings. The Grule rule engine is based on the Golang technology stack, which is consistent with the current technology stack; however, it is still in an iterative stage, is not mature enough, and has limited capabilities: for example, it cannot parse complex structures and does not support loops or function definitions. Cloud functions offer rich capabilities, but each call takes 200-300 ms, they suffer from slow startup, and they can only be invoked through timers, gateways and similar triggers. The Drools rule engine is based on the Java technology stack, is widely used in the development of different business applications, has low computation time and a rich rule language, supports loops and function definitions, and can be seamlessly integrated with Spark.
Therefore, to make quality inspection rules take effect quickly, with low time consumption and rich support, the rules for the objects to be detected and the question-type structure can be released quickly through the Drools rule engine, and the developed atomic codes can be freely combined (that is, low-code assembly) to perform quality detection on the questions to be detected.
Further, refer to FIG. 8, which is a schematic diagram of a quality inspection rule for a question to be detected according to an embodiment of the present application. For convenience of description, the following takes a question of the choose-five-from-six type (that is, a newly added question type with 6 options and 5 answers) as an example. As shown in FIG. 8, it may be determined that the question to be detected is a single-choice question whose question-type structure is one stem with multiple sub-questions, and that its constituent elements include a stem, common options, a plurality of consecutive single-choice sub-questions, and an analysis, where the stem, the common options and the consecutive single-choice sub-questions may not be empty, while the analysis may be empty. The stem, common options, sub-questions and analysis are then mapped respectively to: the text material corresponding to the stem; the content of each option, 6 options in total; the 5 consecutive single-choice sub-questions under the stem; and the overall analysis. From this mapping, a quality inspection rule corresponding to the question to be detected is generated. The quality inspection rule includes: the question contains common options, the number of common options is 6, the number of answers is 5, the stem of each sub-question is empty, and each sub-question must contain an answer.
Further, the quality inspection rule can be written as several different Drools atomic codes configured in the Drools rule engine, and rule checks are performed on the objects to be detected and the question-type structure based on the functions implemented by these atomic codes, so as to obtain the first detection data of the question to be detected. The atomic codes may include: the question contains common options, the number of common options is 6, the number of answers is 5, the stem of each sub-question is empty, each sub-question must contain an answer, and so on. Optionally, the Drools rule engine can be used to configure and quickly release the quality inspection rules for the different question types in the target question bank, and quality inspection is performed on each question type according to its own rules, which supports more complex quality inspection with higher detection efficiency.
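The combination of atomic rules described above can be sketched as follows. This is not Drools code: it is a plain Python illustration of the same idea, and the dictionary layout of a question (`common_options`, `sub_questions`, `stem`, `answer`) and all rule names are hypothetical.

```python
# Each atomic rule is a (name, predicate) pair; a rule set is just a list of them,
# so rules can be added, removed or recombined without touching the checking code.
SIX_CHOOSE_FIVE_RULES = [
    ("has_common_options", lambda q: len(q.get("common_options", [])) > 0),
    ("common_option_count_is_6", lambda q: len(q.get("common_options", [])) == 6),
    ("answer_count_is_5",
     lambda q: sum(1 for s in q.get("sub_questions", []) if s.get("answer")) == 5),
    ("sub_question_stems_empty",
     lambda q: all(not s.get("stem") for s in q.get("sub_questions", []))),
    ("every_sub_question_has_answer",
     lambda q: all(s.get("answer") for s in q.get("sub_questions", []))),
]


def failed_rules(question, rules):
    """Return the names of the atomic rules that the question violates, in rule order."""
    return [name for name, predicate in rules if not predicate(question)]


if __name__ == "__main__":
    good = {
        "stem": "Read the passage and fill in the blanks.",
        "common_options": ["A", "B", "C", "D", "E", "F"],
        "sub_questions": [{"stem": "", "answer": a} for a in "ABCDE"],
    }
    print(failed_rules(good, SIX_CHOOSE_FIVE_RULES))  # no violations expected
```

An empty result means the question passes the rule set; a non-empty result could be recorded as part of the first detection data.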
S103, determining the watermark confidence of the image carried in the topic to be detected.
In some possible embodiments, the watermark confidence of the image carried in the question to be detected may be determined. Optionally, the image carried in the question is acquired first; if no image is obtained (the image is empty), the watermark confidence is determined to be 0. If the image is not empty, the image type of the target image carried in the question is determined, and the watermark confidence of the target image is determined according to that type, which gives the watermark confidence of the image carried in the question. The image type of the target image may include a still picture and/or a video frame picture. The watermark confidence indicates how credible it is that a watermark exists in the still picture and/or video frame picture: a confidence of 0 indicates that no watermark is present, and a confidence of 1 indicates that a watermark is present. This check is needed because the questions in current question banks contain a large number of pictures and teaching videos from learning resources, and these pictures and videos may carry watermarks. In an optional embodiment of the present application, if the target image is a still picture, the watermark confidence may be further determined by examining the color layers of the still picture.
The reason is that the still pictures carried in the questions are mainly black text on a white background, and the watermark lies between the black layer and the white layer (that is, its color falls between black and white), so most watermarked still pictures (that is, those with watermark confidence 1) can be found by screening the color layers of the still picture. Optionally, if the color layers of the still picture include a white layer and a black layer, and some other color layer exists between them, the watermark confidence of the image carried in the question may be determined to be 1. Since the RGB value of the white layer is #FFFFFF and the RGB value of the black layer is #000000, if any RGB value between those of the white layer and the black layer is present, the watermark confidence can be determined to be 1. If the color layers include a white layer and a black layer but no color layer between them, edge detection may further be performed on the target image to obtain a first edge feature of the target image, and the watermark confidence may be determined based on that first edge feature. In an alternative embodiment of the present application, when determining the watermark confidence based on the first edge feature of the target image, a second edge feature of the watermark template may be obtained first, and a first matching degree threshold for determining the watermark confidence of the target image may be determined.
If the matching degree of the first edge feature and the second edge feature is greater than or equal to the first matching degree threshold, determining that the watermark confidence degree of the image carried in the subject to be detected is 1, that is, that the watermark exists in the image carried in the subject to be detected. If the matching degree of the first edge feature and the second edge feature is smaller than the first matching degree threshold, determining that the watermark confidence degree of the image carried in the subject to be detected is 0, that is, that the watermark does not exist in the image carried in the subject to be detected.
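The decision flow for a still picture can be summarized in a short sketch. It assumes the picture is given as a list of RGB tuples and that the edge matching degree has already been computed elsewhere; the function names, the default threshold of 0.95 (taken from the 95% example in this description) and the fall-through to edge matching when the picture lacks a pure white or pure black layer are illustrative assumptions.

```python
WHITE = (255, 255, 255)  # #FFFFFF
BLACK = (0, 0, 0)        # #000000


def has_intermediate_color(pixels):
    """True if the picture has a white layer, a black layer and at least one other color."""
    colors = set(pixels)
    return WHITE in colors and BLACK in colors and any(
        color not in (WHITE, BLACK) for color in colors
    )


def still_image_confidence(pixels, edge_match_degree, match_threshold=0.95):
    """Return a watermark confidence of 0 or 1 for a still picture.

    pixels: list of (r, g, b) tuples; an empty list means no image was obtained.
    edge_match_degree: matching degree in [0, 1] between the first edge feature
    (target image) and the second edge feature (watermark template), computed elsewhere.
    """
    if not pixels:
        return 0  # no image, so no watermark
    if has_intermediate_color(pixels):
        return 1  # a color layer between black and white is treated as a watermark
    # Assumption: every other case falls back to edge-feature matching.
    return 1 if edge_match_degree >= match_threshold else 0


if __name__ == "__main__":
    print(still_image_confidence([WHITE, BLACK, (200, 200, 200)], 0.0))
```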
Specifically, when determining the watermark confidence of the image carried in the question through the first edge feature of the target image, the second edge feature of the watermark template may be acquired first. To obtain it, the template features of the various watermark templates may be analyzed first, and the gradient median of the watermark image of each watermark template is then calculated. The gradient of an image is concentrated at its edges, and an edge generally refers to a region where the image intensity changes sharply in a local area (such as a gray-level change or a spatial change), so the appearance and shape of a local object in an image can be described well by gradient or edge features. Therefore, after the gradient median of each watermark image is calculated, the watermark shape obtained by mixing the watermark templates can be fitted from these gradient medians; figuratively, fitting connects a series of points on a plane (each point representing a gradient median) with a smooth curve. When the computation is iterated over N pictures, the gradient median of the watermark image tends to converge and stabilizes within a certain interval, so the fitted watermark can be obtained after iterating over N watermark pictures. Gradients are then computed in the X and Y directions for each pixel (the indivisible unit of the image) of the watermark image of each watermark template, and the watermark value of each watermark image is estimated from the resulting gradient values. The fitted watermark is then input to a Canny edge detector (a technique that extracts useful structural information from different visual objects while greatly reducing the amount of data to be processed, and that is widely used in computer vision systems), and the edge features of the watermark image of each watermark template, that is, the second edge features, are derived based on the watermark values. After the second edge feature of the watermark template is obtained, watermark position matching can be performed on the target image based on it. First, a first matching degree threshold for determining the watermark confidence of the target image is determined. For example, when the first matching degree threshold is 95%, if the matching degree between the first edge feature and the second edge feature is greater than or equal to 95%, the watermark confidence of the image carried in the question is determined to be 1; if the matching degree is smaller than 95%, the watermark confidence is determined to be 0. Specifically, the chamfer distance used in image processing may serve as the method for matching the first edge feature against the second edge feature. The chamfer distance relies on a distance transform of the image: the binary image of the target image (an image with only two gray levels, in which every pixel has a gray value of 0 or 255, representing black and white respectively) is converted into a grayscale image (an image with a single sampled value per pixel).
The gray value of a point in this grayscale image (the gray value is the shade of the point in a black-and-white image, generally ranging from 0 to 255, with white as 255 and black as 0) is the distance between the corresponding coordinate point of the original binary image and the nearest target point, so the transformed image is the so-called distance image (that is, the target image with the first edge feature). If a target image represented as a binary image is matched directly against another binary image (that is, the watermark image with the second edge feature), the match fails whenever there is local distortion. Converting the binary image into a distance image before searching makes it easier to find the matching watermark image and improves the accuracy of watermark matching.
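Chamfer matching as described above can be sketched in pure Python. This is a brute-force illustration on tiny binary grids, not an optimized implementation: the edge maps are assumed to be lists of rows containing 0 or 1, the Euclidean distance is an assumed choice of metric, and all function names are hypothetical. A lower score means a better match; how such a score would be converted into the matching degree used by the threshold is not specified here.

```python
import math


def edge_points(binary):
    """Coordinates (row, col) of all edge pixels (non-zero entries) of a binary edge map."""
    return [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]


def distance_image(binary):
    """Distance transform: each pixel holds the distance to the nearest edge pixel."""
    targets = edge_points(binary)
    if not targets:
        raise ValueError("edge map contains no edge pixels")
    return [
        [min(math.hypot(r - tr, c - tc) for tr, tc in targets) for c in range(len(binary[0]))]
        for r in range(len(binary))
    ]


def chamfer_score(dist, template, top, left):
    """Mean distance-image value under the template's edge pixels at the given offset."""
    points = edge_points(template)
    return sum(dist[top + r][left + c] for r, c in points) / len(points)


def best_match(image, template):
    """Slide the template over the image; return (score, top, left) of the lowest score."""
    dist = distance_image(image)
    height, width = len(template), len(template[0])
    best = None
    for top in range(len(image) - height + 1):
        for left in range(len(image[0]) - width + 1):
            score = chamfer_score(dist, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    template = [[1, 1], [1, 0]]
    print(best_match(image, template))
```

Because the score degrades gradually as the template moves away from the true position, a slightly distorted watermark still yields a low score near the right location, which is the robustness to local distortion mentioned above.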
S104, determining second detection data of the questions to be detected according to the watermark confidence and a preset watermark threshold range of the questions to be detected.
In some possible embodiments, the second detection data of the question to be detected may be determined from the watermark confidence of the image carried in the question and the preset watermark threshold range of the question. Optionally, if the watermark confidence is greater than or equal to the maximum threshold of the preset watermark threshold range, the second detection data is determined to be that the target image carries a watermark. If the watermark confidence is smaller than the minimum threshold of the range, the second detection data is determined to be that the target image does not carry a watermark. If the watermark confidence is greater than or equal to the minimum threshold and smaller than the maximum threshold, watermark keywords used to detect whether the target image has a watermark are determined, and the second detection data is determined from the matching degree between the watermark keywords and the text data extracted from the target image. Optionally, in that case the text data is first extracted from the target image; if the matching degree between the text data and the watermark keywords is greater than or equal to a matching degree threshold, the second detection data is determined to be that the target image carries a watermark, and if it is smaller than the matching degree threshold, that the target image does not carry a watermark.
Specifically, if the preset watermark threshold range of the question to be detected is [0.5, 0.9], then when the watermark confidence is greater than or equal to 0.9, the second detection data is determined to be that the target image carries a watermark, and when the watermark confidence is smaller than 0.5, that the target image does not carry a watermark. When the watermark confidence is greater than or equal to 0.5 and smaller than 0.9, the target image is further examined using optical character recognition (Optical Character Recognition, OCR). OCR can recognize text in pictures of arbitrary scenes, so the text carried in the target image can be recognized with it. Then the watermark keywords carried in the watermark templates (such as science fiction net, soft cloud and the like) and a matching degree threshold between the text data extracted from the target image and the watermark keywords are determined. Optionally, the matching degree threshold may be set to 90%: if the matching degree between the text data extracted from the target image and the watermark keywords is greater than or equal to 90%, the second detection data is determined to be that the target image carries a watermark; if it is smaller than 90%, the second detection data is determined to be that the target image does not carry a watermark.
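The three-way decision above can be sketched as follows. The thresholds 0.5, 0.9 and 90% come from the example in this description; everything else is an assumption made for illustration: the OCR step is assumed to have already produced `ocr_text`, the matching degree is approximated with `difflib.SequenceMatcher` over sliding windows of the text, and the keyword `ExampleMark` is a placeholder rather than a real watermark keyword.

```python
from difflib import SequenceMatcher


def keyword_match_degree(text, keyword):
    """Best similarity in [0, 1] between the keyword and any same-length window of the text."""
    if not text or not keyword:
        return 0.0
    if keyword in text:
        return 1.0
    n = len(keyword)
    windows = [text[i:i + n] for i in range(max(1, len(text) - n + 1))]
    return max(SequenceMatcher(None, keyword, window).ratio() for window in windows)


def has_watermark(confidence, ocr_text="", keywords=(), low=0.5, high=0.9, match_threshold=0.9):
    """Decide the second detection data: True means the target image carries a watermark."""
    if confidence >= high:
        return True   # at or above the maximum threshold
    if confidence < low:
        return False  # below the minimum threshold
    # Uncertain band [low, high): fall back to matching OCR text against watermark keywords.
    return any(keyword_match_degree(ocr_text, k) >= match_threshold for k in keywords)


if __name__ == "__main__":
    print(has_watermark(0.7, "photo by ExampleMark 2021", ("ExampleMark",)))
```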
S105, determining a quality score of the question to be detected based on the first detection data and the second detection data, and determining the question quality of the question to be detected based on the quality score.
In some possible embodiments, a quality score of the topic to be detected may be determined based on the first detection data and the second detection data, and a topic quality of the topic to be detected may be determined based on the quality score. Optionally, an initial quality score of the to-be-detected question may be determined first, if the first detection data is that the target value exists in the to-be-detected object and/or the question type structure of the to-be-detected question belongs to a preset question type structure, the quality score of the to-be-detected question is reduced, and a first quality score after the reduced score is determined based on the initial quality score. If the second detection data is that the target image carries a watermark, the quality score of the to-be-detected question may be reduced to determine a second quality score with a reduced score based on the first quality score, and the quality score of the to-be-detected question may be determined based on the second quality score. Optionally, the quality of the target question bank may be determined based on the quality score of the questions to be detected, and an alarm prompt may be sent to a background operator based on the quality of the questions.
Optionally, in some possible embodiments of the present application, the initial quality score of the question to be detected may be set to 100 points. If the first detection data is that the target value exists in the object to be detected and/or the question-type structure of the question belongs to a preset question-type structure, 10 points are deducted from the 100 points, so the first quality score is 90 points. If the second detection data is then that the target image carries a watermark, 5 points are deducted from the 90 points, so the second quality score is 85 points, and 85 points is taken as the quality score of the question. If the first detection data is that the target value does not exist in the object to be detected and the question-type structure does not belong to a preset question-type structure, the first quality score remains 100 points; if the second detection data is then that the target image carries a watermark, 5 points are deducted from the 100 points, so the second quality score is 95 points, and 95 points is taken as the quality score of the question. If the first detection data is that the target value exists in the object to be detected and/or the question-type structure belongs to a preset question-type structure, 10 points are deducted from the 100 points, so the first quality score is 90 points.
If the second detection data is then that the target image does not carry a watermark, no further points are deducted, the second quality score is 90 points, and 90 points is taken as the quality score of the question. Optionally, questions with a quality score below 90 points may be determined to be unqualified. When the accuracy rate of the target question bank (the number of qualified questions to be detected / the number of all questions to be detected in the target question bank) is below 90%, the target question bank is considered to have unqualified question quality, an alarm prompt for the target question bank is sent to a background operator, and the operator then decides whether to repair the unqualified questions in the target question bank. When the accuracy rate of the target question bank is greater than or equal to 90%, the question quality of the target question bank can be judged to be qualified.
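The scoring arithmetic of this example can be written out as a small sketch. The point values (100, 10, 5) and the 90-point and 90% limits are those of the example above; the function names are hypothetical, and the accuracy rate is read here as the share of qualified questions, which is an interpretation rather than something the text states unambiguously.

```python
def quality_score(rule_violation, carries_watermark,
                  initial=100, rule_penalty=10, watermark_penalty=5):
    """Apply the two deductions in order and return the final quality score."""
    first_score = initial - rule_penalty if rule_violation else initial
    second_score = first_score - watermark_penalty if carries_watermark else first_score
    return second_score


def bank_accuracy(scores, pass_score=90):
    """Share of questions whose score is at least pass_score (assumed reading of 'accuracy')."""
    qualified = sum(1 for score in scores if score >= pass_score)
    return qualified / len(scores)


def bank_is_qualified(scores, pass_score=90, min_accuracy=0.9):
    """True if the question bank as a whole meets the accuracy requirement."""
    return bank_accuracy(scores, pass_score) >= min_accuracy


if __name__ == "__main__":
    scores = [quality_score(v, w) for v in (True, False) for w in (True, False)]
    print(scores, bank_accuracy(scores), bank_is_qualified(scores))
```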
Further, refer to FIG. 9, which is a schematic diagram of a user interface for a question to be detected according to an embodiment of the present application. As shown in FIG. 9, the target question bank may include a junior middle school question bank, which may in turn include question banks for Chinese, mathematics, English, physics, chemistry, biology, history, geography, and morality and law. For convenience of description, the mathematics question bank is taken as an example below. A question to be detected that has undergone quality detection can be queried by its source (such as year, region and type), by the knowledge points it involves, and by its question identifier (such as 4951902), and the stem, question type, audit status, shelf status and synchronization status of the question are displayed on the user interface. For example, the interface may show that the question is a problem-solving question, that its audit status is not audited, that its shelf status is not on the shelf, and that its synchronization status is not synchronized. The audit status indicates whether an operator has reviewed the detection result after quality detection of the target question bank has finished; the shelf status indicates whether the operator has put the target question bank on the shelf after it passed quality detection; and the synchronization status indicates whether the operator has synchronized the target question bank after it passed quality detection.
Further, a question that has undergone quality detection can be audited, put on the shelf and synchronized, and its status is updated accordingly (for example, audit status audited, shelf status on the shelf, synchronization status synchronized), so that quality detection of that question does not need to be repeated. Optionally, after quality detection has been performed on all the questions in the database, batch auditing, batch putting on the shelf (or batch removal from the shelf) and batch synchronization can be performed on all of them, which avoids repeated quality detection of the database and further improves quality detection efficiency.
In the present application, a question to be detected is first acquired from a target question bank, and it is determined whether a target value exists in the object to be detected in the question and/or whether the question-type structure of the question belongs to a preset question-type structure, so as to obtain the first detection data of the question, where the object to be detected includes at least one of a stem, an answer, an option, a formula and a character. The watermark confidence of the image carried in the question is then determined, and the second detection data of the question is determined according to the watermark confidence and the preset watermark threshold range of the question. Finally, the quality score of the question is determined based on the first detection data and the second detection data, and the question quality is determined based on the quality score. This way of detecting the quality of question bank questions greatly reduces manual effort and allows frequent quality detection of question banks with large data volumes, which improves the efficiency and reliability of quality detection.
Further, for easy understanding, please refer to fig. 10, fig. 10 is another flow chart of the method for detecting quality of questions in a question bank according to an embodiment of the present application. The method may be performed by a user terminal (e.g., the user terminal shown in fig. 1 or fig. 2 described above), or may be performed by a user terminal and a service server (e.g., the service server 100 in the embodiment corresponding to fig. 1 or fig. 2 described above) together. For easy understanding, this embodiment will be described by taking the method performed by the above-described user terminal as an example. The method for detecting the quality of the question library at least comprises the following steps S201 to S205:
S201, obtaining the questions to be detected in the target question bank.
The specific implementation of step S201 may be referred to the description of step S101 in the corresponding embodiment of fig. 3, and will not be repeated here.
S202, determining whether a target value exists in an object to be detected in the questions to be detected and/or whether a question type structure of the questions to be detected belongs to a preset question type structure so as to obtain first detection data of the questions to be detected.
The specific implementation of step S202 may be referred to the description of step S102 in the corresponding embodiment of fig. 3, and will not be repeated here.
S203, acquiring the image carried in the subject to be detected, determining the image type of the target image carried in the subject to be detected, and determining the watermark confidence of the target image according to the image type of the target image.
In some possible embodiments, the image carried in the subject to be detected is obtained, the image type of the target image carried in the subject to be detected is determined, and then the watermark confidence of the target image is determined according to the image type of the target image. In an alternative embodiment of the present application, if the image type of the target image is a video frame, determining a target detection area of the video frame in the video to which the target image belongs, determining a partial image corresponding to the target detection area from the target image, inputting the partial image into a target watermark recognition model, and outputting watermark confidence of the partial image based on the target watermark recognition model to obtain watermark confidence of the target image.
Specifically, refer to FIG. 11, which is a schematic diagram of a watermark determination process for a video frame picture according to an embodiment of the present application. As shown in FIG. 11, in the teaching video resources of the questions to be detected, adjacent video frames are highly redundant (similar), so extracting and storing features for every video frame is impractical; moreover, a watermark usually either persists throughout the video or appears at its end. Therefore, in an alternative embodiment of the present application, the video frame pictures to be examined in the video to which the target image belongs are determined first, namely the first, middle and last video frame pictures of the video. For each of these video frame pictures, the partial images corresponding to the target detection area (for example the upper right, lower right, upper left, lower left and middle right of the frame) are determined, the partial images are input into a target watermark recognition model, and the watermark confidence of the partial images is output by the model, which gives the watermark confidence of the target image. The target watermark recognition model is the model with the best accuracy and recall among at least two candidate watermark recognition models; optionally, the candidates may be an InceptionV model and a ResNet18 model. The InceptionV model is a neural network built for the ImageNet large-scale visual recognition task (ImageNet being a computer vision recognition project), and the ResNet18 model is a deep residual network, a convolutional neural network used for image processing, where 18 refers to its 18 weighted layers, comprising convolutional layers and fully connected layers.
When training the watermark recognition models, online hard example mining (Online Hard Example Mining, OHEM) is first used to focus the models on the difficult samples in the sample data set. Hard example mining means that, during model training, samples that produce a large loss value (that is, samples the model is highly likely to misclassify) are trained on again. The difficult samples here are the samples that were hard to distinguish when training on the original normal sample data set, together with a sample set composed of synthesized similar samples. The normal samples and the difficult samples are input into both the InceptionV model and the Resnet18 model, the recall rate, accuracy and time consumption of the two models are calculated, and the model with the better overall performance is selected as the target watermark recognition model. The watermark confidence of the partial image is then output by the target watermark recognition model to obtain the watermark confidence of the target image. As shown in fig. 10, when the watermark confidence is greater than or equal to 0.9, the second detection data is determined to be that the target image carries a watermark; when the watermark confidence is less than 0.5, the second detection data is determined to be that the target image does not carry a watermark. When the watermark confidence is greater than or equal to 0.5 and less than 0.9, the target image is further examined using optical character recognition (Optical Character Recognition, OCR). Because OCR can recognize the characters in a picture of any scene, the characters carried in the target image can be recognized based on OCR.
Then, the watermark keywords carried in the watermark templates (such as science fiction net, soft cloud and the like) are determined, together with a matching degree threshold between the text data extracted from the target image and the watermark keywords. Optionally, the matching degree threshold may be set to 90%. If the matching degree between the text data extracted from the target image and a watermark keyword is greater than or equal to 90%, the second detection data is determined to be that the target image carries a watermark; if the matching degree is less than 90%, the second detection data is determined to be that the target image does not carry a watermark. Optionally, in a large number of teaching video resources the tail video frame picture of the video contains special content (such as promotional watermarks and slogans of other education institutions). Because watermark recognition for the target detection area works on crops of the picture (such as the upper right, lower right, upper left, lower left and middle right regions), watermarks at other positions (such as the center of the image of the target detection area) would be missed. OCR detection of the full tail video frame picture is therefore also added in an alternative embodiment of the application to obtain all text in the target detection area. If the matching degree between the text in the target detection area and the watermark keywords carried in the watermark templates is greater than the matching degree threshold, a keyword rule is hit and the target image is determined to carry a watermark.
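The three-way decision described above (0.9 and 0.5 as the bounds of the watermark threshold range, 90% as the keyword matching degree threshold) can be sketched as follows. The application does not define how the matching degree is computed; the sliding-window `difflib.SequenceMatcher` ratio used here is an assumption chosen for illustration, and the OCR step is represented only by its text output.

```python
from difflib import SequenceMatcher
from typing import Iterable

MAX_THRESHOLD = 0.9    # from the text: at or above this, a watermark is reported
MIN_THRESHOLD = 0.5    # from the text: below this, no watermark is reported
MATCH_THRESHOLD = 0.9  # from the text: 90% keyword matching degree


def matching_degree(text: str, keyword: str) -> float:
    """Hypothetical matching degree: best similarity of the keyword against
    any window of the OCR text that has the keyword's length."""
    if not text or not keyword:
        return 0.0
    n = len(keyword)
    if len(text) <= n:
        return SequenceMatcher(None, text, keyword).ratio()
    return max(
        SequenceMatcher(None, text[i:i + n], keyword).ratio()
        for i in range(len(text) - n + 1)
    )


def carries_watermark(confidence: float, ocr_text: str, keywords: Iterable[str]) -> bool:
    """Second detection data: True if the target image is judged to carry a watermark."""
    if confidence >= MAX_THRESHOLD:
        return True
    if confidence < MIN_THRESHOLD:
        return False
    # Uncertain band: fall back to matching OCR text against watermark keywords.
    return any(matching_degree(ocr_text, k) >= MATCH_THRESHOLD for k in keywords)
```

Only confidences in the uncertain band trigger the keyword comparison, so the comparatively expensive OCR step is avoided for clear-cut cases.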
S204, determining second detection data of the questions to be detected according to the watermark confidence and a preset watermark threshold range of the questions to be detected.
The specific implementation of step S204 may be referred to the description of step S104 in the corresponding embodiment of fig. 3, and will not be repeated here.
S205, determining a quality score of the topic to be detected based on the first detection data and the second detection data, and determining the topic quality of the topic to be detected based on the quality score.
The specific implementation of step S205 may be referred to the description of step S105 in the corresponding embodiment of fig. 3, and will not be repeated here.
In the application, a question to be detected is first acquired from a target question library, and whether a target value exists in an object to be detected in the question and/or whether the question type structure of the question belongs to a preset question type structure is determined, so as to obtain first detection data of the question, where the object to be detected includes at least one of a question stem, an answer, an option, a character and a formula. The watermark confidence of the image carried in the question is then determined, and second detection data of the question is determined according to the watermark confidence and a preset watermark threshold range of the question. Finally, the quality score of the question is determined based on the first detection data and the second detection data, and the question quality is determined based on the quality score. This way of detecting the quality of question library questions greatly reduces manual effort and allows frequent quality detection on question libraries with a large volume of data, thereby improving the efficiency and reliability of quality detection.
Further, referring to fig. 12, fig. 12 is a schematic structural diagram of a quality detection device for question library questions according to an embodiment of the application. The quality detection device may be a user terminal, or may be a computer program (including program code) running in the user terminal; for example, the quality detection device is a piece of application software. The quality detection device can be used to execute the corresponding steps in the method provided by the application. As shown in fig. 12, the quality detection device 1 for question library questions may include: the first acquisition module 11, the first determination module 12, the second determination module 13 and the third determination module 14.
A first obtaining module 11, configured to obtain a topic to be detected in a target topic library;
A first determining module 12, configured to determine whether a target value exists in an object to be detected in the questions to be detected and/or whether a question type structure of the questions to be detected belongs to a preset question type structure, so as to obtain first detection data of the questions to be detected, where the object to be detected includes at least one of a question stem, an answer, an option, a formula and a character;
A second determining module 13, configured to determine a watermark confidence of an image carried in the topic to be detected, and determine second detection data of the topic to be detected according to the watermark confidence and a preset watermark threshold range of the topic to be detected;
a third determining module 14, configured to determine a quality score of the topic to be detected based on the first detection data and the second detection data, and determine a topic quality of the topic to be detected based on the quality score.
In one possible implementation manner, the object to be detected includes a stem, an answer and/or an option, and the target value includes a null value and/or a repetition value; the first determining module 12 includes:
a first determining unit 121, configured to determine a detection rule parameter of an object to be detected in the subject to be detected, where the detection rule parameter includes a length threshold of the object to be detected and/or a number threshold of the object to be detected;
A traversing unit 122, configured to traverse the stem, the answer, and the option in the questions to be detected;
A second determining unit 123, configured to determine that the first detection data is that a null value exists in the object to be detected if the length of at least one object to be detected among the stem, the answer, and the option in the question to be detected is less than or equal to the length threshold; and/or
A third determining unit 124, configured to determine that the first detection data is that a duplicate value exists in the at least one object to be detected if the number of the at least one object to be detected is greater than or equal to the number threshold.
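The null-value and duplicate-value checks performed by these units can be sketched as follows. The `Question` container, the default thresholds, and the decision to count duplicates among options only are assumptions made for illustration; the application leaves the detection rule parameters configurable.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Question:
    """Hypothetical container for the objects to be detected."""
    stem: str
    answer: str
    options: List[str] = field(default_factory=list)


def detect_null_and_duplicate(
    question: Question,
    length_threshold: int = 0,   # assumed default: an empty string counts as null
    number_threshold: int = 2,   # assumed default: two identical options count as duplicates
) -> Dict[str, bool]:
    """Traverse the stem, answer and options and report null and duplicate values."""
    objects = [question.stem, question.answer, *question.options]
    has_null = any(len(obj.strip()) <= length_threshold for obj in objects)
    counts = Counter(option.strip() for option in question.options)
    has_duplicate = any(count >= number_threshold for count in counts.values())
    return {"null_value": has_null, "duplicate_value": has_duplicate}
```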
In one possible implementation manner, the object to be detected includes a formula, and the target value is a missing value; the first determining module 12 includes:
The detection pairing unit 125 is configured to detect a target symbol of a formula in the subject to be detected, and determine that the first detection data is a missing value in the subject to be detected if the target symbol is not paired; or alternatively
The rendering determining unit 126 is configured to render the formula in the subject to be detected, and if the formula rendering fails, determine that the first detection data is a missing value in the object to be detected.
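The symbol pairing check of the detection pairing unit can be sketched as follows. The set of target symbols (brackets and the `$` math delimiter) and the treatment of backslash escapes are assumptions, since the application does not enumerate the target symbols here; the rendering-based check is not shown.

```python
# Closing bracket -> matching opening bracket (assumed set of target symbols).
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def symbols_paired(formula: str) -> bool:
    """Return True if brackets are balanced and `$` delimiters occur in pairs."""
    stack = []
    dollar_count = 0
    i = 0
    while i < len(formula):
        ch = formula[i]
        if ch == "\\":
            # Skip the escaped character, e.g. the brace in "\{" or the dollar in "\$".
            i += 2
            continue
        if ch == "$":
            dollar_count += 1
        elif ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        i += 1
    return not stack and dollar_count % 2 == 0
```

A formula for which `symbols_paired` returns False would be reported as containing a missing value.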
In one possible implementation manner, the object to be detected includes a character, and the target value is a garbled code value; the first determining module 12 includes:
The garbled code value determining unit 127 is configured to determine the coding range of the characters in the question to be detected and, if the coding range of the characters belongs to a specified coding range, determine that the first detection data is that a garbled code value exists in the object to be detected.
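A coding-range check for garbled (messy code) characters can be sketched as follows. The specified coding ranges are not listed in this part of the application; the ranges below (the Unicode replacement character, the Private Use Area, and C0 control characters other than tab, line feed and carriage return) are assumptions chosen as common indicators of decoding damage.

```python
from typing import List, Tuple

# Assumed "specified coding ranges", as inclusive code point intervals.
SPECIFIED_RANGES: List[Tuple[int, int]] = [
    (0xFFFD, 0xFFFD),  # U+FFFD REPLACEMENT CHARACTER
    (0xE000, 0xF8FF),  # Private Use Area
    (0x0000, 0x0008),  # C0 controls before tab
    (0x000E, 0x001F),  # C0 controls after carriage return
]


def has_garbled_value(text: str) -> bool:
    """Return True if any character falls inside one of the specified ranges."""
    return any(
        low <= ord(ch) <= high
        for ch in text
        for low, high in SPECIFIED_RANGES
    )
```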
In one possible implementation, the first determining module 12 includes:
A feature code determining unit 128 for determining a feature code of a topic type structure of a topic to be detected;
The topic structure determination unit 129 is configured to determine that the topic structure of the topic to be detected is not the preset topic structure if the feature code of the topic structure of the topic to be detected is different from the feature code of the preset topic structure.
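The feature code comparison can be sketched as follows. How the feature code is derived is not described in this part of the application; hashing a short structural signature (question type, option count, answer count) is purely an assumption for illustration, as are all names below.

```python
import hashlib
from typing import Set


def feature_code(question_type: str, option_count: int, answer_count: int) -> str:
    """Hypothetical feature code: a digest of a structural signature."""
    signature = f"{question_type}|options={option_count}|answers={answer_count}"
    return hashlib.sha256(signature.encode("utf-8")).hexdigest()


def is_preset_structure(code: str, preset_codes: Set[str]) -> bool:
    """A structure is accepted only if its feature code equals a preset one."""
    return code in preset_codes
```

For example, a library whose only preset structure is a single-choice question with four options and one answer would reject a single-choice question that has three options.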
In one possible embodiment, the second determining module 13 includes:
A first obtaining unit 131, configured to obtain an image carried in the subject to be detected;
A fourth determining unit 132, configured to determine, if the acquired image carried in the to-be-detected question is empty, a watermark confidence level of the image carried in the to-be-detected question to be 0;
and a fifth determining unit 133, configured to determine an image type of a target image carried in the to-be-detected question if the obtained image carried in the to-be-detected question is not empty, and determine a watermark confidence of the target image according to the image type of the target image, so as to obtain the watermark confidence of the image carried in the to-be-detected question, where the image type of the target image includes a still picture and/or a video frame picture.
In one possible embodiment, the second determining module 13 includes:
A sixth determining unit 134, configured to determine a color layer of the still picture if the target image carried in the subject to be detected is the still picture;
A seventh determining unit 135, configured to determine that the watermark confidence of the image carried in the subject to be detected is 1 if the color layer of the still picture includes a white layer and a black layer, and any color layer exists between the white layer and the black layer;
And an eighth determining unit 136, configured to, if the color layer of the still picture includes a white layer and a black layer, and no color layer exists between the white layer and the black layer, perform edge detection on the target image to obtain a first edge feature of the target image, and determine a watermark confidence of the image carried in the subject to be detected based on the first edge feature of the target image.
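One possible reading of the color layer check is sketched below, assuming the still picture is available as a grid of 8-bit grayscale values and that a "color layer" corresponds to a distinct gray level. Both assumptions, the function name and the returned labels are illustrative and are not taken from the application.

```python
from typing import List


def classify_color_layers(gray: List[List[int]], black: int = 0, white: int = 255) -> str:
    """Classify a grayscale picture by the gray levels it contains.

    Returns "watermark" when black, white and at least one level in between
    are present (confidence 1 in the text), "edge_detection" when only black
    and white are present, and "unspecified" for cases the text does not cover.
    """
    levels = {value for row in gray for value in row}
    if black not in levels or white not in levels:
        return "unspecified"
    if any(black < value < white for value in levels):
        return "watermark"
    return "edge_detection"
```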
In one possible embodiment, the eighth determining unit 136 includes:
a first obtaining subunit 1361, configured to obtain a second edge feature of the watermark template, and determine a first matching degree threshold for determining a watermark confidence degree of the image;
A first determining subunit 1362, configured to determine that the watermark confidence of the image carried in the subject to be detected is 1 if the matching degree of the first edge feature and the second edge feature is greater than or equal to the first matching degree threshold;
The second determining subunit 1363 is configured to determine that the watermark confidence of the image carried in the subject to be detected is 0 if the matching degree of the first edge feature and the second edge feature is less than the first matching degree threshold.
In one possible embodiment, the fifth determining unit 133 includes:
a first obtaining subunit 1331, configured to obtain a second edge feature of the watermark template, and determine a first matching degree threshold for determining a watermark confidence degree of the image;
A third determining subunit 1332, configured to determine that the watermark confidence of the image carried in the subject to be detected is 1 if the matching degree of the first edge feature and the second edge feature is greater than or equal to the first matching degree threshold;
A fourth determining subunit 1333, configured to determine that the watermark confidence of the image carried in the subject to be detected is 0 if the matching degree of the first edge feature and the second edge feature is smaller than the first matching degree threshold.
In one possible embodiment, the fifth determining unit 133 includes:
a fifth determining subunit 1334, configured to determine, if the target image carried in the subject to be detected is a video frame, a target detection region of a video frame in a video to which the target image belongs;
A sixth determining subunit 1335, configured to determine a partial image corresponding to the target detection region from the target image, input the partial image into a target watermark identification model, and output the watermark confidence of the partial image based on the target watermark identification model to obtain the watermark confidence of the target image.
In one possible embodiment, the second determining module 13 includes:
A ninth determining unit 137, configured to determine that the second detection data carries a watermark in the target image if the watermark confidence is greater than or equal to a maximum threshold value of the preset watermark threshold range;
A tenth determining unit 138, configured to, if the watermark confidence is greater than or equal to the minimum threshold of the preset watermark threshold range and less than the maximum threshold of the preset watermark threshold range, determine a watermark keyword for detecting whether a watermark exists in the target image, and determine the second detection data based on the matching degree between the watermark keyword and text data extracted from the target image;
An eleventh determining unit 139, configured to determine that the second detection data does not carry a watermark in the target image if the watermark confidence is less than a minimum threshold value of the preset watermark threshold range.
In one possible embodiment, the tenth determining unit 138 includes:
A first extraction subunit 1381 configured to extract text data from the target image;
a seventh determining subunit 1382, configured to determine that the second detection data carries a watermark in the target image if the matching degree between the text data and the watermark keyword is greater than or equal to a matching degree threshold;
An eighth determining subunit 1383 is configured to determine that the second detection data does not carry a watermark in the target image if the matching degree between the text data and the watermark keyword is less than the matching degree threshold.
In one possible implementation manner, the third determining module 14 includes:
A twelfth determining unit 141, configured to determine an initial quality score of the topic to be detected;
a thirteenth determining unit 142, configured to determine a first quality score of the to-be-detected question according to whether the target value exists in the to-be-detected object and/or whether the question type structure of the to-be-detected question belongs to a preset question type structure in the first detection data, and the initial quality score;
A fourteenth determining unit 143, configured to determine a second quality score of the question to be detected according to whether the second detection data indicates that the target image carries a watermark, and the first quality score;
A fifteenth determining unit 144, configured to determine a quality score of the topic to be detected based on the second quality score.
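The staged scoring performed by these units can be sketched as follows. The initial score of 100, the individual deductions and the lower bound of 0 are assumptions; this part of the application states only that the first and second quality scores are derived in turn from the initial quality score.

```python
from dataclasses import dataclass


@dataclass
class FirstDetection:
    """Hypothetical summary of the first detection data."""
    target_value_found: bool    # null, duplicate, missing or garbled value present
    structure_is_preset: bool   # question type structure matches a preset one


def quality_score(
    first: FirstDetection,
    carries_watermark: bool,
    initial: float = 100.0,            # assumed initial quality score
    value_penalty: float = 20.0,       # assumed deduction for a target value
    structure_penalty: float = 20.0,   # assumed deduction for a non-preset structure
    watermark_penalty: float = 30.0,   # assumed deduction for a watermark
) -> float:
    """Derive the first score from the initial score, then the second from the first."""
    first_score = initial
    if first.target_value_found:
        first_score -= value_penalty
    if not first.structure_is_preset:
        first_score -= structure_penalty
    second_score = first_score - watermark_penalty if carries_watermark else first_score
    return max(second_score, 0.0)
```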
For the specific implementation of the first obtaining module 11, the first determining module 12, the second determining module 13, and the third determining module 14, refer to the description of step S101 to step S105 in the embodiment corresponding to fig. 3, which is not repeated here. The beneficial effects, which are the same as those of the method, are likewise not described again.
Further, referring to fig. 13, fig. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 13, the computer device 2000 may be applied to a server, which may be the service server 100 in the embodiment corresponding to fig. 1; the computer device 2000 may be applied to a terminal, which may be the user terminal 10a, the user terminals 10b, …, the user terminal 10n in the above-mentioned embodiment corresponding to fig. 1; the computer device 2000 may also be a computer device in the embodiment corresponding to fig. 3. The computer device 2000 may include: processor 2001, network interface 2004 and memory 2005, in addition, the above-described computer device 2000 further includes: a transceiver 2003, and at least one communication bus 2002. Wherein a communication bus 2002 is used to enable connected communications between these components. The network interface 2004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 2005 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one disk memory. The memory 2005 may also optionally be at least one storage device located remotely from the aforementioned processor 2001. As shown in fig. 13, an operating system, a network communication module, a user interface module, and a device control application program may be included in the memory 2005 as one type of computer-readable storage medium.
In the computer device 2000 illustrated in fig. 13, the network interface 2004 may provide network communication functions; processor 2001 and transceiver 2003 may be used to invoke device control applications stored in memory 2005 to implement:
the transceiver 2003 is configured to obtain a subject to be detected in the target subject library;
The processor 2001 is configured to determine whether a target value exists in an object to be detected in the questions to be detected and/or whether a question type structure of the questions to be detected belongs to a preset question type structure so as to obtain first detection data of the questions to be detected, where the object to be detected includes at least one of a question stem, an answer, an option, a formula and a character;
The processor 2001 is further configured to determine a watermark confidence of an image carried in the subject to be detected, and determine second detection data of the subject to be detected according to the watermark confidence and a preset watermark threshold range of the subject to be detected;
the processor 2001 is further configured to determine a quality score of the topic to be detected based on the first detection data and the second detection data, and determine a topic quality of the topic to be detected based on the quality score.
In one possible implementation, the object to be detected includes a stem, an answer, and/or an option, and the target value includes a null value and/or a repeat value; the processor 2001 is further configured to:
determining detection rule parameters of objects to be detected in the questions to be detected, wherein the detection rule parameters comprise a length threshold of the objects to be detected and/or a quantity threshold of the objects to be detected;
Traversing the stems, answers and options in the questions to be detected;
If the length of at least one object to be detected among the question stem, the answer and the options in the question to be detected is less than or equal to the length threshold, determining that the first detection data is that a null value exists in the object to be detected; and/or
And if the number of the at least one object to be detected is greater than or equal to the number threshold, determining that the first detection data is that the repeated value exists in the object to be detected.
In one possible implementation, the object to be detected includes a formula, and the target value is a missing value; the processor 2001 is further configured to:
Detecting target symbols of formulas in the questions to be detected, and if the target symbols are not matched, determining that the first detection data are missing values in the objects to be detected; or alternatively
Rendering the formula in the subject to be detected, and if the formula fails to be rendered, determining that the first detection data is a missing value in the object to be detected.
In one possible implementation, the object to be detected includes a character, and the target value is a garbled code value; the processor 2001 is further configured to:
determining the coding range of the characters in the question to be detected, and if the coding range of the characters belongs to a specified coding range, determining that the first detection data is that a garbled code value exists in the object to be detected.
In one possible implementation, the processor 2001 is further configured to:
determining the feature codes of the topic type structure of the topic to be detected;
if the feature code of the question type structure of the question to be detected is different from the feature code of the preset question type structure, determining that the first detection data is that the question type structure of the question to be detected does not belong to the preset question type structure.
In one possible implementation, the processor 2001 is further configured to:
If the image carried in the question to be detected, as obtained through the transceiver 2003, is empty, determining that the watermark confidence of the image carried in the question to be detected is 0;
If the image carried in the question to be detected, as obtained through the transceiver 2003, is not empty, determining the image type of the target image carried in the question to be detected, and determining the watermark confidence of the target image according to the image type of the target image, so as to obtain the watermark confidence of the image carried in the question to be detected, wherein the image type of the target image comprises a still picture and/or a video frame picture.
In one possible implementation, the processor 2001 is further configured to:
If the target image carried in the subject to be detected is a static image, determining a color layer of the static image;
if the color layer of the static picture comprises a white layer and a black layer, and any color layer exists between the white layer and the black layer, determining that the watermark confidence of the image carried in the question to be detected is 1;
if the color layer of the static picture comprises a white layer and a black layer, and no color layer exists between the white layer and the black layer, performing edge detection on the target image to obtain a first edge characteristic of the target image, and determining the watermark confidence of the image carried in the subject to be detected based on the first edge characteristic of the target image.
In one possible implementation, the processor 2001 is further configured to:
acquiring a second edge feature of the watermark template through the transceiver 2003, and determining a first matching degree threshold for determining watermark confidence of the image;
if the matching degree of the first edge feature and the second edge feature is greater than or equal to the first matching degree threshold, determining that the watermark confidence degree of the image carried in the subject to be detected is 1;
if the matching degree of the first edge feature and the second edge feature is smaller than the first matching degree threshold, determining that the watermark confidence degree of the image carried in the subject to be detected is 0.
In one possible implementation, the processor 2001 is further configured to:
If the target image carried in the subject to be detected is a video frame picture, determining a target detection area of the video frame picture in the video to which the target image belongs;
And determining a partial image corresponding to the target detection area from the target image, inputting the partial image into a target watermark identification model, and outputting watermark confidence of the partial image based on the target watermark identification model to obtain watermark confidence of the target image.
In one possible implementation, the processor 2001 is further configured to:
If the watermark confidence is greater than or equal to the maximum threshold value of the preset watermark threshold value range, determining that the second detection data carries the watermark in the target image;
If the watermark confidence is greater than or equal to the minimum threshold of the preset watermark threshold range and less than the maximum threshold of the preset watermark threshold range, determining a watermark keyword for detecting whether a watermark exists in the target image, and determining the second detection data based on the matching degree between the watermark keyword and text data extracted from the target image;
And if the watermark confidence is smaller than the minimum threshold value of the preset watermark threshold value range, determining that the second detection data does not carry the watermark in the target image.
In one possible implementation, the processor 2001 is further configured to:
extracting text data from the target image;
If the matching degree of the text data and the watermark key words is greater than or equal to a matching degree threshold value, determining that the second detection data carries the watermark in the target image;
And if the matching degree of the text data and the watermark key words is smaller than the matching degree threshold value, determining that the second detection data does not carry the watermark in the target image.
In one possible implementation, the processor 2001 is further configured to:
Determining an initial quality score of the title to be detected;
Determining a first quality score of the to-be-detected problem according to whether a target value exists in the to-be-detected object in the first detection data and/or whether a problem type structure of the to-be-detected problem belongs to a preset problem type structure and the initial quality score;
and determining a second quality score of the questions to be detected according to whether the target image carries the watermark in the second detection data and the first quality score, and determining the quality score of the questions to be detected based on the second quality score.
It should be understood that the computer device 2000 described in the embodiments of the present application may perform the quality detection method for question library questions described in the embodiments corresponding to fig. 3 and/or fig. 10, and may also implement the quality detection device for question library questions described in the embodiment corresponding to fig. 12, which is not repeated here. The beneficial effects, which are the same as those of the method, are likewise not described again.
It should further be noted that the present application also provides a computer-readable storage medium storing the computer program executed by the aforementioned quality detection device for question library questions. The computer program includes program instructions which, when executed by a processor, can perform the quality detection method for question library questions described in the embodiments corresponding to fig. 3 and/or fig. 10, so the description is not repeated here. The beneficial effects, which are the same as those of the method, are likewise not described again. For technical details not disclosed in the embodiments of the computer-readable storage medium according to the present application, please refer to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to be executed on one computing device or on multiple computing devices at one site.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program, which may be stored on a computer-readable storage medium and which, when executed, may include the steps of the method embodiments described above. The computer-readable storage medium may be an internal storage unit of the quality detection device for question library questions provided in any of the foregoing embodiments, or of the electronic device, for example a hard disk or a memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash card (Flash Card), or the like, with which the electronic device is equipped. The computer-readable storage medium may also include a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
The terms first, second and the like in the claims and in the description and drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments. The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The functional units in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (7)

1. A method for detecting the quality of a question bank question, the method comprising:
acquiring a question to be detected from a target question bank;
determining whether a target value exists in an object to be detected in the question to be detected and whether a question type structure of the question to be detected belongs to a preset question type structure, so as to obtain first detection data of the question to be detected, wherein the object to be detected comprises at least one of a question stem, an answer, an option, a formula and a character;
determining a watermark confidence of an image carried in the question to be detected, and determining second detection data of the question to be detected according to the watermark confidence and a preset watermark threshold range of the question to be detected;
determining a quality score of the question to be detected based on the first detection data and the second detection data, and determining a question quality of the question to be detected based on the quality score;
the determining whether the question type structure of the question to be detected belongs to a preset question type structure to obtain first detection data of the question to be detected includes:
determining a feature code of the question type structure of the question to be detected;
if the feature code of the question type structure of the question to be detected differs from the feature code of the preset question type structure, determining that the first detection data is that the question type structure of the question to be detected does not belong to the preset question type structure;
the determining the second detection data of the question to be detected according to the watermark confidence and the preset watermark threshold range of the question to be detected includes:
if the watermark confidence is greater than or equal to the maximum threshold of the preset watermark threshold range, determining that the second detection data is that the target image carries a watermark;
if the watermark confidence is greater than or equal to the minimum threshold of the preset watermark threshold range and less than the maximum threshold of the preset watermark threshold range, determining whether a watermark keyword used for detecting whether the target image has a watermark exists, and determining the second detection data based on the degree of matching between the watermark keyword and text data extracted from the target image;
if the watermark confidence is less than the minimum threshold of the preset watermark threshold range, determining that the second detection data is that the target image does not carry a watermark;
wherein the determining the quality score of the question to be detected based on the first detection data and the second detection data comprises:
determining an initial quality score of the question to be detected;
determining a first quality score of the question to be detected according to the initial quality score and to whether, per the first detection data, a target value exists in the object to be detected and/or the question type structure of the question to be detected belongs to the preset question type structure;
and determining a second quality score of the question to be detected according to the first quality score and to whether, per the second detection data, the target image carries a watermark, and determining the quality score of the question to be detected based on the second quality score.
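As an illustration only, the three-band watermark decision and the staged scoring recited in claim 1 might be sketched as follows. The substring-based keyword match, the fixed `penalty` deduction, and all function and parameter names are assumptions made for this sketch; the claim does not specify them.

```python
def second_detection(confidence: float, t_min: float, t_max: float,
                     keywords: list[str], image_text: str) -> bool:
    """Return True if the target image is judged to carry a watermark."""
    if confidence >= t_max:
        # At or above the maximum threshold: treated as carrying a watermark.
        return True
    if confidence >= t_min:
        # Ambiguous band: fall back to watermark keywords matched against
        # text extracted from the image (substring match is an assumption).
        return any(k in image_text for k in keywords)
    # Below the minimum threshold: treated as carrying no watermark.
    return False


def quality_score(initial: float, has_target_value: bool,
                  structure_ok: bool, has_watermark: bool,
                  penalty: float = 10.0) -> float:
    """Staged scoring: initial -> first score -> second score.

    The flat per-defect deduction is hypothetical.
    """
    first = initial - penalty * (int(has_target_value) + int(not structure_ok))
    second = first - penalty * int(has_watermark)
    return second
```

For example, `quality_score(100, True, False, True)` applies three deductions under these assumptions, one for each detected defect.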
2. The method according to claim 1, wherein the object to be detected comprises a stem, an answer and/or an option, and the target value comprises a null value and/or a repeated value; the determining whether a target value exists in the object to be detected in the question to be detected to obtain the first detection data of the question to be detected includes:
determining detection rule parameters of the object to be detected in the question to be detected, wherein the detection rule parameters comprise a length threshold of the object to be detected and/or a quantity threshold of the object to be detected;
traversing the stem, the answer and the options in the question to be detected;
if the length of at least one object to be detected among the stem, the answer and the options is less than or equal to the length threshold, determining that the first detection data is that a null value exists in the object to be detected; and/or
if the number of the at least one object to be detected is greater than or equal to the quantity threshold, determining that the first detection data is that a repeated value exists in the object to be detected.
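A minimal sketch of the null-value and repeated-value checks in claim 2 is given below. It assumes that stems, answers and options are plain strings, that the "number" of an object means how many times the same option text occurs, and that the default thresholds (0 and 2) are reasonable; none of these details is fixed by the claim.

```python
from collections import Counter


def detect_null_and_repeat(stem: str, answer: str, options: list[str],
                           length_threshold: int = 0,
                           count_threshold: int = 2) -> tuple[bool, bool]:
    """Return (has_null_value, has_repeated_value) for one question."""
    objects = [stem, answer, *options]
    # Null value: any object whose stripped length is at or below the threshold.
    has_null = any(len(o.strip()) <= length_threshold for o in objects)
    # Repeated value: any option text that occurs at least count_threshold times.
    counts = Counter(o.strip() for o in options)
    has_repeat = any(c >= count_threshold for c in counts.values())
    return has_null, has_repeat
```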
3. The method according to claim 1, wherein the object to be detected comprises a formula, and the target value is a missing value; the determining whether a target value exists in the object to be detected in the question to be detected to obtain the first detection data of the question to be detected includes:
detecting target symbols of the formula in the question to be detected, and if the target symbols are not matched, determining that the first detection data is that a missing value exists in the object to be detected; or
rendering the formula in the question to be detected, and if the formula fails to render, determining that the first detection data is that a missing value exists in the object to be detected.
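The symbol-matching branch of claim 3 could be approximated by a balanced-delimiter check, sketched below. The choice of brackets and `$` as the "target symbols" is an assumption; the rendering branch is not shown because it depends on whichever formula renderer is in use.

```python
# Closing delimiter -> expected opening delimiter (assumed target symbols).
PAIRS = {")": "(", "]": "[", "}": "{"}


def symbols_matched(formula: str) -> bool:
    """Return True if brackets are balanced and '$' delimiters come in pairs."""
    stack = []
    for ch in formula:
        if ch in PAIRS.values():
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack and formula.count("$") % 2 == 0
```

A formula for which `symbols_matched` returns False would, under this sketch, be reported as containing a missing value.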
4. The method according to claim 1, wherein the object to be detected comprises a character, and the target value is a garbled-character value; the determining whether a target value exists in the object to be detected in the question to be detected to obtain the first detection data of the question to be detected includes:
determining the encoding range of the characters in the question to be detected, and if the encoding range of the characters falls within a designated encoding range, determining that the first detection data is that a garbled-character value exists in the object to be detected.
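The code-range check in claim 4 might look like the sketch below. The designated ranges used here (the Unicode replacement character U+FFFD and the Basic Multilingual Plane private use area U+E000 to U+F8FF) are assumptions chosen for illustration; the claim does not state which ranges are designated.

```python
# Hypothetical designated encoding ranges, as inclusive code point pairs.
GARBLED_RANGES = [(0xFFFD, 0xFFFD), (0xE000, 0xF8FF)]


def has_garbled(text: str) -> bool:
    """Return True if any character's code point falls in a designated range."""
    return any(lo <= ord(ch) <= hi
               for ch in text
               for lo, hi in GARBLED_RANGES)
```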
5. The method according to any one of claims 1-4, wherein determining the watermark confidence of the image carried in the question to be detected comprises:
acquiring the image carried in the question to be detected;
if the acquired image carried in the question to be detected is empty, determining the watermark confidence of the image carried in the question to be detected to be 0;
if the image carried in the question to be detected is not empty, determining the image type of the target image carried in the question to be detected, and determining the watermark confidence of the target image according to the image type of the target image to obtain the watermark confidence of the image carried in the question to be detected, wherein the image type of the target image comprises a static picture and/or a video frame picture.
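One possible reading of claim 5 is sketched below: an empty image set yields a confidence of 0, and otherwise each image is scored according to its type. The dictionary layout of an image, the `score_frame` callable standing in for a watermark detector, and taking the maximum over all frames are all assumptions of this sketch.

```python
from typing import Callable, Sequence


def watermark_confidence(images: Sequence[dict],
                         score_frame: Callable[[bytes], float]) -> float:
    """Return the watermark confidence for the images carried by a question."""
    if not images:
        # No image carried in the question: confidence is 0.
        return 0.0
    scores = []
    for img in images:
        # A video contributes each of its frames; a static picture contributes itself.
        frames = img["frames"] if img["type"] == "video" else [img["data"]]
        scores.extend(score_frame(f) for f in frames)
    # Aggregating by maximum is an assumption, not stated in the claim.
    return max(scores, default=0.0)
```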
6. A computer device, comprising: a processor, transceiver, memory, and network interface;
the processor is connected to the memory, the transceiver and the network interface, wherein the network interface is configured to provide data communication functions, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any of claims 1-5.
7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method of any of claims 1-5.
CN202110663603.XA 2021-06-15 2021-06-15 Method, device and storage medium for detecting quality of question library questions Active CN113822521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110663603.XA CN113822521B (en) 2021-06-15 2021-06-15 Method, device and storage medium for detecting quality of question library questions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110663603.XA CN113822521B (en) 2021-06-15 2021-06-15 Method, device and storage medium for detecting quality of question library questions

Publications (2)

Publication Number Publication Date
CN113822521A CN113822521A (en) 2021-12-21
CN113822521B true CN113822521B (en) 2024-05-24

Family

ID=78912567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110663603.XA Active CN113822521B (en) 2021-06-15 2021-06-15 Method, device and storage medium for detecting quality of question library questions

Country Status (1)

Country Link
CN (1) CN113822521B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116662305B (en) * 2023-06-06 2024-07-30 森纵艾数(北京)科技有限公司 Question bank management method, system, electronic equipment and storage medium
CN118428360B (en) * 2024-07-05 2024-09-06 卓世智星(青田)元宇宙科技有限公司 Question quality detection method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007023993A1 (en) * 2005-08-23 2007-03-01 Ricoh Company, Ltd. Data organization and access for mixed media document system
JP2007221511A (en) * 2006-02-17 2007-08-30 Nobuhiko Ido Receiver with function of obtaining reproduced sound, voice reproducing device with function of recording obtained reproduced voice, and voice signal processor for analyzing contents of recorded sound
CN109491990A (en) * 2018-09-17 2019-03-19 武汉达梦数据库有限公司 A kind of method of detection data quality and the device of detection data quality
CN109542886A (en) * 2018-11-23 2019-03-29 山东浪潮云信息技术有限公司 A kind of data quality checking method of Government data
CN111427974A (en) * 2020-06-11 2020-07-17 杭州城市大数据运营有限公司 Data quality evaluation management method and device
CN111737446A (en) * 2020-06-22 2020-10-02 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for constructing quality evaluation model
CN111798360A (en) * 2020-06-30 2020-10-20 百度在线网络技术(北京)有限公司 Watermark detection method, watermark detection device, electronic equipment and storage medium
CN111951148A (en) * 2020-07-13 2020-11-17 清华大学 PDF document watermark generation method and watermark extraction method
CN112417088A (en) * 2019-08-19 2021-02-26 武汉渔见晚科技有限责任公司 Evaluation method and device for text value in community

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7249257B2 (en) * 2001-03-05 2007-07-24 Digimarc Corporation Digitally watermarked maps and signs and related navigational tools
US9171202B2 (en) * 2005-08-23 2015-10-27 Ricoh Co., Ltd. Data organization and access for mixed media document system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
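Re-evaluation of the quality of systematic reviews and meta-analyses of traditional Chinese medicine published in domestic Chinese-language journals; 李青; 夏芸; 牟钰洁; 王禹毅; 刘建平; 北京中医药大学学报(中医临床版); 20120530 (No. 03); pp. 28-33 *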

Also Published As

Publication number Publication date
CN113822521A (en) 2021-12-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant