CN111612658B - Evaluation method and evaluation device for legal data retrieval and electronic equipment

Info

Publication number
CN111612658B
Authority
CN
China
Prior art keywords
retrieval
data
evaluation
user
legal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010476485.7A
Other languages
Chinese (zh)
Other versions
CN111612658A (en)
Inventor
李东海
郭晓妮
张斌琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huayu Yuandian Information Services Co ltd
Original Assignee
Beijing Huayu Yuandian Information Services Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huayu Yuandian Information Services Co ltd
Priority to CN202010476485.7A
Publication of CN111612658A
Application granted
Publication of CN111612658B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/337 Profile generation, learning or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Health & Medical Sciences (AREA)
  • Technology Law (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An evaluation method and an evaluation device for legal data retrieval and an electronic device are disclosed. The evaluation method comprises the following steps: based on the information quantity, the information quality and the reliability of retrieval data of a user using legal data retrieval, performing retrieval convenience evaluation on the legal data retrieval; based on the correlation between the retrieval result in the retrieval data and the corresponding retrieval query subtopic, performing diversified retrieval evaluation; generating a retrieval demand of a corresponding user based on the user data of the retrieval data, and judging whether the retrieval result of the retrieval data matches the retrieval demand so as to evaluate the user satisfaction; and classifying the legal data in the retrieval data based on a clustering algorithm, respectively training a click rate prediction model according to the legal data of each category, and jointly predicting the click rate by using a plurality of trained click rate prediction models so as to perform click rate prediction evaluation. In this way, the legal data retrieval is objectively evaluated.

Description

Evaluation method and evaluation device for legal data retrieval and electronic equipment
Technical Field
The present application relates to legal data retrieval, and more particularly, to an evaluation method, an evaluation apparatus, and an electronic device for legal data retrieval.
Background
Legal data (such as public judicial information, official documents, and the like) is large in volume and highly time-sensitive. A convenient, effective, and real-time retrieval function is therefore one of the important factors affecting the usability of legal data retrieval (engines, applications, or services). Accordingly, there is a need for targeted evaluation of legal data retrieval to provide an objective index of whether the legal data retrieval meets users' needs.
Meanwhile, legal data retrieval tasks are complex, and users' retrieval needs are wide-ranging and difficult to grasp, so an objective evaluation scheme for legal data retrieval is needed.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. The embodiment of the application provides an evaluation method, an evaluation device and an electronic device for legal data retrieval, which are suitable for carrying out targeted evaluation on the legal data retrieval so as to provide objective indexes on whether the legal data retrieval meets the requirements of users.
According to an aspect of the present application, there is provided an evaluation method of legal data retrieval, including:
acquiring retrieval data of a plurality of users using legal data retrieval;
based on the information quantity, the information quality and the reliability of the retrieval data, carrying out retrieval convenience evaluation on the legal data retrieval;
performing diversified retrieval evaluation on the legal data retrieval based on the correlation between retrieval results in the retrieval data and the subtopics of the retrieval query corresponding to the retrieval results;
generating a retrieval requirement of a corresponding user based on the user data of the retrieval data, and judging whether the retrieval result of the retrieval data matches the retrieval requirement so as to evaluate the user satisfaction degree of the legal data retrieval; and
classifying legal data in the retrieval data based on a clustering algorithm; respectively training a click rate prediction model according to each type of legal data; and jointly predicting the click rate by using the trained multiple click rate prediction models so as to perform click rate prediction evaluation on the legal data retrieval.
In the above evaluation method for legal data retrieval, the evaluation of retrieval convenience for the legal data retrieval based on the information quantity, information quality and reliability of the retrieved data includes: acquiring a retrieval result of part of the retrieval data; and acquiring the information quantity, the information quality and the reliability of the retrieval data based on the recall ratio and the precision ratio of part of the retrieval results.
In the above evaluation method for legal data retrieval, performing diversified retrieval evaluation on the legal data retrieval based on the correlation between the retrieval result in the retrieval data and the sub-topic of the retrieval query corresponding to the retrieval result includes: obtaining at least one of the following evaluation indexes based on the correlation between the retrieval result in the retrieval data and the sub-topic of the retrieval query corresponding to the retrieval result: P-IA index, ERR-IA@k index, α-nDCG index, NRBP index, and rank correlation measure.
In the above evaluation method for legal data retrieval, generating a retrieval requirement of a corresponding user based on the user data of the retrieval data, and judging whether the retrieval result of the retrieval data matches the retrieval requirement so as to evaluate the user satisfaction of the legal data retrieval, includes: generating the retrieval requirement of the user based on the identity, region, and interests in the user data.
In the above evaluation method for legal data retrieval, the evaluation index system for evaluating the user satisfaction of the legal data retrieval comprises four primary indexes: usability, interactivity, information quality, and system quality. Usability comprises four secondary indexes: ease of browsing, ease of learning, ease of use, and interface design. Interactivity comprises two secondary indexes: human-computer interaction and user interaction. Information quality comprises four secondary indexes: accuracy, comprehensiveness, authority, and completeness. System quality comprises four secondary indexes: high concurrent access, stability, security, and responsiveness.
In the above evaluation method for legal data retrieval, training a click rate prediction model for each category of legal data includes: mining the nonlinear relationships between features using a gradient boosting decision tree model.
In the above evaluation method for legal data retrieval, acquiring retrieval data of a plurality of users using the legal data retrieval includes: identifying an abnormal user among the plurality of users; and deleting the retrieval data of the abnormal user.
In the evaluation method for legal data retrieval, identifying an abnormal user among the plurality of users includes: recognizing abnormal users in the plurality of users by using the trained abnormal user recognition model, wherein the training process of the abnormal user recognition model comprises the following steps: preprocessing the acquired flow data of the user in the legal data retrieval to obtain a partially labeled training sample; processing the partially labeled training samples using a majority-class distribution sample processing method to generate a plurality of training sample subsets; generating a plurality of member classifiers using a mixture perturbation technique; training the member classifier respectively by using a training sample subset; and selecting at least one part of trained member classifiers and constructing an integrated classifier, wherein the integrated classifier is an abnormal user identification model.
According to another aspect of the present application, there is also provided an evaluation device for legal data retrieval, including:
a retrieval data acquisition unit for acquiring retrieval data of a plurality of users using legal data retrieval;
the first evaluation unit is used for evaluating the retrieval convenience of the legal data retrieval based on the information quantity, the information quality and the reliability of the retrieved data;
the second evaluation unit is used for performing diversified retrieval evaluation on the legal data retrieval based on the correlation between the retrieval result in the retrieval data and the subtopic of the retrieval query corresponding to the retrieval result;
the third evaluation unit is used for generating a retrieval demand of a corresponding user based on the user data of the retrieval data, and judging whether the retrieval result of the retrieval data matches the retrieval demand so as to evaluate the user satisfaction degree of the legal data retrieval; and
the fourth evaluation unit is used for classifying legal data in the retrieval data based on a clustering algorithm; respectively training a click rate prediction model according to each type of legal data; and jointly predicting the click rate by using the trained multiple click rate prediction models so as to perform click rate prediction evaluation on the legal data retrieval.
In the above evaluation device for legal data retrieval, the retrieval data acquisition unit is further configured to: identify an abnormal user among the plurality of users; and delete the retrieval data of the abnormal user.
According to yet another aspect of the present application, there is provided an electronic device including: a processor; and a memory in which are stored computer program instructions which, when executed by the processor, cause the processor to perform the evaluation method of legal data retrieval as described above.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the evaluation method of legal data retrieval as described above.
The evaluation method, the evaluation device and the electronic equipment for legal data retrieval according to the embodiment of the application are suitable for carrying out targeted evaluation on the legal data retrieval so as to provide objective indexes on whether the legal data retrieval meets the requirements of users.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 illustrates a flow chart of an evaluation method of legal data retrieval according to an embodiment of the application.
Fig. 2 is a schematic diagram illustrating an evaluation index system of user satisfaction according to an embodiment of the present application.
FIG. 3 illustrates a schematic diagram of a gradient boosting decision tree model according to an embodiment of the present application.
FIG. 4 illustrates a flow diagram of abnormal user identification according to an embodiment of the application.
FIG. 5 illustrates a flow diagram of a training process for an abnormal user recognition model according to an embodiment of the present application.
Fig. 6 illustrates a block diagram of an evaluation device for legal data retrieval according to an embodiment of the present application.
FIG. 7 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Exemplary method
Fig. 1 illustrates a flow chart of an evaluation method of legal data retrieval according to an embodiment of the application. As shown in Fig. 1, the evaluation method for legal data retrieval according to the embodiment of the present application includes:
S110, acquiring retrieval data of a plurality of users using legal data retrieval;
S120, based on the information quantity, the information quality and the reliability of the retrieval data, carrying out retrieval convenience evaluation on the legal data retrieval;
S130, based on the correlation between the retrieval result in the retrieval data and the sub-topic of the retrieval query corresponding to the retrieval result, carrying out diversified retrieval evaluation on the legal data retrieval;
S140, generating a retrieval requirement of a corresponding user based on the user data of the retrieval data, and judging whether the retrieval result of the retrieval data matches the retrieval requirement so as to evaluate the user satisfaction degree of the legal data retrieval; and
S150, classifying legal data in the retrieval data based on a clustering algorithm; respectively training a click rate prediction model according to each type of legal data; and jointly predicting the click rate by using the trained multiple click rate prediction models so as to perform click rate prediction evaluation on the legal data retrieval.
That is, the evaluation method according to the embodiment of the present application analyzes the retrieval data generated when users use the legal data retrieval, and performs retrieval convenience evaluation, diversified retrieval evaluation, user satisfaction evaluation, and click rate prediction evaluation on the legal data retrieval. The evaluation indexes used in these four evaluations are objective indexes (obtained by calculation rather than defined subjectively), and in this way provide an objective reference for whether the legal data retrieval satisfies user needs.
More specifically, in step S110, retrieval data of a plurality of users using the legal data retrieval is acquired. Here, the retrieval data of a user includes data on the user's retrieval behavior, the retrieval results, user data, and the like. Moreover, the legal data retrieval used by the user may be any entity having a legal data retrieval function, such as a legal data retrieval engine, a packaged legal data retrieval service platform, or an application having a legal data retrieval function, which is not limited in this application.
In step S120, the retrieval convenience evaluation is performed on the legal data retrieval based on the information quantity, information quality, and reliability of the retrieved data. In particular, in the embodiment of the present application, the evaluation method evaluates the retrieval convenience of the legal data retrieval based on statistical principles and statistical methods.
Specifically, evaluating the retrieval convenience of the legal data retrieval by statistical analysis includes: (1) determining the evaluation target, namely retrieval convenience; (2) determining the evaluation items related to the evaluation target, that is, determining the analysis and evaluation items and the statistical items to be collected according to the needs of the retrieval convenience evaluation, and drawing up a survey form; (3) collecting retrieval data of different users using the legal data retrieval, where the distribution of user types is preferably as wide and uniform as possible, for example by selecting a preset number of parties to a case, non-parties, lawyers, and judges as the sample set; (4) selecting the key data in the retrieval data, namely the retrieval queries and the retrieval results matched to them, and performing statistical analysis on the retrieval results; (5) calculating the statistical results and, from them, the recall ratio and the precision ratio; and (6) summarizing the analysis to reach a conclusion, that is, analyzing the statistical results to assess the information quantity, information quality, and reliability of the legal data retrieval and thereby evaluate its retrieval convenience.
It should be noted that performing statistical analysis on all the retrieval data is impractical. In the embodiment of the present application, a sampling survey is therefore used: a subset of the retrieval queries is selected, and statistical analysis is performed on the retrieval results corresponding to that subset. That is, in the embodiment of the present application, the evaluation of the retrieval convenience for the legal data retrieval based on the information quantity, the information quality, and the reliability of the retrieval data includes: acquiring the retrieval results of part of the retrieval data; and acquiring the information quantity, the information quality and the reliability of the retrieval data based on the recall ratio and the precision ratio of that part of the retrieval results.
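As a minimal sketch of the recall and precision calculation on a sampled subset of queries, assuming each sampled query has a manually judged set of relevant documents (the document IDs and the helper names below are hypothetical and not part of the original disclosure):

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one sampled query.

    retrieved: list of document IDs returned by the legal data retrieval.
    relevant:  set of document IDs judged relevant to the query (assumed labels).
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    return recall, precision


def macro_average(samples):
    """Average recall and precision over sampled (retrieved, relevant) pairs."""
    pairs = [recall_precision(ret, rel) for ret, rel in samples]
    n = len(pairs)
    return sum(r for r, _ in pairs) / n, sum(p for _, p in pairs) / n
```

Averaging per query (a macro average) is one possible design choice; pooling all sampled results before dividing would weight queries with many results more heavily.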
In step S130, based on the correlation between the retrieval result in the retrieval data and the sub-topic of the retrieval query corresponding to the retrieval result, the legal data retrieval is subjected to diversified retrieval evaluation. It should be understood that in legal data retrieval, the retrieval requirements of users are often difficult to grasp, so that legal data retrieval not only needs to ensure that the returned retrieval results are highly relevant, but also needs to ensure that the retrieval requirements of different types of users can be met. In fact, most of the existing legal data retrieval platforms cannot accurately reflect the query intention of the user even if technical means such as fuzzy query and multi-aspect query are adopted. Accordingly, to better address these queries, some legal data retrieval services employ a strategy of result diversification: the likelihood of a user finding information tailored to his or her needs is increased by providing search results (e.g., legal documents) that cover multiple aspects.
Implementing a result diversification strategy generally requires two steps: first, for a given search query, the search engine obtains an initial list of documents based on a relevance ranking function (e.g., a PageRank ranking function); then, a diversification strategy is applied to reorder the initial list to meet users' diversified needs. The second step is the key one: an ideal re-ranking should, as far as possible, achieve high relevance, wide coverage, and low similarity among the results. However, because the diversification problem is highly complex, most re-ranking strategies are based on greedy algorithms, that is, the locally optimal legal data item is repeatedly selected from the initial retrieval results, and the re-ranked result is produced after multiple iterations. Existing evaluation indexes, however, do not support result diversification.
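The greedy re-ranking idea can be sketched as follows. This is an illustrative assumption only: it uses a maximal-marginal-relevance style score (relevance minus similarity to already selected results), which is one common greedy strategy and not necessarily the one used by any particular legal data retrieval platform. The `relevance` scores, the `similarity` function, and the `trade_off` parameter are hypothetical.

```python
def greedy_diversify(candidates, relevance, similarity, k, trade_off=0.7):
    """Greedily re-rank candidates, picking the locally best document each round.

    candidates: document IDs from the initial relevance-ranked list.
    relevance:  dict mapping document ID -> relevance score (assumed given).
    similarity: function (doc_a, doc_b) -> similarity in [0, 1] (assumed given).
    trade_off:  weight on relevance versus redundancy (hypothetical parameter).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(doc):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return trade_off * relevance[doc] - (1 - trade_off) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with two highly relevant documents on the same topic and a less relevant document on another topic, the second pick goes to the other topic because the near-duplicate is penalized for redundancy.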
In view of the deficiency that the existing evaluation indexes do not support result diversification, the evaluation method according to the embodiment of the present application focuses on the correlation between the retrieval result in the retrieval data and the sub-topic of the retrieval query corresponding to the retrieval result, and proposes several evaluation indexes, which include but are not limited to: P-IA index, ERR-IA@k index, α-nDCG index, NRBP index, and rank correlation measure.
Specifically, the P-IA (Intent-Aware Precision) index is calculated as follows. Suppose that a given search query q contains n sub-topics (q_1, q_2, …, q_n). If the document at the j-th position of the returned retrieval results is relevant to the i-th sub-topic of the query q, then r_{i,j} = 1; otherwise, r_{i,j} = 0. The P-IA index with a truncation factor of k is defined as follows:
[Formula image BDA0002516035290000061 is not reproduced in this text.]
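Because the formula itself is not available in this text, the following sketch assumes one common form of intent-aware precision with uniform sub-topic weights, namely the mean over the n sub-topics of (1/k) times the sum of r_{i,j} for j = 1..k. The uniform weighting is an assumption; the original formula may weight sub-topics differently.

```python
def precision_ia(r, k):
    """Intent-aware precision at cutoff k, assuming uniform sub-topic weights.

    r[i][j] is 1 if the document at rank j+1 is relevant to sub-topic i+1,
    and 0 otherwise (the r_{i,j} of the text, zero-indexed here).
    """
    n = len(r)
    if n == 0 or k <= 0:
        return 0.0
    # Precision at k for each sub-topic, then the unweighted mean over sub-topics.
    per_topic = [sum(row[:k]) / k for row in r]
    return sum(per_topic) / n
```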
The ERR-IA@k (Intent-Aware Expected Reciprocal Rank) index is an evaluation index based on a cascade user model. The judgment of a document is no longer limited to binary (relevant/irrelevant); instead, multi-valued judgments can be made according to the document's specific degree of relevance.
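A minimal sketch of how such an index could be computed, assuming the commonly used definition of Expected Reciprocal Rank in which a graded judgment g is mapped to a stopping probability (2^g - 1) / 2^g_max, and the intent-aware variant is the probability-weighted sum over sub-topics. The grade scale and the sub-topic probabilities are hypothetical inputs.

```python
def err_at_k(grades, k, max_grade):
    """Expected Reciprocal Rank at cutoff k for one sub-topic.

    grades: graded relevance of the documents in rank order (0..max_grade).
    """
    err = 0.0
    p_reach = 1.0  # probability that the user reaches the current rank
    for rank, grade in enumerate(grades[:k], start=1):
        p_stop = (2 ** grade - 1) / (2 ** max_grade)
        err += p_reach * p_stop / rank
        p_reach *= 1.0 - p_stop
    return err


def err_ia_at_k(grades_by_topic, topic_probs, k, max_grade):
    """Weight the per-sub-topic ERR by the (assumed) sub-topic probabilities."""
    return sum(
        prob * err_at_k(grades, k, max_grade)
        for grades, prob in zip(grades_by_topic, topic_probs)
    )
```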
The α-nDCG (α-normalized Discounted Cumulative Gain) index evaluates the multiple intents of a query, each of which is expressed as an information nugget.
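The following sketch assumes the usual formulation of α-nDCG: each nugget a document covers contributes a gain of (1 - α)^c, where c is the number of higher-ranked documents that already covered that nugget, gains are discounted by log2(rank + 1), and the total is normalized by an ideal ordering. Since computing the exact ideal ordering is expensive, a greedy approximation is used here. The nugget assignments are hypothetical.

```python
import math


def alpha_dcg(ranking, nuggets, alpha=0.5, k=None):
    """ranking: doc IDs in rank order; nuggets: doc ID -> set of nuggets covered."""
    k = len(ranking) if k is None else k
    seen = {}  # nugget -> number of higher-ranked documents covering it
    total = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        gain = 0.0
        for nugget in nuggets.get(doc, ()):
            gain += (1 - alpha) ** seen.get(nugget, 0)
            seen[nugget] = seen.get(nugget, 0) + 1
        total += gain / math.log2(rank + 1)
    return total


def greedy_ideal(docs, nuggets, alpha=0.5, k=None):
    """Greedy approximation of the ideal ordering used for normalization."""
    k = len(docs) if k is None else k
    remaining, ideal, seen = list(docs), [], {}
    while remaining and len(ideal) < k:
        def gain(doc):
            return sum((1 - alpha) ** seen.get(n, 0) for n in nuggets.get(doc, ()))
        best = max(remaining, key=gain)
        ideal.append(best)
        remaining.remove(best)
        for n in nuggets.get(best, ()):
            seen[n] = seen.get(n, 0) + 1
    return ideal


def alpha_ndcg(ranking, nuggets, alpha=0.5, k=None):
    ideal = greedy_ideal(list(nuggets), nuggets, alpha, k)
    denom = alpha_dcg(ideal, nuggets, alpha, k)
    return alpha_dcg(ranking, nuggets, alpha, k) / denom if denom else 0.0
```

A ranking that places two documents covering the same nugget next to each other scores lower than one that interleaves a document covering a different nugget.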
The NRBP (Novelty- and Rank-Biased Precision) index is similar to the α-nDCG index, but additionally incorporates Rank-Biased Precision (RBP) into the evaluation, which is derived from a simple model of user behavior. Like the α-nDCG index, the NRBP index penalizes the utility of redundant nuggets covered by later-ranked documents.
A rank correlation measure is used to compare whether two ranked lists are similar, in order to see how much the corresponding ranking functions differ. Common correlation coefficients include the Pearson product-moment correlation coefficient, the Kendall rank correlation coefficient, and the Spearman rank correlation coefficient, all of which take values in [-1, 1]. If the two lists are ordered identically, the correlation coefficient is 1; if they are in exactly opposite order, the correlation coefficient is -1. If there is no correlation, the correlation coefficient is close to 0. The closer the correlation coefficient is to 1, the stronger the correlation between the two rankings, and the smaller the difference between the corresponding ranking functions.
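As an illustration, the Kendall and Spearman coefficients for two rankings of the same items can be computed as below. The sketch assumes there are no tied ranks and at least two items; the document IDs in the usage note are hypothetical.

```python
from itertools import combinations


def to_ranks(ordered_docs):
    """Map each document ID to its 1-based position in a ranked list."""
    return {doc: pos for pos, doc in enumerate(ordered_docs, start=1)}


def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation (tau-a) for two rank dicts without ties."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(items) * (len(items) - 1) / 2
    return (concordant - discordant) / pairs


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rank dicts without ties."""
    n = len(rank_a)
    d2 = sum((rank_a[x] - rank_b[x]) ** 2 for x in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))
```

For instance, `kendall_tau(to_ranks(["d1", "d2", "d3"]), to_ranks(["d3", "d2", "d1"]))` compares a ranking with its exact reverse.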
That is to say, in the embodiment of the present application, based on the correlation between the retrieval result in the retrieval data and the sub-topic of the retrieval query corresponding to the retrieval result, performing diversified retrieval evaluation on the legal data retrieval includes: obtaining at least one of the following evaluation indexes based on the correlation between the retrieval result in the retrieval data and the sub-topic of the retrieval query corresponding to the retrieval result: P-IA index, ERR-IA@k index, α-nDCG index, NRBP index, and rank correlation measure.
In step S140, a retrieval requirement of a corresponding user is generated based on the user data of the retrieval data, and it is determined whether a retrieval result of the retrieval data matches the retrieval requirement, so as to perform user satisfaction evaluation on the legal data retrieval.
Specifically, in the embodiment of the application, the retrieval requirement of the user is generated based on the information such as the identity, the region and the interest in the user data; and then, judging whether the retrieval result of the retrieval data matches the retrieval requirement or not based on an evaluation index system for evaluating the user satisfaction so as to evaluate the user satisfaction for the legal data retrieval.
In particular, in the embodiment of the present application, an evaluation index system for evaluating user satisfaction, comprising 4 primary indexes and 14 secondary indexes, is constructed by combining WebQual 4.0 (usability, information quality, and interaction quality) with the D&M information systems success model (information quality, system quality, and service quality), and the weight of each evaluation index is determined using the Analytic Hierarchy Process (AHP).
Fig. 2 is a schematic diagram illustrating an evaluation index system of user satisfaction according to an embodiment of the present application. As shown in Fig. 2, the evaluation index system is divided into three layers, namely a target layer, a criterion layer and a dimension layer. The target layer is the user satisfaction. The criterion layer consists of the primary indexes of the user satisfaction evaluation system, namely usability, interaction quality, information quality and system quality. The dimension layer consists of the secondary indexes of the user satisfaction evaluation system, namely 14 evaluation indexes such as ease of browsing, ease of learning, and ease of use.
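A minimal sketch of deriving index weights with AHP, assuming a reciprocal pairwise comparison matrix supplied by evaluators and using the geometric-mean (row) approximation of the principal eigenvector. The comparison values are hypothetical, and the random-index constants are the commonly cited Saaty values, included here as an assumption.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights by the geometric mean of each row."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI; values below about 0.1 are usually accepted."""
    n = len(matrix)
    weighted = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(weighted[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}  # commonly cited values (assumed)
    return ci / random_index[n]


# Hypothetical comparison of the four primary indexes:
# usability, interaction quality, information quality, system quality.
example = [
    [1.0, 2.0, 0.5, 1.0],
    [0.5, 1.0, 0.25, 0.5],
    [2.0, 4.0, 1.0, 2.0],
    [1.0, 2.0, 0.5, 1.0],
]
example_weights = ahp_weights(example)
```

The same procedure would be repeated within each primary index to weight its secondary indexes, and the two levels multiplied to obtain the overall weight of each secondary index.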
Usability mainly reflects the experience of using a legal data retrieval product or system. In combination with the features of legal data retrieval, usability is embodied as follows: (1) ease of browsing, namely, the navigation system of the legal data retrieval is clear and not confusing, and pages are easy to browse; (2) ease of learning, namely, the operation steps of the legal data retrieval function are easy for users to learn; (3) ease of use, namely, the functions are simple to operate and easy to use; (4) interface design, namely, the interface style is uniform, the functional layout is well coordinated, and the colors are pleasing.
Interaction quality directly affects user satisfaction. In this method, interaction quality is divided into two dimensions: human-computer interaction and user interaction. (1) Human-computer interaction, namely, the user can interact well with the legal data retrieval entity: whether the user's needs can be judged from a user portrait generated from the user's identity, region, and interests, and whether the user's search intention can be accurately determined after the user enters keywords so that the content the user wants is retrieved. For example, when a user searches for "divorce", the legal data retrieval entity can recommend information on cases related to divorce disputes according to the user's needs. (2) User interaction, namely, the legal data retrieval entity can provide a platform or tool for interaction between users.
The information quality reflects the content quality of the legal data retrieval service. The specific evaluation indexes are as follows: (1) accuracy, namely the judicial information on the legal data retrieval entity is accurate, clear, detailed and unambiguous, so that users are not misled by wrong information; (2) completeness, namely the published judicial information can be opened and queried on the legal data retrieval entity, which meets the requirements of users for rich and high-quality information resources; (3) timeliness, namely the judicial information is updated quickly, which meets the requirements of users for the latest information and knowledge; (4) authority, namely the sources of the judicial information resources are reliable and highly credible; (5) completeness, namely the classification of the judicial information is scientific, the system is complete, and the main nodes before, during and after trial are fully covered.
The system quality is an important factor influencing the satisfaction of the user, and specifically comprises the following evaluation indexes: (1) safety: the legal data retrieval entity keeps user information confidential and transactions are secure; (2) stability: the system is stable and can be logged in to at any time; (3) responsiveness: the system processes and responds quickly, giving a good user experience; (4) high concurrent access: the platform can support simultaneous online access by a large number of users.
In step S150, the legal data in the retrieval data are classified based on a clustering algorithm; a click rate prediction model is trained on each category of legal data; and the click rate is jointly predicted by the plurality of trained click rate prediction models, so as to perform click rate prediction evaluation on the legal data retrieval. The click rate is predicted from the information in the retrieval data, in which certain features play an important role. For example, users of different genders, ages and identities (parties to a case, persons not involved in the case, lawyers and judges) tend toward different information, and the degree of matching between the retrieval query of the user and the document keywords also influences the click rate of the legal data.
Correspondingly, in the embodiment of the present application, the legal data in the retrieval data are classified by a K-means algorithm according to differences in feature values to obtain K data subsets. Of course, other data clustering methods may also be used to classify the legal data in the retrieval data, which is not limited by the present application. Then, a click rate prediction model is trained on each data subset, and the click rate is jointly predicted by the plurality of trained click rate prediction models, so as to perform click rate prediction evaluation on the legal data retrieval. Here, the plurality of click rate prediction models may be all of the trained click rate prediction models or only a part of them.
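The cluster-then-predict flow described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the present application: the farthest-first centroid initialization, the smoothed per-cluster click rate used as a stand-in for a trained click rate prediction model, the inverse-distance weighting used for the joint prediction, and all names (`kmeans`, `train_cluster_models`, `predict_ctr`, `top_m`) are hypothetical choices made for the example.

```python
import math


def kmeans(points, k, iters=20):
    """Plain K-means with a deterministic farthest-first start (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p) for p in points]


def train_cluster_models(points, clicks, k):
    """Train one model per data subset; here a Laplace-smoothed click rate."""
    centroids, labels = kmeans(points, k)
    models = []
    for c in range(k):
        ys = [y for y, lab in zip(clicks, labels) if lab == c]
        models.append((sum(ys) + 1) / (len(ys) + 2))
    return centroids, models


def predict_ctr(x, centroids, models, top_m=2):
    """Joint prediction: inverse-distance weighted average of the nearest
    top_m cluster models (top_m < k uses only a part of the models)."""
    ranked = sorted(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))
    ranked = ranked[:top_m]
    weights = [1.0 / (math.dist(x, centroids[c]) + 1e-9) for c in ranked]
    return sum(w * models[c] for w, c in zip(weights, ranked)) / sum(weights)


# Hypothetical feature vectors and click labels (1 = clicked).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
clicks = [0, 0, 0, 1, 1, 1, 1, 0]
centroids, models = train_cluster_models(points, clicks, k=2)
low = predict_ctr((0.5, 0.5), centroids, models)
high = predict_ctr((10.5, 10.5), centroids, models)
```

In this sketch, a sample close to one centroid is dominated by the model of that data subset, while a sample lying between subsets receives a blend of the nearby models.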
In particular, in the embodiment of the present application, when a click rate prediction model is trained on each category of legal data, a Gradient Boosting Decision Tree (GBDT) model is selected to mine the nonlinear relationships between features, so as to overcome the limited prediction capability of the existing logistic regression model.
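To illustrate how gradient boosting captures a nonlinear relationship, the following minimal sketch fits a sequence of one-split regression trees (stumps) to the residuals of the current prediction. It assumes a squared-error loss, for which the negative gradient equals the residual, and a single numeric feature; the data, the learning rate `lr` and the function names are hypothetical and are not taken from the present application.

```python
def fit_stump(xs, residuals):
    """Best one-split regression tree under squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_gbdt(xs, ys, rounds=20, lr=0.5):
    """Additive model: mean prediction plus shrunken stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    trees = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        # For squared loss, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + lr * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(tree(x) for tree in trees)


# Hypothetical data: click rate is not monotone in the feature value.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.1, 0.1, 0.8, 0.9, 0.2, 0.1, 0.7, 0.8]
model = fit_gbdt(xs, ys)
```

A single linear term cannot follow the rise, fall and rise of these labels, whereas the accumulated stumps approximate it piece by piece.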
In particular, GBDT is a nonlinear model built on ensemble learning. Its working principle is that, in each iteration, a new decision tree is built in the gradient direction that reduces the residual. The GBDT model can find a variety of distinguishing features and feature combinations, and the paths of the decision trees can be used directly as the input of other models, which reduces the feature processing steps. The structure of the GBDT algorithm is shown in Fig. 3, where Tree1 and Tree2 are two decision trees obtained by the GBDT model. After a sample is input, it traverses both trees and falls on one leaf node in each tree; each leaf node corresponds to a feature of one dimension, so that all features of the sample are obtained once the traversal is complete. The paths of each tree are differentiated by splitting according to the minimum mean square error. For example, where the first tree contains 3 leaf nodes and the second tree contains 2 leaf nodes, a feature vector may be obtained for an input sample that falls into leaf node 2 of the first tree and leaf node 1 of the second tree.
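The text does not spell out the resulting feature vector. Assuming the common practice of one-hot encoding the leaf reached in each tree and concatenating the per-tree blocks, the example above can be sketched as follows; the function name and the 1-based leaf numbering are assumptions made for illustration.

```python
def leaves_to_features(leaf_indices, leaves_per_tree):
    """Concatenate one one-hot block per tree; leaf indices are 1-based."""
    features = []
    for leaf, n_leaves in zip(leaf_indices, leaves_per_tree):
        if not 1 <= leaf <= n_leaves:
            raise ValueError(f"leaf {leaf} is outside 1..{n_leaves}")
        block = [0] * n_leaves
        block[leaf - 1] = 1  # mark the leaf the sample fell into
        features.extend(block)
    return features


# Tree1 has 3 leaves and Tree2 has 2 leaves; the sample falls into
# leaf 2 of Tree1 and leaf 1 of Tree2.
vector = leaves_to_features([2, 1], [3, 2])  # [0, 1, 0, 1, 0]
```

Under this assumption, the vector has one dimension per leaf node, which matches the statement that each leaf node corresponds to a feature of one dimension.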
It should be understood that, in step S150, the clustering algorithm groups legal data with similar feature values together and separates legal data with larger differences in feature values, forming different data subsets. The legal data within a subset therefore have higher similarity, while the judicial information differs more between subsets, so that the click rate prediction models trained on the different data subsets each have their own characteristics, which improves the click rate prediction. Meanwhile, the features of legal data have a highly nonlinear relationship rather than a simple linear one, so obtaining a nonlinear representation of the features is the key to improving click rate prediction. A gradient boosting decision tree is therefore used to construct the nonlinear relationships between the features, and the learners corresponding to the constructed trees are accumulated to realize the fitting.
It should be noted that, in order to improve the effectiveness of the retrieval convenience evaluation, the diversified retrieval evaluation, the user satisfaction evaluation and the click rate prediction evaluation, abnormal users also need to be identified in the retrieval data of the users collected in step S110, and the corresponding retrieval data deleted. Here, an abnormal user is a user who uses the legal data retrieval abnormally, for example a user exhibiting crawler-like behavior. For example, if a user searches on the legal data retrieval service platform with "law" as the keyword, and then clicks every search result one by one and copies the page contents, the behavior can be considered abnormal user behavior. That is, in the embodiment of the present application, acquiring the retrieval data of a plurality of users in the legal data retrieval includes: identifying an abnormal user among the plurality of users; and deleting the retrieval data of the abnormal user.
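The two steps of identifying abnormal users and deleting their retrieval data can be sketched as below. The embodiment identifies abnormal users with a trained model; the fixed rule used here (a user who clicks nearly every result of a large result list) is only a hypothetical stand-in inspired by the crawler example above, and the record fields and thresholds are assumptions.

```python
def find_abnormal_users(records, min_results=20, click_ratio=0.95):
    """Flag users who clicked almost every result of a large result list."""
    abnormal = set()
    for rec in records:
        shown, clicked = rec["results_shown"], rec["results_clicked"]
        if shown >= min_results and clicked / shown >= click_ratio:
            abnormal.add(rec["user_id"])
    return abnormal


def drop_abnormal_users(records, abnormal):
    """Delete every retrieval record that belongs to an abnormal user."""
    return [rec for rec in records if rec["user_id"] not in abnormal]


# Hypothetical retrieval records.
records = [
    {"user_id": "u1", "query": "divorce", "results_shown": 30, "results_clicked": 2},
    {"user_id": "u2", "query": "law", "results_shown": 200, "results_clicked": 200},
    {"user_id": "u2", "query": "contract", "results_shown": 10, "results_clicked": 1},
]
abnormal = find_abnormal_users(records)
cleaned = drop_abnormal_users(records, abnormal)
```

Note that all records of a flagged user are removed, including records that look normal on their own, since the evaluation is meant to exclude the user as a whole.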
In the prior art, in order to detect abnormal users quickly and accurately, a common solution is as follows: first, network data reflecting user behavior are collected from the network for training and learning, user behavior features are obtained, and a classification model is generated; then, real-time data obtained from the network are examined based on the generated classification model, thereby identifying abnormal users. The performance and learning effect of the adopted learning technique directly influence the accuracy of the user behavior detection result. Among such techniques, collaborative learning is widely applied because it strikes a good compromise between detection accuracy and the number of labeled training samples. However, collaborative learning requires the training samples to be uniform and balanced, which does not suit network user behavior data obtained from the network, since such data are clearly imbalanced and have a complex distribution.
Based on this, in the embodiment of the present application, an abnormal user identification method based on selective collaborative learning is adopted. Specifically, a selective ensemble learning technique is introduced into the collaborative learning process to provide a selective collaborative learning method for generating the abnormal user identification model, which improves the training effect even when imbalanced training sample data are used.
Fig. 5 illustrates a flow diagram of the training process of the abnormal user identification model according to an embodiment of the present application. As shown in Fig. 5, the training process of the abnormal user identification model includes:
S210, preprocessing the acquired flow data of users in the legal data retrieval to obtain partially labeled training samples, wherein the preprocessing includes counting and measuring the flow data according to detection feature indexes to construct network user behavior data, and labeling part of the network user behavior data by means such as software tools and manual analysis;
S220, processing the partially labeled training samples by using a majority-class distribution sample processing method to generate a plurality of training sample subsets, wherein the processing includes obtaining the sample distribution by using a clustering method based on feature subspaces;
S230, generating a plurality of member classifiers by using a mixed perturbation technique;
S240, training the member classifiers respectively by using the training sample subsets; and
S250, selecting at least a part of the trained member classifiers to construct an integrated classifier, wherein the integrated classifier is the abnormal user identification model.
Therefore, the evaluation method according to the embodiment of the present application further provides an abnormal user detection model, as shown in Fig. 4. The model includes two parts: selective collaborative learning and abnormal user detection. The selective collaborative learning part includes: training data preprocessing, namely counting, measuring and labeling the collected user behavior flow data to generate the training samples used by sample processing; sample processing, namely processing the training samples with an EasyEnsemble method based on the majority-class distribution to generate training sample subsets; member classifier construction, namely generating member classifiers with a mixed perturbation technique for subsequent collaborative learning and selective integration; collaborative learning, namely training the member classifiers with an improved collaborative learning method for generating the integrated classifier; and selective integration, namely screening the member classifiers based on accuracy to construct the integrated classifier for detecting abnormal behavior of network users. The abnormal user detection part includes: detection data preprocessing, namely performing statistical measurement on the flow data of the user to be detected to generate user behavior data that the integrated classifier can process; and abnormal user identification, namely classifying the behavior data of the network user with the integrated classifier and identifying whether the user is an abnormal user according to the result.
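The sample processing, member classifier construction and selective integration stages can be sketched as follows. This is a simplified illustration under several assumptions, not the model of the present application: the subsets are formed by balanced undersampling in the style of EasyEnsemble; the "mixed perturbation" is approximated by giving each member a different subset and a different feature; each member is a one-feature threshold classifier; the collaborative learning stage on unlabeled samples is omitted; and all data and names are hypothetical.

```python
import random


def balanced_subsets(samples, n_subsets, rng):
    """Each subset keeps all minority (abnormal) samples plus an equally
    sized random draw from the majority (normal) class."""
    minority = [s for s in samples if s[1] == 1]
    majority = [s for s in samples if s[1] == 0]
    return [minority + rng.sample(majority, len(minority)) for _ in range(n_subsets)]


def train_stump(subset, feature):
    """Member classifier: predict abnormal (1) when the feature exceeds a threshold."""
    def acc(t):
        return sum((x[feature] > t) == bool(y) for x, y in subset) / len(subset)
    threshold = max(sorted({x[feature] for x, _ in subset}), key=acc)
    return lambda x: int(x[feature] > threshold)


def accuracy(clf, samples):
    return sum(clf(x) == y for x, y in samples) / len(samples)


def build_ensemble(train, valid, n_members=7, keep=3, seed=0):
    rng = random.Random(seed)
    n_features = len(train[0][0])
    subsets = balanced_subsets(train, n_members, rng)
    # Data perturbation (different subset) plus feature perturbation
    # (members rotate through the features).
    members = [train_stump(sub, i % n_features) for i, sub in enumerate(subsets)]
    # Selective integration: keep the members that are most accurate on validation data.
    members.sort(key=lambda clf: accuracy(clf, valid), reverse=True)
    selected = members[:keep]
    return lambda x: int(sum(clf(x) for clf in selected) * 2 > len(selected))


# Hypothetical behavior data: (requests per minute, weekday), label 1 = abnormal.
train = [((1 + i % 10, i % 7), 0) for i in range(40)]
train += [((80 + 10 * i, i % 7), 1) for i in range(5)]
valid = [((1, 6), 0), ((2, 5), 0), ((95, 0), 1), ((110, 3), 1)]
ensemble = build_ensemble(train, valid)
```

In this toy data only the first feature is informative, so the accuracy-based screening discards the members trained on the second feature before the majority vote is taken.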
In summary, the evaluation method for legal data retrieval according to the embodiment of the present application has been illustrated. It is suitable for performing targeted evaluation of the legal data retrieval, providing objective indexes of whether the legal data retrieval meets user requirements. In particular, the legal data retrieval used by the user includes any entity having a legal data retrieval function, such as a legal data retrieval engine, a packaged legal data retrieval service platform, or an application having a legal data retrieval function, which is not limited by the present application.
Exemplary devices
Fig. 6 illustrates a block diagram of an evaluation device for legal data retrieval according to an embodiment of the present application.
As shown in Fig. 6, the evaluation apparatus 600 includes: a retrieval data acquisition unit 610 for acquiring retrieval data of a plurality of users using the legal data retrieval; a first evaluation unit 620 for performing retrieval convenience evaluation on the legal data retrieval based on the information quantity, information quality and reliability of the retrieval data; a second evaluation unit 630 for performing diversified retrieval evaluation on the legal data retrieval based on the correlation between a retrieval result in the retrieval data and a sub-topic of the retrieval query corresponding to the retrieval result; a third evaluation unit 640 for generating a retrieval requirement of a corresponding user based on user data of the retrieval data, and determining whether a retrieval result of the retrieval data matches the retrieval requirement, so as to perform user satisfaction evaluation on the legal data retrieval; and a fourth evaluation unit 650 for classifying the legal data in the retrieval data based on a clustering algorithm, training a click rate prediction model on each category of legal data, and jointly predicting the click rate by using the plurality of trained click rate prediction models, so as to perform click rate prediction evaluation on the legal data retrieval.
In an example, in the evaluation apparatus 600 for legal data retrieval described above, the retrieved data acquiring unit 610 is further configured to: identifying an anomalous user of the plurality of users; and deleting the retrieval data of the abnormal user.
In an example, in the evaluation apparatus 600 for legal data retrieval described above, the first evaluation unit 620 is further configured to: acquire retrieval results of part of the retrieval data; and obtain the information quantity, the information quality and the reliability of the retrieval data based on the recall ratio and the precision ratio of the partial retrieval results.
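The recall ratio and precision ratio referred to here can be computed as in the following minimal sketch. The document identifiers are hypothetical, and how the information quantity, information quality and reliability are then derived from the two ratios is not shown, since this part of the text does not specify it.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / retrieved;
    recall = relevant retrieved / all relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sampled query: four documents returned, three judged relevant overall.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

Here two of the four returned documents are relevant, and two of the three relevant documents were returned.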
In an example, in the evaluation apparatus 600 for legal data retrieval described above, the second evaluation unit 630 is further configured to: obtaining at least one of the following evaluation indexes based on the correlation between the retrieval result in the retrieval data and the sub-topic of the retrieval query corresponding to the retrieval result: P-IA index, ERP-IA @ K index, alpha-nDCG index, NRBP index, and rank correlation measure.
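As an illustration of one of the listed indexes, the following sketch computes alpha-nDCG in its usual form: the gain of a result for a sub-topic is discounted by a factor of (1 - alpha) for every earlier result that already covered that sub-topic, and the score is normalized by a greedily constructed ideal ordering. The sub-topic judgments, the value alpha = 0.5 and the function names are assumptions made for the example.

```python
import math


def topic_gain(doc, judgments, seen, alpha):
    """Novelty-discounted gain of one result over the sub-topics it covers."""
    return sum((1 - alpha) ** seen.get(t, 0) for t in judgments.get(doc, set()))


def alpha_dcg(ranking, judgments, alpha=0.5):
    seen, score = {}, 0.0
    for rank, doc in enumerate(ranking, start=1):
        score += topic_gain(doc, judgments, seen, alpha) / math.log2(rank + 1)
        for t in judgments.get(doc, set()):
            seen[t] = seen.get(t, 0) + 1
    return score


def ideal_ranking(judgments, alpha=0.5):
    """Greedy ordering: always take the result with the largest remaining gain."""
    remaining, seen, ordered = list(judgments), {}, []
    while remaining:
        best = max(remaining, key=lambda d: topic_gain(d, judgments, seen, alpha))
        remaining.remove(best)
        ordered.append(best)
        for t in judgments[best]:
            seen[t] = seen.get(t, 0) + 1
    return ordered


def alpha_ndcg(ranking, judgments, alpha=0.5):
    ideal = ideal_ranking(judgments, alpha)[: len(ranking)]
    best = alpha_dcg(ideal, judgments, alpha)
    return alpha_dcg(ranking, judgments, alpha) / best if best else 0.0


# Hypothetical sub-topics of the query "divorce".
judgments = {
    "case_a": {"property division"},
    "case_b": {"property division"},
    "case_c": {"child custody"},
}
diversified = ["case_a", "case_c", "case_b"]
redundant = ["case_a", "case_b", "case_c"]
```

With these judgments, the ordering that covers both sub-topics early scores higher than the ordering that repeats the same sub-topic in the first two positions.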
In an example, in the evaluation apparatus 600 for legal data retrieval described above, the third evaluation unit 640 is further configured to: generate the retrieval requirement of the user based on the identity, region and interests in the user data.
In one example, in the evaluation apparatus 600 for legal data retrieval described above, the evaluation index system for evaluating the user satisfaction of the legal data retrieval includes four primary indexes of usability, interactivity, information quality and system quality, wherein the usability includes four secondary indexes of easy browsing, easy comprehension, easy usability and interface design; the interactivity includes two secondary indexes of human-computer interaction and user interaction; the information quality includes four secondary indexes of accuracy, completeness, authority and completeness; and the system quality includes four secondary indexes of high concurrent access, stability, safety and responsiveness.
In one example, in the evaluation apparatus 600 for legal data retrieval described above, the fourth evaluation unit 650 is further configured to mine a non-linear relationship between features by using a gradient boosting decision tree model.
In an example, the evaluation apparatus 600 for legal data retrieval described above further comprises a training unit 660 for training the abnormal user identification model, wherein the training process comprises: preprocessing the acquired flow data of users in the legal data retrieval to obtain partially labeled training samples; processing the partially labeled training samples by using a majority-class distribution sample processing method to generate a plurality of training sample subsets; generating a plurality of member classifiers by using a mixed perturbation technique; training the member classifiers respectively by using the training sample subsets; and selecting at least a part of the trained member classifiers to construct an integrated classifier, wherein the integrated classifier is the abnormal user identification model.
Here, it can be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described evaluation apparatus 600 for legal data retrieval have been described in detail in the above description of the evaluation method for legal data retrieval with reference to fig. 1 to 5, and thus, a repetitive description thereof will be omitted.
As described above, the evaluation apparatus 600 for legal data retrieval according to the embodiment of the present application can be implemented in various terminal devices, such as a large-screen smart device, or a computer independent from a large-screen smart device. In one example, the evaluation apparatus 600 for legal data retrieval according to the embodiment of the present application may be integrated into a terminal device as a software module and/or a hardware module. For example, the evaluation device 600 for legal data retrieval may be a software module in the operating system of the terminal device, or may be an application program developed for the terminal device; of course, the evaluation device 600 for legal data retrieval can also be one of the hardware modules of the terminal device.
Alternatively, in another example, the evaluation device 600 for legal data retrieval and the terminal device may be separate devices, and the evaluation device 600 for legal data retrieval may be connected to the terminal device through a wired and/or wireless network and transmit the interactive information according to the agreed data format.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 7.
FIG. 7 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in fig. 7, the electronic device 10 includes one or more processors 11 and memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 11 to implement the evaluation methods for legal data retrieval and/or other desired functions of the various embodiments of the present application described above. Various contents such as legal data can also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 13 may include, for example, a keyboard, a mouse, and the like.
The output device 14 can output various information including evaluation indexes to the outside. The output devices 14 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present application are shown in fig. 7, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the evaluation method of legal data retrieval according to various embodiments of the present application described in the "exemplary methods" section of this specification, supra.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in an evaluation method for legal data retrieval according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses and systems may be connected, arranged and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (11)

1. An evaluation method for legal data retrieval, comprising:
acquiring retrieval data of a plurality of users using the legal data retrieval;
performing retrieval convenience evaluation on the legal data retrieval based on the information quantity, the information quality and the reliability of the retrieval data;
performing diversified retrieval evaluation on the legal data retrieval based on the correlation between retrieval results in the retrieval data and the subtopics of the retrieval query corresponding to the retrieval results;
generating a retrieval requirement of a corresponding user based on the user data of the retrieval data, and judging whether the retrieval result of the retrieval data matches the retrieval requirement so as to evaluate the user satisfaction degree of the legal data retrieval; and
classifying the legal data in the retrieval data based on a clustering algorithm, respectively training a click rate prediction model according to each category of legal data, and jointly predicting the click rate by using the plurality of trained click rate prediction models, so as to perform click rate prediction evaluation on the legal data retrieval; wherein respectively training a click rate prediction model according to each category of legal data comprises: mining the nonlinear relationships between features by using a gradient boosting decision tree model.
2. The evaluation method according to claim 1, wherein the evaluation of the retrieval convenience for the retrieval of the legal data based on the information quantity, the information quality and the reliability of the retrieved data comprises:
acquiring a retrieval result of part of the retrieval data; and
obtaining the information quantity, the information quality and the reliability of the retrieval data based on the recall ratio and the precision ratio of the partial retrieval results.
3. The evaluation method according to claim 1, wherein the diversified search evaluation of the legal data search based on the correlation between the search result in the search data and the subtopic of the search query corresponding to the search result comprises:
obtaining at least one of the following evaluation indexes based on the correlation between the retrieval result in the retrieval data and the sub-topic of the retrieval query corresponding to the retrieval result: P-IA index, ERP-IA @ K index, alpha-nDCG index, NRBP index, and rank correlation measure.
4. The evaluation method according to claim 1, wherein generating a search requirement of a corresponding user based on user data of the search data, and determining whether a search result of the search data matches the search requirement to perform user satisfaction evaluation on the legal data search comprises:
judging the retrieval requirement of the user based on the identity, the region and the interests in the user data.
5. The evaluation method of claim 4, wherein an evaluation index system for evaluating user satisfaction of the legal data retrieval comprises four primary indexes of usability, interactivity, information quality and system quality, wherein the usability comprises four secondary indexes of easy browsing, easy comprehension, easy usability and interface design; the interactivity comprises two secondary indexes of human-computer interaction and user interaction; the information quality comprises four secondary indexes of accuracy, completeness, authority and completeness; the system quality comprises four secondary indexes of high concurrent access, stability, safety and responsiveness.
6. The evaluation method according to claim 1, wherein the acquiring of the retrieval data of the plurality of users at the retrieval of the legal data comprises:
identifying an anomalous user of the plurality of users; and
deleting the retrieval data of the abnormal user.
7. The evaluation method of claim 6, wherein identifying abnormal users of the plurality of users comprises:
recognizing abnormal users in the plurality of users by using the trained abnormal user recognition model, wherein the training process of the abnormal user recognition model comprises the following steps:
preprocessing the acquired flow data of the user in the legal data retrieval to obtain a partially labeled training sample;
processing the partially labeled training samples using a majority-class distribution sample processing method to generate a plurality of training sample subsets;
generating a plurality of member classifiers using a mixture perturbation technique;
training the member classifier respectively by using a training sample subset; and
selecting at least a part of the trained member classifiers to construct an integrated classifier, wherein the integrated classifier is the abnormal user identification model.
8. An evaluation device for legal data retrieval, comprising:
a retrieval data acquisition unit for acquiring retrieval data of a plurality of users using the legal data retrieval;
a first evaluation unit for performing retrieval convenience evaluation on the legal data retrieval based on the information quantity, the information quality and the reliability of the retrieval data;
a second evaluation unit for performing diversified retrieval evaluation on the legal data retrieval based on the correlation between retrieval results in the retrieval data and the subtopics of the retrieval query corresponding to the retrieval results;
a third evaluation unit for generating a retrieval requirement of a corresponding user based on the user data of the retrieval data, and judging whether the retrieval result of the retrieval data matches the retrieval requirement, so as to perform user satisfaction evaluation on the legal data retrieval; and
a fourth evaluation unit for classifying the legal data in the retrieval data based on a clustering algorithm, respectively training a click rate prediction model according to each category of legal data, and jointly predicting the click rate by using the plurality of trained click rate prediction models, so as to perform click rate prediction evaluation on the legal data retrieval; wherein respectively training a click rate prediction model according to each category of legal data comprises: mining the nonlinear relationships between features by using a gradient boosting decision tree model.
9. The evaluation device according to claim 8, wherein the retrieval data acquisition unit is further configured to:
identify abnormal users among the plurality of users; and
delete the retrieval data of the abnormal users.
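The data-cleaning step of this claim can be sketched as a simple filter. The record layout (`user_id`, `query`, `clicked`) and the rule-based `toy_classifier` are hypothetical stand-ins; in practice the classifier would be whatever abnormal user identification model is in use.

```python
from collections import defaultdict

def remove_abnormal_users(records, classify):
    """Drop every retrieval record that belongs to a user flagged as abnormal.
    `classify` takes all records of one user and returns 1 for abnormal, 0 otherwise."""
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["user_id"]].append(rec)
    abnormal = {user for user, recs in by_user.items() if classify(recs) == 1}
    return [rec for rec in records if rec["user_id"] not in abnormal]

def toy_classifier(recs):
    # Stand-in rule: more than three queries and not a single click.
    return 1 if len(recs) > 3 and not any(r["clicked"] for r in recs) else 0

records = [
    {"user_id": "u1", "query": "contract breach", "clicked": True},
    {"user_id": "bot", "query": "a", "clicked": False},
    {"user_id": "bot", "query": "b", "clicked": False},
    {"user_id": "bot", "query": "c", "clicked": False},
    {"user_id": "bot", "query": "d", "clicked": False},
    {"user_id": "u2", "query": "labor dispute", "clicked": False},
]
cleaned = remove_abnormal_users(records, toy_classifier)
print([rec["user_id"] for rec in cleaned])
```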
10. An electronic device, comprising:
a processor; and
a memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the evaluation method for legal data retrieval as recited in any one of claims 1-7.
11. A computer readable storage medium having computer program instructions stored thereon which, when executed by a computing device, are operable to perform the evaluation method for legal data retrieval as recited in any one of claims 1-7.
CN202010476485.7A 2020-05-29 2020-05-29 Evaluation method and evaluation device for legal data retrieval and electronic equipment Active CN111612658B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010476485.7A CN111612658B (en) 2020-05-29 2020-05-29 Evaluation method and evaluation device for legal data retrieval and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010476485.7A CN111612658B (en) 2020-05-29 2020-05-29 Evaluation method and evaluation device for legal data retrieval and electronic equipment

Publications (2)

Publication Number Publication Date
CN111612658A (en) 2020-09-01
CN111612658B (en) 2022-03-01

Family

ID=72196993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010476485.7A Active CN111612658B (en) 2020-05-29 2020-05-29 Evaluation method and evaluation device for legal data retrieval and electronic equipment

Country Status (1)

Country Link
CN (1) CN111612658B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632226B (en) * 2020-12-29 2021-10-26 天津汇智星源信息技术有限公司 Semantic search method and device based on legal knowledge graph and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100570611C (en) * 2008-08-22 2009-12-16 清华大学 Scoring method for information retrieval documents based on opinion retrieval
US9251276B1 (en) * 2015-02-27 2016-02-02 Zoomdata, Inc. Prioritization of retrieval and/or processing of data
CN106682146B (en) * 2016-12-22 2020-11-20 四川旅投数字信息产业发展有限责任公司 Method and system for retrieving scenic spot evaluation according to keywords
CN107122467B (en) * 2017-04-26 2020-12-29 努比亚技术有限公司 Search engine retrieval result evaluation method and device and computer readable medium

Also Published As

Publication number Publication date
CN111612658A (en) 2020-09-01

Similar Documents

Publication Publication Date Title
US20210109958A1 (en) Conceptual, contextual, and semantic-based research system and method
Schwartz et al. A comparison of several approximate algorithms for finding multiple (N-best) sentence hypotheses
Hasan et al. Dominance of AI and Machine Learning Techniques in Hybrid Movie Recommendation System Applying Text-to-number Conversion and Cosine Similarity Approaches
AU2022201654A1 (en) System and engine for seeded clustering of news events
EP2160677B1 (en) System and method for measuring the quality of document sets
Hu et al. Identification of highly-cited papers using topic-model-based and bibliometric features: The consideration of keyword popularity
JP2004005668A (en) System and method which grade, estimate and sort reliability about document in huge heterogeneous document set
US20090094223A1 (en) System and method for classifying search queries
JP2004005667A (en) System and method which grade, estimate and sort reliability about document in huge heterogeneous document set
CN104933100A (en) Keyword recommendation method and device
Qiao et al. Construction-accident narrative classification using shallow and deep learning
Basmatkar et al. An overview of contextual topic modeling using bidirectional encoder representations from transformers
CN111612658B (en) Evaluation method and evaluation device for legal data retrieval and electronic equipment
Bachchhav Information retrieval: search process, techniques and strategies
CN115630144B (en) Document searching method and device and related equipment
Xue et al. Topic detection in cross-media: a semi-supervised co-clustering approach
Hasanuzzaman et al. Understanding temporal query intent
Spahiu et al. Topic profiling benchmarks in the linked open data cloud: Issues and lessons learned
CN115935953A (en) False news detection method and device, electronic equipment and storage medium
La Quatra et al. Leveraging full-text article exploration for citation analysis
Maguitman et al. Using topic ontologies and semantic similarity data to evaluate topical search
Fromm et al. Diversity aware relevance learning for argument search
Bochkaryov et al. The use of clustering algorithms ensemble with variable distance metrics in solving problems of web mining
Kang et al. Capturing researcher expertise through mesh classification
JP5720071B2 (en) Compound word concept analysis system, method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant