CN117743171A - Data labeling method for searching and evaluating use cases, storage medium and intelligent device - Google Patents

Publication number
CN117743171A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311769820.2A
Other languages
Chinese (zh)
Inventor
闻洪波
谢彦
孟凡博
郑炯辉
曾平
苏希乐
强辉
周康泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weilai Automobile Technology Anhui Co Ltd
Original Assignee
Weilai Automobile Technology Anhui Co Ltd

Abstract

The application relates to the field of data labeling, and in particular to a data labeling method for search evaluation cases, a storage medium and an intelligent device. It aims to solve the problem that labeling large numbers of search evaluation cases depends on manual work and is inefficient. To this end, the data labeling method for search evaluation cases comprises the following steps: obtaining a search evaluation case; performing initial labeling on the search evaluation case based on the selection result to obtain an initial labeling result; re-initiating a search request with the request keyword to obtain a plurality of search results; and obtaining the data labeling result of the search evaluation case based on the initial labeling result and the search results. The method is an automatic labeling scheme: because the data labeling result is derived from the initial labeling result and the results of the repeated search, a large amount of evaluation data can be labeled automatically in a short time while labor cost is greatly reduced.

Description

Data labeling method for searching and evaluating use cases, storage medium and intelligent device
Technical Field
The application relates to the field of data labeling, and in particular provides a data labeling method for search evaluation cases, a storage medium and an intelligent device.
Background
In the prior art, the quality of search results can be roughly judged from statistics on what real users select when searching. However, the user's selection process often involves a variety of uncertain factors, such as random selection, a change of mind, or accidental touches. Real user selection data therefore cannot accurately reflect the real search intention, so obtaining true search metrics requires an evaluation system and a large amount of labeled evaluation data.
When building a search evaluation system, labeling the evaluation data is unavoidable if the metrics are to be accurate and the real search effect is to be made visible. Existing labeling of search evaluation data often relies on manual work, which is costly, slow and inefficient.
Accordingly, there is a need in the art for a new data labeling scheme for search evaluation cases to address the above problems.
Disclosure of Invention
In order to overcome the defects, the application provides a data labeling method, a storage medium and intelligent equipment for searching and evaluating cases, so as to solve or at least partially solve the technical problems that the data labeling of a large number of searching and evaluating cases depends on manual labeling and has low efficiency.
In a first aspect, the present application provides a data labeling method for searching and evaluating cases, including:
obtaining a search evaluation case, wherein the search evaluation case at least comprises a request keyword and a selection result which are used when a user initiates a search request;
initializing and marking the search evaluation use cases based on the selection result to obtain an initial marking result;
reinitiating a search request by using the request keyword to acquire a plurality of search results;
and acquiring the data labeling result of the search evaluation case based on the initial labeling result and the search result.
In one technical scheme of the data labeling method for the search evaluation case, the obtaining the data labeling result for the search evaluation case based on the initial labeling result and the search result includes:
in response to the ranking of the initial labeling result in the search results not meeting a preset condition, performing labeling correction on the search evaluation case based on a text embedding model to obtain the data labeling result.
In one technical scheme of the data labeling method for the search evaluation case, performing labeling correction on the search evaluation case based on a text embedding model to obtain the data labeling result includes:
scoring the text similarity of the plurality of search results of the search evaluation case based on an NLP text embedding model and a cosine similarity algorithm, to obtain a text score for each search result;
in response to there being search results whose text score is not lower than a preset threshold, judging whether those search results fall within a preset first similarity interval;
if not, performing labeling correction based on the text score;
if so, further obtaining a relevant factor value of each such search result, and performing labeling correction based on the relevant factor value.
In one technical scheme of the data labeling method for the search evaluation use case, the further obtaining the relevant factor value of the search result, and labeling and correcting based on the relevant factor value, includes:
judging whether the relevant factor values of the search results fall within a preset second similarity interval;
if not, performing labeling correction based on the relevant factor value;
if so, taking all search results whose text score is not lower than the preset threshold as data labeling results.
In one technical scheme of the data labeling method for the search and evaluation use cases, the labeling correction based on the relevant factor values includes:
acquiring a search result with an optimal related factor value as a data labeling result;
and/or,
the labeling correction based on the text score includes:
and obtaining the search result with the highest text score as a data labeling result.
In one technical scheme of the data labeling method for the search evaluation case, performing labeling correction on the search evaluation case based on a text embedding model to obtain the data labeling result further includes:
after labeling correction, judging whether the ranking of the data labeling result in the search results meets the preset condition;
if not, marking the search evaluation case as a bad sample.
In one technical scheme of the data labeling method for the search evaluation case, the obtaining the data labeling result for the search evaluation case based on the initial labeling result and the search result further includes:
if the ranking of the initial labeling result in the search results meets the preset condition, taking the initial labeling result as the data labeling result; or,
if no search result has a text score not lower than the preset threshold, classifying the search evaluation case as unable to be labeled.
In one technical scheme of the data labeling method of the search evaluation case, the obtaining the search evaluation case includes:
acquiring search data based on a search request initiated by a user, wherein the search data at least comprises a request keyword and a user selection result;
and carrying out data cleaning and data screening on the search data to obtain a search evaluation use case.
In a second aspect, the present application provides a computer readable storage medium, where a plurality of program codes are stored, where the program codes are adapted to be loaded and executed by a processor to perform the data labeling method for a search and evaluation case according to any one of the technical solutions of the data labeling method for a search and evaluation case.
In a third aspect, the present application provides a smart device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores a computer program, and the computer program when executed by the at least one processor implements the data labeling method for the search and evaluation case according to any one of the technical schemes of the data labeling method for the search and evaluation case.
One or more of the above technical solutions of the present application have at least one or more of the following beneficial effects:
in the technical scheme of the present application, the data labeling method for search evaluation cases is an automatic labeling scheme. The data labeling result of a search evaluation case is obtained from the initial labeling result and the results of the repeated search, so that a large amount of evaluation data can be labeled automatically in a short time while labor cost is greatly reduced.
Drawings
The disclosure of the present application will become more readily understood with reference to the accompanying drawings. As will be readily appreciated by those skilled in the art: these drawings are for illustrative purposes only and are not intended to limit the scope of the present application. Moreover, like numerals in the figures are used to designate like parts, wherein:
FIG. 1 is a schematic diagram of the main steps of a method for labeling data of a search and evaluation case according to one embodiment of the present application;
FIG. 2 is a detailed step flow diagram of a method of labeling data of a search evaluation case in accordance with one embodiment of the present application;
FIG. 3 is a block diagram of the main structure of a data tagging device for searching and evaluating cases according to an embodiment of the present application;
FIG. 4 is a main block diagram of a smart device for performing the data tagging method of the search evaluation use case of the present application.
Detailed Description
Some embodiments of the present application are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present application, and are not intended to limit the scope of the present application.
In the description of the present application, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, memory, or software components, such as program code, or a combination of software and hardware. The processor may be a central processor, a microprocessor, an image processor, a digital signal processor, or any other suitable processor. The processor has data and/or signal processing functions. The processor may be implemented in software, hardware, or a combination of both. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random access memory, and the like. The term "A and/or B" means all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include A alone, B alone, or A and B. The singular forms "a", "an" and "the" include plural referents.
The application provides a data labeling method for searching and evaluating use cases.
Referring to fig. 1, fig. 1 is a schematic diagram of the main steps of a data labeling method for search evaluation cases according to an embodiment of the present application. As shown in fig. 1, the data labeling method for search evaluation cases in the embodiment of the present application mainly includes the following steps S11 to S14.
Step S11, obtaining a search evaluation case, wherein the search evaluation case at least comprises a request keyword and a selection result which are used when a user initiates a search request.
In one embodiment of the present application, the obtaining a search evaluation case includes:
acquiring search data based on a search request initiated by a user, wherein the search data at least comprises a request keyword and a user selection result;
and carrying out data cleaning and data screening on the search data to obtain a search evaluation use case.
The search data are real data collected from search requests initiated by users. The request keyword is derived from the text the user entered when initiating the search request: if the user entered a sentence, the request keyword is extracted from that sentence; if the user entered a single word, that word is the request keyword. The user selection result is the result the user selected from the real search results returned during that search, that is, the result the user clicked first.
After the search data are processed in the above steps, the search evaluation cases are obtained. These cases still need to be labeled; once labeled, they can be used for model training and evaluation, in particular for evaluating search algorithm models.
Data cleaning and data screening are common data preprocessing steps. Data cleaning is the process of identifying, repairing or deleting erroneous, missing, inconsistent or inaccurate data in a data set, and generally includes operations such as removing duplicates, filling in missing values, correcting data formats and handling outliers. Data screening is the process of selecting from a data set the subset that meets certain criteria: by defining and applying screening criteria, the data that meet specific conditions or requirements can be extracted from the whole data set.
Data cleaning and data screening are performed on the search data to obtain search evaluation cases, a plurality of which form an evaluation case data set.
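The cleaning and screening described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `SearchRecord` type, the whitespace normalization, the deduplication rule and the `min_keyword_len` screening criterion are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SearchRecord:
    keyword: Optional[str]    # request keyword entered by the user
    selection: Optional[str]  # result the user clicked first


def clean(records: Iterable[SearchRecord]) -> List[SearchRecord]:
    """Data cleaning: normalize whitespace, drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for r in records:
        if not r.keyword or not r.selection:
            continue  # missing value: drop the record
        norm = SearchRecord(" ".join(r.keyword.split()),
                            " ".join(r.selection.split()))
        if not norm.keyword or not norm.selection:
            continue  # whitespace-only field: drop the record
        if norm in seen:
            continue  # duplicate after normalization
        seen.add(norm)
        cleaned.append(norm)
    return cleaned


def screen(records: List[SearchRecord], min_keyword_len: int = 2) -> List[SearchRecord]:
    """Data screening: keep records that satisfy a (hypothetical) criterion."""
    return [r for r in records if len(r.keyword) >= min_keyword_len]


raw = [
    SearchRecord("charging  station", "Station A"),
    SearchRecord("charging station", "Station A"),  # duplicate once normalized
    SearchRecord(None, "Station B"),                # missing keyword
    SearchRecord("x", "Station C"),                 # fails the screening rule
]
cases = screen(clean(raw))
```

In this sketch only the first record survives; a real pipeline would apply whatever cleaning and screening rules suit its own search logs.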
FIG. 2 is a detailed flow chart of steps of a method for labeling data of a search evaluation case in accordance with one embodiment of the present application.
Referring to fig. 2, in step S201 the user's search data is collected, and in step S202 search evaluation cases are obtained.
Step S12: performing initial labeling on the search evaluation case based on the selection result to obtain an initial labeling result.
Specifically, for the evaluation case data set, initial labeling is performed according to the user's selection result during the search; that is, the selection result is taken as the initial labeling result, which yields a label for each case.
Data labeling means attaching corresponding labels to the samples in a data set. In the present application, labeling a search evaluation case means marking, among the plurality of search results, the search result that matches the user's search intention, so as to represent the user's actual intention in searching.
By marking the data of the search and evaluation cases, the search and evaluation case data set after the data marking can be adopted to provide input and corresponding output for model training, model performance evaluation, algorithm verification and the like. In the fields of machine learning, deep learning and the like, data annotation is an indispensable link.
Referring to step S203 in fig. 2, initial labeling is performed.
The initial labeling can be an automatic process that does not rely on manual work.
Step S13: re-initiating a search request with the request keyword to obtain a plurality of search results.
In this embodiment, the request keyword is used to re-initiate the search request, that is, to perform search verification. The search request is re-initiated with the user's original request keyword, and the plurality of search results obtained include the initial labeling result.
Referring to step S204 in fig. 2, a search request is initiated.
Step S14: obtaining the data labeling result of the search evaluation case based on the initial labeling result and the search results.
In one embodiment of the present application, obtaining the data labeling result of the search evaluation case based on the initial labeling result and the search results includes:
in response to the ranking of the initial labeling result in the search results not meeting a preset condition, performing labeling correction on the search evaluation case based on a text embedding model to obtain the data labeling result.
If the ranking of the initial labeling result in the search results meets the preset condition, the initial labeling result is taken as the data labeling result.
In one embodiment, the preset condition is that the initial labeling result is ranked first in the search results.
Referring to step S205 in fig. 2, it is determined whether the initial labeling result is ranked first;
if it is not ranked first, labeling correction is performed on the search evaluation case based on a text embedding model, specifically by executing step S206 to score text similarity; if it is ranked first, the initial labeling result is taken as the data labeling result and step S214 is executed, meaning the data is labeled successfully.
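Initial labeling, the repeated search and the rank check (steps S12, S13 and S205) can be sketched as follows. This is only an illustration under stated assumptions: `first_round` and `fake_search` are hypothetical names, the search service is replaced by a stub returning a fixed list, and the preset condition is taken to be "ranked first" as in the embodiment above.

```python
from typing import Callable, List, Tuple


def first_round(keyword: str, selection: str,
                search: Callable[[str], List[str]]) -> Tuple[str, List[str], bool]:
    """Return (initial_label, search_results, needs_correction)."""
    initial_label = selection        # S12: the user's selection is the initial label
    results = search(keyword)        # S13: re-initiate the search with the same keyword
    ranked_first = bool(results) and results[0] == initial_label  # S205
    return initial_label, results, not ranked_first


def fake_search(keyword: str) -> List[str]:
    """Hypothetical stand-in for the real search service."""
    return ["Station A", "Station B", "Station C"]


label, results, needs_correction = first_round(
    "charging station", "Station B", fake_search)
```

Here the initial label is the second result, so `needs_correction` is true and the case would go on to text similarity scoring; a case whose selection is ranked first would be labeled successfully at once.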
Further, performing labeling correction on the search evaluation case based on a text embedding model to obtain the data labeling result includes:
scoring the text similarity of the plurality of search results of the search evaluation case based on an NLP text embedding model and a cosine similarity algorithm, to obtain a text score for each search result. Text similarity is a measure of how similar two texts are. Because the search results themselves contain text, the plurality of search results of the search evaluation case can be scored for text similarity directly, yielding a text score for each search result.
Referring to step S206 in fig. 2, text similarity scoring is performed.
Wherein the NLP text embedding model refers to a text embedding model for Natural Language Processing (NLP) tasks, which is capable of mapping text data to a vector space in order to represent semantic information of text in the vector space.
The cosine similarity algorithm measures the similarity between two vectors by computing the cosine of the angle between them. The value lies in [-1, 1]: the closer it is to 1, the more similar the two vectors are; the closer it is to -1, the more dissimilar they are; and 0 indicates that they are unrelated. In NLP it can be used to compare the similarity between texts.
If n POIs (points of interest) are returned, that is, n search results, there are n corresponding text scores.
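The scoring step can be illustrated with a small self-contained sketch. Two assumptions are made here and are not taken from the application: each search result is compared against the request keyword, and a toy character-bigram count vector (`embed`) stands in for a real NLP text embedding model. Only the cosine formula itself, cos(u, v) = (u . v) / (|u| |v|), is standard.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: character-bigram counts (stand-in for a real NLP model)."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_scores(keyword: str, results: List[str]) -> List[float]:
    """One text score per search result, i.e. n scores for n results."""
    q = embed(keyword)
    return [cosine_similarity(q, embed(r)) for r in results]


scores = text_scores("blue harbor station",
                     ["blue harbor station", "red hill depot"])
```

Because count vectors have no negative components, this toy embedding only produces values in [0, 1]; a learned embedding can produce negative cosines as well.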
In response to there being search results whose text score is not lower than the preset threshold, it is judged whether those search results fall within a preset first similarity interval.
If not, labeling correction is performed based on the text score;
if so, a relevant factor value of each such search result is further obtained, and labeling correction is performed based on the relevant factor value.
Further, in an embodiment, the labeling correction based on the text score includes: taking the search result with the highest text score as the data labeling result.
The existence of search results whose text score is not lower than the preset threshold shows that the text scores are not all too low. If all text scores are too low, further labeling correction is meaningless; if some search result has a text score not lower than the preset threshold, further labeling correction is needed.
In one embodiment, referring to step S207 in fig. 2, it is determined whether the text scores are too low; if not, step S208 is performed to determine whether the text scores are close. When the search results whose text score is not lower than the preset threshold fall within the preset first similarity interval, the text scores are considered close; otherwise, they are not considered close.
If the text scores are not close, labeling correction is performed based on the text score, and step S209 is executed to correct the label to the search result with the highest text score; if the text scores are close, the relevant factor values of the search results are further obtained, and labeling correction is performed based on the relevant factor values.
If no search result has a text score not lower than the preset threshold, the search evaluation case is classified as unable to be labeled.
Referring to fig. 2, if the text scores are judged too low in step S207, step S216 is executed to classify the case as unable to be labeled, and data labeling ends.
If a search evaluation case is classified as unable to be labeled, labeling of that case is unsuccessful. The labeling process for such a case ends directly, and the case can be kept for subsequent analysis and use by developers.
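The branching on text scores (steps S207 to S209 and S216) can be sketched as below. The application does not define the first similarity interval numerically, so this sketch assumes one possible reading: the scores count as "close" when at least one other qualifying score lies within `width` of the highest one. The function name, the `width` parameter and the returned decision strings are all hypothetical.

```python
from typing import List, Tuple


def text_score_decision(scores: List[float], threshold: float,
                        width: float) -> Tuple[str, List[int]]:
    """Return (decision, indices of the search results carried forward)."""
    # S207: keep only results whose text score is not lower than the threshold.
    candidates = [i for i, s in enumerate(scores) if s >= threshold]
    if not candidates:
        return "unlabelable", []                 # S216: all scores too low
    candidates.sort(key=lambda i: scores[i], reverse=True)
    top = scores[candidates[0]]
    # S208 (assumed reading): is any other candidate within `width` of the top?
    close = any(top - scores[i] <= width for i in candidates[1:])
    if not close:
        return "pick_top_text_score", [candidates[0]]   # S209
    return "use_relevant_factor", candidates            # continue to S210


decision, kept = text_score_decision([0.90, 0.88, 0.30],
                                     threshold=0.5, width=0.05)
```

With these illustrative numbers the two highest scores are close, so both results are passed on to the relevant factor comparison.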
In one embodiment of the present application, the further obtaining the relevant factor value of the search result, performing annotation correction based on the relevant factor value, includes:
judging whether the relevant factor values of the search results fall within a preset second similarity interval;
if not, performing labeling correction based on the relevant factor value;
if so, taking all search results whose text score is not lower than the preset threshold as data labeling results.
Further, the labeling correction based on the related factor value includes: and obtaining the search result with the optimal relevant factor value as a data labeling result.
Specifically, the type of relevant factor differs for search evaluation cases with different request keywords. In an embodiment, if the user's request keyword is an address and the returned search results also include location information, the distance between a search result and the user's current location is the relevant factor. Accordingly, the closer the distance, the better the relevant factor value, and the farther the distance, the worse the relevant factor value. In other embodiments of the present application, other relevant factors may be used, such as time, altitude, water level, etc. The relevant factor value is the specific value of the corresponding relevant factor.
For example, when the user's request keyword is a charging station, the returned search results are a plurality of nearby charging stations, and the charging station (search result) closest to the user's current location has the optimal relevant factor value.
In the embodiment in which the user's request keyword is an address, referring to step S210 in fig. 2, it is determined whether the distances (the relevant factor) are close: when the relevant factor values of the search results fall within the preset second similarity interval, the distances are considered close; otherwise, they are not considered close.
If the distances are not close, step S211 is executed, and the label is corrected to the search result with the shortest distance; if the distances are close, all search results whose text score is not lower than the preset threshold are taken as data labeling results, and step S212 is executed to obtain a plurality of labeling results.
In this embodiment, if the distances of the plurality of search results are close, no further distinction or labeling correction is considered necessary; the plurality of labeling results are obtained directly and labeling is considered successful.
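The distance-based branch (steps S210 to S212) might look like the sketch below. It assumes, purely for illustration, that the second similarity interval means "the spread of distances among the candidates is at most `width`"; the application gives no numeric definition. `relevant_factor_decision` and its parameters are hypothetical names, and `candidates` must be non-empty.

```python
from typing import Dict, List


def relevant_factor_decision(candidates: List[int],
                             distances: Dict[int, float],
                             width: float) -> List[int]:
    """Return the indices of the search results to use as labeling results.

    candidates: indices of results whose text score passed the threshold.
    distances:  distance from each result to the user's location (the
                relevant factor in the address embodiment), smaller is better.
    """
    best = min(candidates, key=lambda i: distances[i])
    spread = max(distances[i] for i in candidates) - distances[best]
    if spread <= width:
        return list(candidates)   # S212: distances close, keep all candidates
    return [best]                 # S211: distances differ, the nearest wins


labels_close = relevant_factor_decision([0, 1], {0: 120.0, 1: 150.0}, width=100.0)
labels_far = relevant_factor_decision([0, 1], {0: 120.0, 1: 900.0}, width=100.0)
```

For a relevant factor where larger values are better, the comparison would be inverted; the sketch covers only the distance case described above.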
In one embodiment of the present application, performing labeling correction on the search evaluation case based on a text embedding model to obtain the data labeling result further includes:
after labeling correction, judging whether the ranking of the data labeling result in the search results meets the preset condition;
if not, marking the search evaluation case as a bad sample.
Whether or not it is marked as a bad sample (bad case), the search evaluation case has been given a correct label and labeling is successful; search evaluation cases marked as bad samples can be used by developers for subsequent analysis, optimization and improvement.
Referring to fig. 2, in step S213 it is judged whether the data labeling result after labeling correction is ranked first. If not, step S215 is executed and the case is marked as a bad case; after step S215, step S214 is executed and labeling is successful. If so, step S214 is executed directly and labeling is successful.
At this point, execution of the funnel analysis model for automatically labeling search evaluation cases is complete. Through the multiple rounds of screening, all but the few cases that cannot be labeled automatically are labeled automatically, and the bad cases are screened out. The search evaluation cases are also classified automatically in the process.
Further, the present application provides another embodiment.
In one search scenario, the user's search keyword is "Blue Harbor battery swap station", and the first three initial search results are: Blue Sea battery swap station, Blue Harbor first-generation battery swap station, and Blue Harbor second-generation battery swap station. The user selects the third result, the Blue Harbor second-generation battery swap station.
Initial labeling therefore yields the Blue Harbor second-generation battery swap station as the initial labeling result.
After search verification is initiated, the initial labeling result is found not to be ranked first, and the text scores of the search results are obtained through the text embedding model and the cosine similarity algorithm. They are, respectively: 8.53, 9.75, 9.70.
The text scores are judged to be neither too low nor close: they are clearly differentiated, and the Blue Harbor first-generation battery swap station scores highest.
The corrected labeling result is the Blue Harbor first-generation battery swap station.
Because the corrected labeling result is not ranked first in the search results, the case is marked as a bad case, and data labeling ends.
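The example above can be traced in a short sketch. The three text scores come from the example; everything else is assumed for illustration: the results are given placeholder names, the repeated search is assumed to return them in the same order, and `THRESHOLD` and `WIDTH` are hypothetical values chosen so that the scores count as neither too low nor close, as the example states. The relevant factor branch is left out because the example does not reach it.

```python
from typing import Dict, List

results = ["first result", "second result", "third result"]
scores = [8.53, 9.75, 9.70]      # text scores from the example
initial_label = results[2]       # the user selected the third result

THRESHOLD = 6.0  # hypothetical "too low" cut-off
WIDTH = 0.01     # hypothetical width for judging scores "close"


def label_case(results: List[str], scores: List[float], initial_label: str,
               threshold: float, width: float) -> Dict[str, object]:
    """Trace one case through the rank check, scoring and bad-case check."""
    if results[0] == initial_label:
        return {"status": "labeled", "labels": [initial_label], "bad_case": False}
    candidates = [i for i, s in enumerate(scores) if s >= threshold]
    if not candidates:
        return {"status": "unlabelable", "labels": [], "bad_case": False}
    top = max(candidates, key=lambda i: scores[i])
    close = [i for i in candidates
             if i != top and scores[top] - scores[i] <= width]
    if close:
        raise NotImplementedError("relevant factor branch omitted in this sketch")
    label = results[top]
    # A corrected label that is still not ranked first marks a bad case.
    return {"status": "labeled", "labels": [label],
            "bad_case": results[0] != label}


outcome = label_case(results, scores, initial_label, THRESHOLD, WIDTH)
```

Under these assumptions the label is corrected to the second result, which is not ranked first, so the case is labeled successfully and also flagged as a bad case.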
Based on steps S11 to S14, the data labeling method for search evaluation cases is an automatic labeling scheme. The data labeling result of a search evaluation case is obtained from the initial labeling result and the results of the repeated search, so that a large amount of evaluation data can be labeled automatically in a short time while labor cost is greatly reduced.
Real online search request data is automatically collected, cleaned and screened, and thereby converted into a large number of search evaluation cases, which are classified and initially labeled based on the users' selections. Multiple rounds of keyword search requests are then initiated for all cases for a first round of screening: cases whose labeling result is stably ranked first are judged to be labeled successfully and need not take part in the next round. The search results of the remaining cases are scored with a domain text embedding model and the cosine similarity algorithm. A funnel analysis model is built from the similarity scores and other fields that affect search ranking, and multiple rounds of screening are performed again, finally achieving automatic labeling of the evaluation cases, bad case marking, classification of the evaluation data and deletion of invalid evaluation data. It is expected that at least 95% of the data volume can be labeled automatically and effectively.
It should be noted that, although the foregoing embodiments describe the steps in a specific sequential order, it should be understood by those skilled in the art that, in order to achieve the effects of the present application, different steps need not be performed in such an order, and may be performed simultaneously (in parallel) or in other orders, and these variations are within the scope of protection of the present application.
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.
Further, the present application also provides a data labeling device for search evaluation cases.
Referring to fig. 3, fig. 3 is a main structural block diagram of a data labeling device for search evaluation cases according to an embodiment of the present application. As shown in fig. 3, the data labeling device for search evaluation cases in this embodiment mainly includes a search evaluation case acquisition module 31, an initialization labeling module 32, a search request module 33, and a data labeling module 34. In some embodiments, one or more of the search evaluation case acquisition module 31, the initialization labeling module 32, the search request module 33, and the data labeling module 34 may be combined into one module. In some embodiments, the search evaluation case acquisition module 31 may be configured to acquire a search evaluation case, where the search evaluation case includes at least a request keyword and a selection result used when a user initiates a search request. The initialization labeling module 32 may be configured to perform initialization labeling on the search evaluation case based on the selection result to obtain an initial labeling result. The search request module 33 may be configured to reinitiate a search request using the request keyword to obtain a plurality of search results. The data labeling module 34 may be configured to obtain the data labeling result of the search evaluation case based on the initial labeling result and the search results.
In one embodiment, for the specific implementation of these functions, refer to the description of steps S11 to S14.
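By way of illustration only, the cooperation of modules 31 to 34 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the present application: the names `SearchEvalCase`, `acquire_cases`, `initial_label`, `rerun_search`, `label_case`, and `fake_search`, the layout of the log records, and the use of a top-3 rank check as a stand-in for the preset condition are all hypothetical and introduced only for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchEvalCase:
    keyword: str   # request keyword used when the user initiated the search
    selected: str  # result the user selected


def acquire_cases(raw_logs: List[dict]) -> List[SearchEvalCase]:
    """Module 31: build evaluation cases from hypothetical search-log records."""
    return [SearchEvalCase(r["keyword"], r["selected"]) for r in raw_logs]


def initial_label(case: SearchEvalCase) -> str:
    """Module 32: the user's selection becomes the initial labeling result."""
    return case.selected


def rerun_search(case: SearchEvalCase,
                 search_fn: Callable[[str], List[str]]) -> List[str]:
    """Module 33: re-issue the search with the same request keyword."""
    return search_fn(case.keyword)


def label_case(initial: str, results: List[str], top_k: int = 3) -> str:
    """Module 34: keep the initial label if it ranks within the first top_k results.

    The top_k rule is an assumed stand-in for the preset condition; the
    correction path for the other branch is omitted from this sketch.
    """
    if initial in results[:top_k]:
        return initial
    return "NEEDS_CORRECTION"


def fake_search(keyword: str) -> List[str]:
    # Stand-in search backend, for illustration only.
    index = {"coffee": ["Cafe A", "Cafe B", "Cafe C", "Cafe D"]}
    return index.get(keyword, [])


cases = acquire_cases([
    {"keyword": "coffee", "selected": "Cafe B"},
    {"keyword": "coffee", "selected": "Cafe D"},
])
labels = [label_case(initial_label(c), rerun_search(c, fake_search)) for c in cases]
print(labels)
```

Passing the search backend in as `search_fn` keeps module 33 independent of any particular search service, which mirrors the statement that the modules are functional units rather than fixed physical devices.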
The technical principles, the technical problems solved, and the technical effects produced by the data labeling device for search evaluation cases are similar to those of the embodiment of the data labeling method for search evaluation cases shown in fig. 1. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process and related description of the data labeling device for search evaluation cases can be found in the description of the embodiment of the data labeling method for search evaluation cases, and are not repeated herein.
It should be understood that, since the respective modules are merely set to illustrate the functional units of the apparatus of the present application, the physical devices corresponding to the modules may be the processor itself, or a part of software in the processor, a part of hardware, or a part of a combination of software and hardware. Accordingly, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the apparatus may be adaptively split or combined. Such splitting or combining of specific modules does not lead to a deviation of the technical solution from the principles of the present application, and therefore, the technical solution after splitting or combining will fall within the protection scope of the present application.
Further, the present application also provides a computer-readable storage medium. In one embodiment of a computer readable storage medium according to the present application, the computer readable storage medium may be configured to store a program for performing the data labeling method for the search evaluation use case of the above method embodiment, where the program may be loaded and executed by a processor to implement the data labeling method for the search evaluation use case.
For convenience of explanation, only the portions relevant to the embodiments of the present application are shown; for specific technical details that are not disclosed, refer to the method portions of the embodiments of the present application. The computer readable storage medium may be a storage device formed by any of various electronic devices; optionally, in embodiments of the present application, the computer readable storage medium is a non-transitory computer readable storage medium.
In another aspect of the present application, referring to fig. 4, fig. 4 is a main block diagram of an intelligent device for executing the data labeling method of the search and evaluation case of the present application.
As shown in fig. 4, the smart device 400 may include at least one processor 401; and a memory 402 communicatively coupled to the at least one processor 401; wherein the memory 402 stores a computer program 403, which computer program 403, when executed by the at least one processor 401, implements the method according to any of the embodiments described above. The intelligent device can be terminal equipment such as a smart phone, a wearable device, a tablet computer, a desktop computer, a notebook computer, a palm computer and the like, and the intelligent device can also be a cloud server. The memory 402 and the processor 401 are illustratively connected via bus communication.
In some embodiments of the present application, the smart device further comprises at least one sensor for sensing information. The sensor is communicatively coupled to any of the types of processors referred to herein.
The processor 401 may be, for example, a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 402 may be an internal storage unit of the smart device 400, for example, a hard disk or a memory of the smart device 400; the memory 402 may also be an external storage device of the Smart device 400, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the Smart device 400. Further, the memory 402 may also include both internal storage units and external storage devices of the smart device 400. The memory 402 is used to store computer programs and other programs and data needed by the smart device 400, and the memory 402 may also be used to temporarily store data that has been output or is to be output.
In some possible implementations, the smart device 400 may include a plurality of processors 401 and a plurality of memories 402. The computer program 403 for executing the data labeling method for search evaluation cases of the above method embodiment may be divided into a plurality of sub-programs, each of which may be loaded and run by a processor 401 to execute a different step of the method. Specifically, the sub-programs may be stored in different memories 402, and each processor 401 may be configured to execute the programs in one or more memories 402, so that the processors 401 each execute different steps and jointly implement the data labeling method for search evaluation cases of the above method embodiment.
The plurality of processors 401 may be processors disposed on the same device, for example, the smart device may be a high-performance device composed of a plurality of processors, and the plurality of processors 401 may be processors configured on the high-performance device. In addition, the plurality of processors 401 may be processors disposed on different devices, for example, the smart device may be a server cluster, and the plurality of processors 401 may be processors on different servers in the server cluster.
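By way of illustration only, the division of the program into sub-programs that run on different processors can be sketched in Python as follows. This is a minimal sketch under stated assumptions: worker threads from the standard `concurrent.futures` module stand in for the processors 401, and the functions `step_initial_label` and `step_rerun_search` and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def step_initial_label(case: Tuple[str, str]) -> str:
    # Sub-program A: the user's selection is the initial labeling result.
    _, selected = case
    return selected


def step_rerun_search(case: Tuple[str, str]) -> List[str]:
    # Sub-program B: stand-in for re-issuing the search request.
    keyword, _ = case
    return [f"{keyword}-result-{i}" for i in range(3)]


# Each case is a (request keyword, user selection) pair.
cases = [("coffee", "coffee-result-1"), ("fuel", "fuel-result-9")]

# Two workers run the two sub-programs concurrently over the same cases.
with ThreadPoolExecutor(max_workers=2) as pool:
    labels_future = pool.submit(lambda: [step_initial_label(c) for c in cases])
    results_future = pool.submit(lambda: [step_rerun_search(c) for c in cases])
    initial_labels = labels_future.result()
    search_results = results_future.result()

# A final step combines the outputs of both sub-programs.
hits = [label in results for label, results in zip(initial_labels, search_results)]
print(hits)
```

In a server cluster, the two sub-programs would run on separate machines and exchange their outputs over the network; the thread pool here only illustrates that the steps are independent until their results are combined.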
The smart device 400 may include, but is not limited to, a processor 401 and a memory 402. It will be appreciated by those skilled in the art that fig. 4 is merely an example of a smart device 400 and does not limit the smart device 400, which may include more or fewer components than shown, may combine certain components, or may use different components; for example, the smart device may also include input-output devices, network access devices, buses, etc.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed smart device and method may be implemented in other manners. For example, the smart device embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the methods of the above embodiments may also be implemented by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of the respective method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
The personal information of relevant users that may be involved in the embodiments of the present application is personal information that is processed in strict accordance with the requirements of laws and regulations, following the principles of legality, legitimacy, and necessity and based on the reasonable purposes of the business scenario, and that is actively provided by the user in the course of using the product/service, generated through use of the product/service, or obtained with the user's authorization.
The personal information of users processed by the applicant may differ depending on the specific product/service scenario and, according to the specific scenario in which the user uses the product/service, may involve the user's account information, device information, driving information, vehicle information, or other related information. The applicant treats users' personal information and its processing with a high degree of diligence.
The applicant attaches great importance to the security of users' personal information and has adopted reasonable and feasible security protection measures that meet industry standards to protect users' information and to prevent unauthorized access to, or disclosure, use, modification, damage, or loss of, personal information.
Thus far, the technical solution of the present application has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present application is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present application, and such modifications and substitutions will be within the scope of the present application.

Claims (10)

1. A data labeling method for search evaluation cases, characterized by comprising the following steps:
obtaining a search evaluation case, wherein the search evaluation case at least comprises a request keyword and a selection result used when a user initiates a search request;
performing initialization labeling on the search evaluation case based on the selection result to obtain an initial labeling result;
reinitiating a search request by using the request keyword to acquire a plurality of search results;
and acquiring the data labeling result of the search evaluation case based on the initial labeling result and the search results.
2. The method according to claim 1, wherein the acquiring the data labeling result of the search evaluation case based on the initial labeling result and the search results comprises:
in response to the ranking of the initial labeling result in the search results not meeting a preset condition, performing labeling correction on the search evaluation case based on a text embedding model to obtain the data labeling result.
3. The method according to claim 2, wherein the performing labeling correction on the search evaluation case based on the text embedding model to obtain the data labeling result comprises:
based on an NLP text embedding model and a cosine similarity algorithm, scoring the text similarity of the plurality of search results of the search evaluation case to obtain a text score for each search result;
in response to there being search results whose text score is not lower than a preset threshold, determining whether the search results whose text score is not lower than the preset threshold are within a preset first similarity interval;
if not, performing labeling correction based on the text score;
if so, further acquiring a relevant factor value of the search results, and performing labeling correction based on the relevant factor value.
4. The method of claim 3, wherein the further acquiring a relevant factor value of the search results and performing labeling correction based on the relevant factor value comprises:
determining whether the relevant factor value of the search results is within a preset second similarity interval;
if not, performing labeling correction based on the relevant factor value;
if so, taking all search results whose text score is not lower than the preset threshold as data labeling results.
5. The method of claim 4, wherein the labeling correction based on the relevant factor value comprises:
acquiring the search result with the optimal relevant factor value as the data labeling result;
and/or,
the labeling correction based on the text score comprises:
acquiring the search result with the highest text score as the data labeling result.
6. The method according to any one of claims 3-5, wherein the performing labeling correction on the search evaluation case based on the text embedding model to obtain the data labeling result further comprises:
after labeling correction, determining whether the ranking of the data labeling result in the search results meets the preset condition;
if not, marking the search evaluation case as a bad sample.
7. The method according to any one of claims 1-5, wherein the acquiring the data labeling result of the search evaluation case based on the initial labeling result and the search results further comprises:
if the ranking of the initial labeling result in the search results meets the preset condition, taking the initial labeling result as the data labeling result; or,
if no search result has a text score not lower than the preset threshold, classifying the search evaluation case as unlabeled.
8. The method according to any one of claims 1-5, wherein the obtaining a search evaluation case comprises:
acquiring search data based on a search request initiated by a user, wherein the search data at least comprises a request keyword and a user selection result;
and carrying out data cleaning and data screening on the search data to obtain a search evaluation use case.
9. A computer readable storage medium having stored therein a plurality of program codes, wherein the program codes are adapted to be loaded and executed by a processor to perform the data labeling method for search evaluation cases according to any one of claims 1 to 8.
10. An intelligent device, comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program which, when executed by the at least one processor, implements the data labeling method for search evaluation cases according to any one of claims 1 to 8.
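By way of illustration only, the labeling correction flow recited in claims 3 to 5 and the unlabeled branch of claim 7 can be sketched in Python as follows. This is one possible reading under stated assumptions, not the implementation of the present application: embeddings are passed in as plain lists of numbers instead of being produced by an actual NLP text embedding model; the vector `reference` against which the search results are scored is assumed; being "within a similarity interval" is interpreted as the spread between the largest and smallest value not exceeding an assumed width; the optimal relevant factor value is assumed to be the largest one; and all names and default values are hypothetical. The post-correction ranking check of claim 6 is omitted.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def correct_label(
    reference: Sequence[float],           # embedding the results are scored against (assumed)
    results: Dict[str, Sequence[float]],  # search result name -> embedding
    factors: Dict[str, float],            # search result name -> relevant factor value
    threshold: float = 0.8,               # preset threshold (assumed value)
    score_width: float = 0.05,            # width of the first similarity interval (assumed)
    factor_width: float = 0.1,            # width of the second similarity interval (assumed)
) -> List[str]:
    """Return the corrected labels; an empty list means the case stays unlabeled."""
    scores = {name: cosine(reference, vec) for name, vec in results.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    if not kept:
        return []
    kept_scores = [scores[name] for name in kept]
    if max(kept_scores) - min(kept_scores) > score_width:
        # Text scores are clearly separated: pick the highest text score.
        return [max(kept, key=lambda name: scores[name])]
    kept_factors = [factors[name] for name in kept]
    if max(kept_factors) - min(kept_factors) > factor_width:
        # Scores are close but factor values differ: pick the best factor
        # value (assumed here to mean the largest one).
        return [max(kept, key=lambda name: factors[name])]
    # Scores and factor values are both close: keep every qualifying result.
    return kept


reference = [1.0, 0.0]
results = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [0.0, 1.0]}
print(correct_label(reference, results, {"A": 0.2, "B": 0.9, "C": 0.5}))
print(correct_label(reference, results, {"A": 0.50, "B": 0.55, "C": 0.5}))
```

Returning a list in every branch lets a single return type cover the three outcomes: no label, one corrected label, or several equally acceptable labels.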
CN202311769820.2A 2023-12-20 2023-12-20 Data labeling method for searching and evaluating use cases, storage medium and intelligent device Pending CN117743171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311769820.2A CN117743171A (en) 2023-12-20 2023-12-20 Data labeling method for searching and evaluating use cases, storage medium and intelligent device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311769820.2A CN117743171A (en) 2023-12-20 2023-12-20 Data labeling method for searching and evaluating use cases, storage medium and intelligent device

Publications (1)

Publication Number Publication Date
CN117743171A true CN117743171A (en) 2024-03-22

Family

ID=90250562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311769820.2A Pending CN117743171A (en) 2023-12-20 2023-12-20 Data labeling method for searching and evaluating use cases, storage medium and intelligent device

Country Status (1)

Country Link
CN (1) CN117743171A (en)

Similar Documents

Publication Publication Date Title
CN110705405B (en) Target labeling method and device
CN105279277A (en) Knowledge data processing method and device
CN112163424A (en) Data labeling method, device, equipment and medium
CN110516259B (en) Method and device for identifying technical keywords, computer equipment and storage medium
CN112613569A (en) Image recognition method, and training method and device of image classification model
CN112328805A (en) Entity mapping method of vulnerability description information and database table based on NLP
JP2022185143A (en) Text detection method, and text recognition method and device
CN111898555B (en) Book checking identification method, device, equipment and system based on images and texts
CN113936232A (en) Screen fragmentation identification method, device, equipment and storage medium
CN113205046A (en) Method, system, device and medium for identifying question book
CN115203758B (en) Data security storage method, system and cloud platform
CN112529629A (en) Malicious user comment brushing behavior identification method and system
CN117743171A (en) Data labeling method for searching and evaluating use cases, storage medium and intelligent device
CN116629606A (en) Industrial chain early warning method, device, equipment and medium based on power data
CN110083807B (en) Contract modification influence automatic prediction method, device, medium and electronic equipment
CN109446330B (en) Network service platform emotional tendency identification method, device, equipment and storage medium
CN110909538A (en) Question and answer content identification method and device, terminal equipment and medium
CN111783786A (en) Picture identification method and system, electronic equipment and storage medium
CN112784903B (en) Method, device and equipment for training target recognition model
CN112989823B (en) Log processing method, device, equipment and storage medium
CN114610985B (en) Information extraction method and device, electronic equipment and storage medium
US11227186B2 (en) Method and device for training image recognition model and related device
CN113434760B (en) Construction method recommendation method, device, equipment and storage medium
CN114707911B (en) Cross-border electronic commerce information risk analysis method and server combined with cloud computing
CN114357005A (en) Method and device for generating scientific information, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination