CN114444509B

CN114444509B - Method, device and equipment for testing named entity recognition model and storage medium

Info

Publication number: CN114444509B
Application number: CN202210343812.0A
Authority: CN
Inventors: 周磊
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-04-02
Filing date: 2022-04-02
Publication date: 2022-07-12
Anticipated expiration: 2042-04-02
Also published as: CN114444509A

Abstract

The application provides a method, an apparatus, a device and a storage medium for testing a named entity recognition model, and relates to artificial intelligence and automated testing. The method comprises: acquiring a text sample set and a named entity recognition model to be tested; performing perturbation processing on the text sample set based on at least one perturbation mode to obtain a first number of evaluation text sets; calling the named entity recognition model to recognize the text sample set and each of the first number of evaluation text sets, to obtain a first recognition result corresponding to each evaluation text set and a second recognition result corresponding to the text sample set; determining the recognition success rate of the named entity recognition model based on the probability that the first recognition result is the same as the second recognition result; and determining the robustness of the named entity recognition model based on the recognition success rate, wherein the recognition success rate is positively correlated with the robustness. The method and apparatus can improve both the accuracy and the efficiency of model testing.

Description

Method, device and equipment for testing named entity recognition model and storage medium

Technical Field

The present application relates to artificial intelligence and automated testing technologies, and in particular to a method, an apparatus, a device and a storage medium for testing a named entity recognition model.

Background

With the rapid development of artificial intelligence technology, artificial intelligence has penetrated into many fields across many industries. Artificial intelligence algorithms, in particular deep learning models (for example, named entity recognition models), are mainly data driven, and their training process rests on assumptions about prior data. When a model is actually deployed online, however, real data is noisier than the training data, and when the noise is large the named entity recognition model may not handle it well. Before a named entity recognition model is put into use, both its detection precision and its robustness therefore need to be tested, yet the related art offers no scheme for evaluating the robustness of a named entity recognition model with high accuracy.

Disclosure of Invention

The embodiments of the application provide a method, an apparatus and a device for testing a named entity recognition model, and a computer-readable storage medium, which can improve the accuracy and efficiency of testing the named entity recognition model.

The technical scheme of the embodiment of the application is realized as follows:

An embodiment of the application provides a method for testing a named entity recognition model, comprising the following steps:

acquiring a text sample set and a named entity recognition model to be tested;

performing perturbation processing on the text sample set based on at least one perturbation mode to obtain a first number of evaluation text sets, wherein each perturbation mode corresponds to at least one evaluation text set;

calling the named entity recognition model to recognize the text sample set and each of the first number of evaluation text sets, to obtain a first recognition result corresponding to each evaluation text set and a second recognition result corresponding to the text sample set;

determining the probability that the first recognition result is the same as the second recognition result, and determining the recognition success rate of the named entity recognition model based on the probability;

determining robustness of the named entity recognition model based on the recognition success rate, wherein the recognition success rate is positively correlated with the robustness.
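The five steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the patented implementation: it hypothetically assumes the model under test is exposed as a function mapping a text to a list of (entity text, entity type) pairs, that every evaluation text set is aligned one-to-one with the text sample set, and that "the same recognition result" means exactly equal entity lists. All names (`recognition_success_rate`, `toy_model`, `GAZETTEER`) are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: the model maps a text to (entity text, entity type) pairs.
Entity = Tuple[str, str]
Recognizer = Callable[[str], List[Entity]]


def recognition_success_rate(
    model: Recognizer,
    samples: Sequence[str],
    evaluation_sets: Sequence[Sequence[str]],
) -> float:
    """Share of perturbed texts whose recognition result equals the result
    obtained on the corresponding original text sample."""
    # Second recognition result: model output on the unperturbed samples.
    baseline = [model(text) for text in samples]
    same = total = 0
    for eval_set in evaluation_sets:
        if len(eval_set) != len(samples):
            raise ValueError("evaluation set must align one-to-one with the samples")
        for expected, perturbed in zip(baseline, eval_set):
            # First recognition result: model output on the perturbed text.
            if model(perturbed) == expected:
                same += 1
            total += 1
    return same / total if total else 0.0


# Toy dictionary-based recognizer standing in for a real NER model.
GAZETTEER = {"Shenzhen": "LOC", "Beijing": "LOC"}


def toy_model(text: str) -> List[Entity]:
    return [(w, GAZETTEER[w]) for w in text.split() if w in GAZETTEER]


samples = ["drive to Shenzhen", "fly to Beijing"]
evaluation_sets = [["drive to Shenzhen now", "fly to Beijin"]]
rate = recognition_success_rate(toy_model, samples, evaluation_sets)
print(rate)  # 0.5
```

Because the success rate is positively correlated with robustness, the rate itself can serve as a robustness score; a deployment could additionally compare it against a threshold, which would be a design choice outside this sketch.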

An embodiment of the application provides an apparatus for testing a named entity recognition model, comprising:

a sample acquisition module, configured to acquire a text sample set and a named entity recognition model to be tested;

a sample perturbation module, configured to perform perturbation processing on the text sample set based on at least one perturbation mode to obtain a first number of evaluation text sets, wherein each perturbation mode corresponds to at least one evaluation text set;

a model testing module, configured to call the named entity recognition model to recognize the text sample set and each of the first number of evaluation text sets, to obtain a first recognition result corresponding to each evaluation text set and a second recognition result corresponding to the text sample set;

the model testing module is further configured to determine the probability that the first recognition result is the same as the second recognition result, and to determine a recognition success rate of the named entity recognition model based on the probability;

the model testing module is further configured to determine the robustness of the named entity recognition model based on the recognition success rate, wherein the recognition success rate is positively correlated with the robustness.

An embodiment of the present application provides an electronic device, which includes:

a memory for storing executable instructions;

a processor, configured to implement the method for testing a named entity recognition model provided in the embodiments of the application when executing the executable instructions stored in the memory.

An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the method for testing a named entity recognition model provided in the embodiments of the present application.

An embodiment of the present application provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the method for testing a named entity recognition model provided in the embodiments of the present application.

The embodiment of the application has the following beneficial effects:

A text sample set is perturbed based on at least one perturbation mode to obtain at least one evaluation text set consisting of perturbed texts; the named entity recognition model is called to recognize the text sample set and each evaluation text set; the probability that the model produces the same output for the text sample set and for an evaluation text set is determined; and the robustness of the model is determined based on this probability. Because the evaluation text sets are generated by perturbation rather than collected and labeled separately, the labeling workload is reduced and computing resources are saved, which improves both the efficiency and the accuracy of testing.

Drawings

FIG. 1 is a schematic diagram of an application mode of a method for testing a named entity recognition model according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of an electronic device 400 according to an embodiment of the present application;

FIG. 3 is a first flowchart of a method for testing a named entity recognition model according to an embodiment of the present application;

FIG. 4 is a second flowchart of a method for testing a named entity recognition model according to an embodiment of the present application;

FIG. 5 is a third flowchart of a method for testing a named entity recognition model according to an embodiment of the present application;

FIG. 6 is a fourth flowchart of a method for testing a named entity recognition model according to an embodiment of the present application;

FIG. 7A is a fifth flowchart of a method for testing a named entity recognition model according to an embodiment of the present application;

FIG. 7B is a sixth flowchart of a method for testing a named entity recognition model according to an embodiment of the present application;

FIG. 7C is a seventh flowchart of a method for testing a named entity recognition model according to an embodiment of the present application;

FIG. 8 is an eighth flowchart of a method for testing a named entity recognition model according to an embodiment of the present application;

FIG. 9 is a ninth flowchart of a method for testing a named entity recognition model according to an embodiment of the present application.

Detailed Description

To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

In the following description, the terms "first/second/third" are only used to distinguish similar objects and do not denote a particular order. Where permitted, the specific order or sequence denoted by "first/second/third" may be interchanged, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.

It should be noted that when the embodiments of the present application are applied to a specific product or technology, user permission or consent must be obtained for any user-related information, user feedback data and the like, and the collection, use and processing of such data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.

Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments are explained as follows.

1) Text perturbation processing: adding noise to a text such that the perturbed text still fits a natural-language scenario. Perturbation processing includes adding, deleting and modifying characters in a piece of text. For example, deleting the character "一" ("one", part of the measure phrase for "a") from the original Chinese sentence meaning "she is a young person" yields a perturbed sentence whose English meaning is still "she is a young person".

2) Robustness of a model: whether the model can maintain the accuracy of its judgments when the input data changes slightly, that is, whether its performance remains stable under changes within a certain range. As a property of artificial intelligence capabilities, robustness refers to the degree to which a model's recognition performance on input data is unaffected when the input data is perturbed or changed.

3) Named Entity Recognition (NER), also called "proper name recognition", refers to recognizing entities with specific meanings in text, mainly including person names, place names, organization names and proper nouns.

4) White-box model: a model whose internal parameters and structure are known. A white-box model differs from a black-box model, whose parameters and structure are unknown, so that the black-box model's behavior can only be inferred from its outputs.

5) Adversarial sample: a sample obtained by adding a perturbation imperceptible to the human eye to an original sample; such a perturbation does not affect human recognition but easily fools the model, causing it to make wrong judgments. In the embodiments of the application, a text sample is perturbed to obtain a perturbed text, which serves as an adversarial sample; inputting the perturbed text into the model under test for recognition can be regarded as an attack on the model.

The embodiments of the present application provide a method for testing a named entity recognition model, an apparatus for testing a named entity recognition model, an electronic device, a computer-readable storage medium and a computer program product, which can improve the accuracy and efficiency of performance testing of a named entity recognition model, so that the model's performance can be further adjusted and improved based on the test results. The embodiments can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation and driving assistance. They may be implemented by a server alone or by a terminal device and a server in cooperation; an exemplary application in which the electronic device is implemented as a server is described below.

Referring to FIG. 1, FIG. 1 is a schematic diagram of an application mode of testing a named entity recognition model according to an embodiment of the present application. By way of example, the system includes the recognition server 202, the sample perturbation server 201, the network 300 and the terminal device 401. The recognition server 202 and the sample perturbation server 201 communicate through the network 300 or by other means, and the terminal device 401 is connected to the recognition server 202 through the network 300, which may be a wide area network, a local area network, or a combination of the two.

Illustratively, the named entity recognition model to be tested is stored in the recognition server 202, and the user is a technician testing the model. The user sends test configuration information for the named entity recognition model to the sample perturbation server 201 through the network 300. The sample perturbation server 201 obtains a text sample set based on the test configuration information, performs perturbation processing on it to obtain a plurality of evaluation text sets, and sends the text sample set and the evaluation text sets to the recognition server 202 for recognition. The named entity recognition model in the recognition server 202 returns its recognition results to the sample perturbation server 201, which determines the robustness of the model from these results and sends the test result to the user's terminal device 401. The technician can then refine the named entity recognition model based on the test result.

In some embodiments, the recognition server 202 may be a server of an Internet of Vehicles platform, and the named entity recognition model may be used in an artificial intelligence voice recognition service of a vehicle-mounted terminal. For example, the user speaks a place name to the vehicle-mounted terminal, which sends it to the recognition server 202; the recognition server 202 recognizes the place name with the named entity recognition model, obtains route information corresponding to the place name, and presents the route information to the user through the vehicle-mounted terminal, thereby providing a navigation service. Alternatively, the vehicle-mounted terminal may perform the recognition independently using the named entity recognition model.

In some embodiments, the recognition server 202 and the sample perturbation server 201 may also be implemented as a single server.

The embodiments of the application may be implemented with blockchain technology: the detection result obtained by the method for testing a named entity recognition model can be uploaded to a blockchain for storage, and its reliability ensured through a consensus algorithm. A blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. Essentially a decentralized database, a blockchain is a chain of data blocks linked by cryptographic methods; each block contains information about a batch of network transactions, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer and an application service layer.

The embodiments of the application may be implemented with database technology. In short, a database can be regarded as an electronic filing cabinet, that is, a place for storing electronic files, in which a user can add, query, update and delete data. A database is a collection of data that is stored together in a way that can be shared by multiple users, has as little redundancy as possible, and is independent of applications.

A Database Management System (DBMS) is a computer software system designed for managing databases, and generally provides basic functions such as storage, retrieval, security assurance and backup. Database management systems may be classified by the database model they support, such as relational or XML (Extensible Markup Language); by the type of computer supported, such as server clusters or mobile phones; by the query language used, such as Structured Query Language (SQL) or XQuery; by performance emphasis, such as maximum scale or maximum operating speed; or by other schemes. Whatever classification is used, some DBMSs span categories, for example by supporting multiple query languages simultaneously.

In some embodiments, the server may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal device may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited thereto.

The embodiments of the application may also be implemented with cloud technology. Cloud technology is a general term for the network, information, integration, management platform and application technologies applied under the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently, and cloud computing technology will become an important support for it. Background services of technical network systems, such as video websites, picture websites and other web portals, require large amounts of computing and storage resources. With the rapid development of the Internet industry and growing demands from search services, social networks, mobile commerce, open collaboration and the like, each item may carry its own hash-code identifier that must be transmitted to a background system for logical processing; data at different levels is processed separately, and data from all industries requires strong back-end system support, which can only be realized through cloud computing.

Referring to FIG. 2, FIG. 2 is a schematic structural diagram of an electronic device 400 according to an embodiment of the present application, which includes at least one processor 410, a memory 450 and at least one network interface 420. The components of the electronic device 400 are coupled together by a bus system 440, which enables communication among them. In addition to a data bus, the bus system 440 includes a power bus, a control bus and a status signal bus; for clarity of illustration, however, all buses are labeled as the bus system 440 in FIG. 2.

The processor 410 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components; the general-purpose processor may be a microprocessor or any conventional processor.

The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.

The memory 450 includes volatile memory, nonvolatile memory, or both. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 450 described in the embodiments herein is intended to comprise any suitable type of memory.

In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.

An operating system 451, which includes system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer and a driver layer, and is used for implementing various basic services and processing hardware-based tasks.

A network communication module 452, for communicating with other computing devices via one or more (wired or wireless) network interfaces 420; exemplary network interfaces 420 include Bluetooth, Wireless Fidelity (Wi-Fi) and Universal Serial Bus (USB).

In some embodiments, the testing apparatus for named entity recognition model provided in the embodiments of the present application may be implemented in software, and fig. 2 illustrates the testing apparatus 455 for named entity recognition model stored in the memory 450, which may be software in the form of programs and plug-ins, and includes the following software modules: a sample acquisition module 4551, a sample perturbation module 4552 and a model test module 4553, which are logical and thus may be arbitrarily combined or further split depending on the functions implemented. The functions of the respective modules will be explained below.

The test method of the named entity recognition model provided by the embodiment of the present application will be described in conjunction with exemplary applications and implementations of the terminal provided by the embodiment of the present application.

Referring to fig. 3, fig. 3 is a first flowchart of a testing method for a named entity recognition model according to an embodiment of the present application, which will be described with reference to steps 101 to 105 shown in fig. 3.

In step 101, a set of text samples and a named entity recognition model to be tested are obtained.

For example, the content of the text samples in the text sample set may be determined according to the application scenario of the named entity recognition model to be tested. If the model is applied to the voice recognition service of a vehicle-mounted terminal and is mainly used to recognize place names, the text samples may contain place-name-related content. If the model is applied to the voice recognition service of a smart home and is mainly used to recognize person names (for example, user names) and object names (for example, household appliance names), the text samples may contain content related to person names and object names.

Illustratively, the text sample set includes a plurality of text samples, each of which is pre-labeled. The labels include: the text type, the entity name of each entity word in the text sample, the number of characters in the text sample, the start position of each entity word in the text sample, the type of each entity word (for example, place name, person name or date), and the like.
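One possible in-memory representation of such a pre-labeled text sample is sketched below. The class and field names are hypothetical, character offsets are assumed to be zero-based with an exclusive end, and the consistency check is an illustrative extra rather than something the description requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityLabel:
    text: str         # entity name, e.g. "Shenzhen Bay"
    start: int        # zero-based character offset where the entity word starts
    entity_type: str  # e.g. "place", "person", "date"

    @property
    def end(self) -> int:
        # Exclusive end offset derived from the start position and entity length.
        return self.start + len(self.text)


@dataclass
class TextSample:
    text: str
    text_type: str
    entities: List[EntityLabel] = field(default_factory=list)

    @property
    def num_chars(self) -> int:
        return len(self.text)

    def validate(self) -> bool:
        """Check that every labeled span really matches the sample text."""
        return all(self.text[e.start:e.end] == e.text for e in self.entities)


sample = TextSample(
    text="Navigate to Shenzhen Bay",
    text_type="navigation",
    entities=[EntityLabel(text="Shenzhen Bay", start=12, entity_type="place")],
)
print(sample.num_chars, sample.validate())  # 24 True
```

Keeping the start position next to the entity text makes it cheap to detect labels that drift out of sync with the text, which matters once perturbations start editing characters.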

In step 102, perturbation processing is performed on the text sample set based on at least one perturbation mode to obtain a first number of evaluation text sets.

Here, each perturbation mode corresponds to at least one evaluation text set.

For example, the types of perturbation modes include an entity perturbation mode and a random perturbation mode. In the random perturbation mode, words are randomly selected from the text sample for perturbation; in the entity perturbation mode, entity words and non-entity words in the text sample are perturbed differently according to their type. Different perturbation modes also differ in their overall processing procedure. For example, perturbation mode 1 applies the following operations to the text in sequence: deleting characters, replacing synonyms and adding characters; perturbation mode 2 applies, in sequence: adding a mask to the text and exchanging the positions of adjacent words. Perturbation mode 1 is therefore different from perturbation mode 2.
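A perturbation mode of this kind can be modeled as an ordered list of operations applied one after another. The sketch below encodes the two example modes on whitespace-tokenized English text; the operation implementations, the mode names and the tiny `SYNONYMS` table are hypothetical simplifications, and a seeded `random.Random` is used only to make runs reproducible.

```python
import random
from typing import Callable, Dict, List

# An operation rewrites a token list; the rng makes runs reproducible.
Operation = Callable[[List[str], random.Random], List[str]]

SYNONYMS = {"good": "fine", "today": "this day"}  # hypothetical toy lexicon


def delete_char(tokens: List[str], rng: random.Random) -> List[str]:
    # Remove one character from a random token that has more than one character.
    candidates = [i for i, t in enumerate(tokens) if len(t) > 1]
    if not candidates:
        return tokens
    i = rng.choice(candidates)
    j = rng.randrange(len(tokens[i]))
    return tokens[:i] + [tokens[i][:j] + tokens[i][j + 1:]] + tokens[i + 1:]


def replace_synonyms(tokens: List[str], rng: random.Random) -> List[str]:
    return [SYNONYMS.get(t, t) for t in tokens]


def add_char(tokens: List[str], rng: random.Random) -> List[str]:
    # Duplicate one character inside a random token.
    if not tokens:
        return tokens
    i = rng.randrange(len(tokens))
    j = rng.randrange(len(tokens[i]))
    return tokens[:i] + [tokens[i][: j + 1] + tokens[i][j:]] + tokens[i + 1:]


def add_mask(tokens: List[str], rng: random.Random) -> List[str]:
    # Insert a mask token at a random position.
    i = rng.randrange(len(tokens) + 1)
    return tokens[:i] + ["[mask]"] + tokens[i:]


def swap_adjacent(tokens: List[str], rng: random.Random) -> List[str]:
    if len(tokens) < 2:
        return tokens
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


# Each mode is an ordered pipeline; the order is part of the mode's identity.
PERTURBATION_MODES: Dict[str, List[Operation]] = {
    "mode_1": [delete_char, replace_synonyms, add_char],
    "mode_2": [add_mask, swap_adjacent],
}


def apply_mode(text: str, mode: str, rng: random.Random) -> str:
    tokens = text.split()
    for op in PERTURBATION_MODES[mode]:
        tokens = op(tokens, rng)
    return " ".join(tokens)


print(apply_mode("the weather is good today", "mode_2", random.Random(0)))
```

Representing a mode as data (a list of functions) rather than as a hard-coded procedure makes it straightforward to register new modes without touching the code that builds evaluation text sets.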

For example, the perturbed texts in each evaluation text set correspond one-to-one to the text samples in the text sample set.

In the embodiments of the application, when the at least one perturbation mode comprises a plurality of different perturbation modes, the first number can be set to a large value (for example, 10000 or more), and the robustness of the model can be determined at the level of text sets, from the differences between the recognition results of a large number of evaluation text sets. When the at least one perturbation mode is the entity perturbation mode, the first number can be any value greater than or equal to 1; each perturbed text in an evaluation text set can be used to attack the named entity recognition model multiple times (that is, the model is called to recognize the perturbed text multiple times), and the robustness of the model is determined at the level of individual text samples, from the differences between their recognition results.

In some embodiments, referring to fig. 4, fig. 4 is a second flowchart of the testing method for the named entity recognition model provided in the embodiment of the present application, and step 102 may be implemented through steps 1021 to 1022, which are described in detail below.

In step 1021, the following processing is performed for each of the at least one perturbation mode: perturbation processing is performed on the text sample set based on that perturbation mode to obtain at least one evaluation text set corresponding to the perturbation mode.

For example, at least one evaluation text set is obtained for each perturbation mode, and different perturbation modes may yield different numbers of evaluation text sets.
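Step 1021 can be sketched as a loop over the configured perturbation modes. The sketch hypothetically assumes that each mode is available as a function that perturbs a single text, and that the number of evaluation text sets per mode comes from configuration (defaulting to one); under those assumptions the first number is simply the total across modes. The two toy modes at the bottom are stand-ins, not perturbations from the description.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical: a mode perturbs one text sample into one perturbed text.
PerturbFn = Callable[[str, random.Random], str]


def build_evaluation_sets(
    samples: Sequence[str],
    modes: Dict[str, PerturbFn],
    sets_per_mode: Dict[str, int],
    seed: int = 0,
) -> Dict[str, List[List[str]]]:
    rng = random.Random(seed)
    result: Dict[str, List[List[str]]] = {}
    for name, perturb in modes.items():
        count = sets_per_mode.get(name, 1)
        if count < 1:
            raise ValueError("each perturbation mode needs at least one evaluation set")
        # Every evaluation set is aligned one-to-one with the text sample set.
        result[name] = [[perturb(s, rng) for s in samples] for _ in range(count)]
    return result


def first_number(evaluation_sets: Dict[str, List[List[str]]]) -> int:
    # Total number of evaluation text sets over all perturbation modes.
    return sum(len(sets) for sets in evaluation_sets.values())


def upper(text: str, rng: random.Random) -> str:  # toy stand-in mode
    return text.upper()


def drop_last_char(text: str, rng: random.Random) -> str:  # toy stand-in mode
    return text[:-1]


sets = build_evaluation_sets(["abc", "de"], {"upper": upper, "drop": drop_last_char}, {"upper": 2})
print(first_number(sets))  # 3
```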

In some embodiments, when the perturbation mode is the random perturbation mode, step 1021 may be implemented as follows. The following processing is performed on the text sample set at least once to obtain at least one evaluation text set: for each text sample in the text sample set, at least one target word is selected from the text sample and each target word is perturbed to obtain a perturbed text; the perturbed texts are then combined into an evaluation text set.

Here, each perturbation text in the evaluation text set corresponds to each text sample in the text sample set in a one-to-one manner.

For example, at least one target word may be randomly selected from the text sample, and the target words may be perturbed concurrently, provided that the perturbation of one target word does not interfere with that of another. For example, for the text sample "The weather is very good today, suitable for going out.", the target words "today" and "going out" are selected. Since their perturbations do not interfere with each other, both can be perturbed at the same time: synonym replacement is applied to "today" and a mask is added to "going out". The resulting perturbed text reads "The weather is very good this day, suitable for going out [mask]."
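One way to realize "concurrent, non-interfering" perturbation of several target words is to key every edit to the target's position in the original text and rebuild the text in a single pass, so an edit that inserts tokens (such as a mask) cannot shift the position of another target. This is a sketch under assumptions: the token-level representation, the names `pick_targets` and `perturb_targets`, and the English example edits are illustrative.

```python
import random
from typing import Callable, Dict, List

# Hypothetical: an edit turns one target token into zero or more tokens.
Edit = Callable[[str], List[str]]


def pick_targets(tokens: List[str], k: int, rng: random.Random) -> List[int]:
    # Random perturbation mode: choose up to k distinct target positions at random.
    return sorted(rng.sample(range(len(tokens)), min(k, len(tokens))))


def perturb_targets(tokens: List[str], edits: Dict[int, Edit]) -> List[str]:
    # All edits are keyed by ORIGINAL positions and applied in one pass,
    # so the output of one edit never changes where another edit applies.
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.extend(edits[i](tok) if i in edits else [tok])
    return out


tokens = ["the", "weather", "is", "good", "today", "for", "outings"]
edits: Dict[int, Edit] = {
    4: lambda t: ["this", "day"],  # synonym replacement for "today"
    6: lambda t: ["[mask]", t],    # mask inserted next to the target
}
print(" ".join(perturb_targets(tokens, edits)))
# the weather is good this day for [mask] outings
```

Applying the edits sequentially with index arithmetic would also work, but the single-pass rebuild avoids the off-by-one bugs that appear once edits change the token count.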

For example, the perturbation processing for each target word may be implemented in at least one of the following ways.

(1) Adding a mask between the target word and the neighbor word of the target word; for example: after the "do you happy today" processing, the "do you [ mask ] [ mask ] happy today" is processed.

(2) Predicting the next word of the target word, and adding the predicted word to the target word; for example: the "do you happy today" is processed into "do you happy today" and "do you really happy today".

(3) Replacing the target word with a synonym of the target word; for example: the 'I happy' treatment is 'I happy'.

(4) Exchanging positions of the target word and the neighbor words of the target word in the text sample; for example: "does you eat" is treated as "does you eat".

(5) Adding or deleting characters in the target word. For example, characters may be added to or removed from the bridge name "a certain bridge", where "a certain" stands for the specific name of the bridge, yielding longer or shorter variants of the name.

(6) Replacing characters of the target word with other corresponding characters. For example, characters may be replaced by digits, so the date "一月一日" (January 1, written with Chinese numerals) is processed into "1月1日".
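The following minimal Python sketch illustrates three of the word-level operations above on a pre-segmented token list, together with random selection of target words. All function names are hypothetical. The synonym table is supplied by the caller, since how synonyms are obtained is not specified here. Next-word prediction and the character-level operations are omitted because they depend on a language model and on character-level tokenization.

```python
import random

# Hypothetical helpers sketching perturbation operations (1), (3) and (4).
MASK = "[mask]"


def add_masks(tokens, i, n=1):
    """Operation (1): insert n mask tokens between token i and its right neighbour."""
    return tokens[: i + 1] + [MASK] * n + tokens[i + 1 :]


def swap_with_next(tokens, i):
    """Operation (4): exchange token i with its right neighbour, if it has one."""
    out = list(tokens)
    if i + 1 < len(out):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def replace_with_synonym(tokens, i, synonyms):
    """Operation (3): replace token i using a caller-supplied synonym table."""
    out = list(tokens)
    out[i] = synonyms.get(out[i], out[i])
    return out


def random_perturb(tokens, ops, k=1, rng=None):
    """Randomly pick k distinct target positions and apply a random op to each.

    Targets are processed from right to left, so an op that inserts tokens
    after position i never shifts the positions of targets still to be handled.
    """
    rng = rng or random.Random()
    out = list(tokens)
    for i in sorted(rng.sample(range(len(tokens)), k), reverse=True):
        out = rng.choice(ops)(out, i)
    return out


tokens = ["are", "you", "happy", "today"]
print(add_masks(tokens, 1, n=2))
# ['are', 'you', '[mask]', '[mask]', 'happy', 'today']
```

Each operation takes `(tokens, i)` so they can be mixed freely in the `ops` list. A synonym operation can be adapted with a lambda, e.g. `lambda t, i: replace_with_synonym(t, i, table)`.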

In some embodiments, when the perturbation mode is an entity perturbation mode, step 1021 may be implemented as follows. The following processing is performed on the text sample set at least once to obtain at least one evaluation text set. For each text sample in the text sample set:

- the entity words and non-entity words in the text sample are acquired;
- the non-entity words are sorted in descending order of their importance indexes, and at least one top-ranked non-entity word is selected as a target word and perturbed;
- each entity word is perturbed based on its meaning.

This gives a perturbed text. The perturbed texts are combined into an evaluation text set, in which the perturbed texts correspond one-to-one to the text samples in the text sample set.

For example, the importance index of a non-entity word represents how important that word is in the text sample. Deleting the non-entity word from the text sample yields a candidate sentence corresponding to that word. The larger the difference in meaning between the candidate sentence and the text sample, the higher the importance index of the non-entity word, and vice versa. The selected non-entity words are then perturbed as target words in the ways described above, which are not repeated here.

For example, the number of top-ranked target words selected from the descending ranking of importance indexes may be determined from the number of non-entity words in the text sample. The more non-entity words the sample contains, the more target words are selected.

Illustratively, the perturbation processing for each entity word may be implemented by at least one of the following processes.

(1) Adding or deleting characters of the entity word based on its meaning, without changing what the entity word refers to. For example, in the text sample "how many shirts do you have", "shirts" is the entity word, and a character-addition perturbation is applied to it while preserving its meaning.

(2) Acquiring a synonym of the entity word and replacing the entity word with it. Entity words often have near-synonymous forms, such as abbreviations, common names, or a person's nickname. For example, in the text sample "how old is [a person] this year", the entity word is the person's name. Replacing it with a similar term, such as the person's nickname, yields the perturbed text.
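The entity perturbation mode described above can be sketched for a single text sample as follows. The sketch assumes the importance indexes are already available (for instance from steps 501 to 503 below). A caller-supplied alias table stands in for synonym acquisition. The function name, the `ratio` parameter, and the alias table are hypothetical. The only property taken from the text is that more non-entity words lead to more selected target words.

```python
import math


def entity_mode_perturb(tokens, is_entity, importance, perturb_word, aliases, ratio=0.2):
    """Hypothetical sketch of the entity perturbation mode for one text sample.

    tokens       -- segmented text sample
    is_entity    -- one bool per token, True for entity words
    importance   -- token index -> importance index of a non-entity word
    perturb_word -- callable applied to each selected non-entity word
    aliases      -- entity word -> meaning-preserving synonym or alias
    ratio        -- assumed share of non-entity words to perturb
    """
    non_entity = [i for i, flag in enumerate(is_entity) if not flag]
    # More non-entity words -> more target words (positive correlation).
    n = math.ceil(ratio * len(non_entity)) if non_entity else 0
    ranked = sorted(non_entity, key=lambda i: importance.get(i, 0.0), reverse=True)
    targets = set(ranked[:n])

    out = []
    for i, tok in enumerate(tokens):
        if is_entity[i]:
            out.append(aliases.get(tok, tok))  # keep the entity's meaning
        elif i in targets:
            out.append(perturb_word(tok))  # top-ranked non-entity word
        else:
            out.append(tok)
    return out
```

With six non-entity words and `ratio=0.2`, two target words are perturbed. Any word-level operation from the random perturbation mode can serve as `perturb_word`.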

In some embodiments, before step 1021, referring to fig. 5, fig. 5 is a third flowchart of a testing method for a named entity recognition model provided in an embodiment of the present application, and an importance indicator of a non-entity word may be obtained through steps 501 to 503.

In step 501, a word segmentation process is performed on the text sample to obtain a word segmentation result of the text sample.

For ease of explanation, the text sample "do you know which street a certain building is on" is used below as an example. Word segmentation of the text sample gives "you / know / a certain building / in / which / street / ma", where "ma" is a question particle. Of these, "know", "in", "which", and "ma" are non-entity words.

In step 502, the following is performed for each non-entity word: deleting the non-entity words from the word segmentation result of the text sample to obtain candidate sentences corresponding to the non-entity words; and carrying out named entity recognition on each candidate sentence and the text sample to obtain a recognition result of each candidate sentence and a recognition result of the text sample.

Continuing with this text sample, deleting each non-entity word in turn from the segmentation result gives the following candidate sentences.

Candidate sentence 1 (deleting "know"): "you / a certain building / in / which / street / ma";

candidate sentence 2 (deleting "in"): "you / know / a certain building / which / street / ma";

candidate sentence 3 (deleting "which"): "you / know / a certain building / in / street / ma";

candidate sentence 4 (deleting "ma"): "you / know / a certain building / in / which / street".

The named entity recognition model under test is then called to recognize each candidate sentence and the text sample, giving a recognition result for each. Alternatively, the candidate sentences and the text sample may be recognized by a white-box model whose internal parameters and structure are known, to obtain the corresponding recognition results.

In step 503, a difference value between the recognition result of each candidate sentence and the recognition result of the text sample is obtained, and the importance index of the non-entity word corresponding to the candidate sentence is determined based on the difference value.

Here, the difference value is positively correlated with the importance indicator of the non-entity word in the text sample.

Illustratively, the recognition result includes the number of named entities. The difference between two recognition results is the difference between the numbers of named entities recognized in the candidate sentence and in the text sample, and this difference serves as the importance index. The larger the difference, the more important the non-entity word missing from the candidate sentence, that is, the more its absence affects the meaning of the text sample.

Suppose that 5 named entities are recognized in the original text sample:

- 4 are recognized in candidate sentence n (corresponding to non-entity word n), giving a difference of 1;
- 3 are recognized in candidate sentence m (corresponding to non-entity word m), giving a difference of 2.

Non-entity word m is therefore more important than non-entity word n. The differences of all non-entity words are sorted in descending order, and at least one non-entity word is selected as a target word based on this ranking and perturbed.
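Steps 501 to 503 can be sketched as below, using the difference in the number of recognized entities as the importance index, as in the example above. The callable `count_entities` is a hypothetical stand-in for the named entity recognition model under test. `toy_count_entities` is a deliberately simple toy model used only to make the example runnable. Taking the absolute difference, so that both gained and lost entities count as a change, is an assumption.

```python
def importance_by_deletion(tokens, non_entity_idx, count_entities):
    """Steps 501-503 (hypothetical helper): importance index of each non-entity word.

    count_entities stands in for the NER model under test and returns the
    number of named entities it recognizes in a token list.
    """
    base = count_entities(tokens)
    scores = {}
    for i in non_entity_idx:
        candidate = tokens[:i] + tokens[i + 1 :]  # candidate sentence without word i
        scores[i] = abs(base - count_entities(candidate))
    return scores


def top_targets(scores, n):
    """Indices of the n non-entity words with the largest importance index."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def toy_count_entities(tokens):
    """Toy stand-in model: a title-case token counts as an entity only if it
    is followed by "in", so deleting a context word can change the count."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a.istitle() and b == "in")


tokens = ["you", "know", "Tower", "in", "which", "street"]
scores = importance_by_deletion(tokens, [1, 3, 4], toy_count_entities)
print(scores)                  # {1: 0, 3: 1, 4: 0}
print(top_targets(scores, 1))  # [3]
```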

In step 1022, text quality detection is performed on at least one evaluation text set corresponding to each perturbation mode, and a first number of evaluation text sets is determined based on the obtained text quality detection result.

Illustratively, the text quality detection result includes the perturbation parameters of each evaluation text set. Evaluation text sets whose perturbation parameters do not satisfy the corresponding thresholds are deleted. Those that satisfy them are retained, giving the first number of evaluation text sets.

In some embodiments, the first number may be preset. Once the first number of evaluation text sets has been acquired, step 103 is performed. If fewer evaluation text sets are available, perturbation of the text sample set continues to generate more.

In some embodiments, referring to fig. 6, fig. 6 is a fourth flowchart illustrating a testing method of a named entity recognition model provided in the embodiment of the present application, and step 1022 may be implemented through step 601 to step 604, which is described in detail below.

In step 601, when the perturbation mode is an entity perturbation mode, the perturbation parameters of each perturbed text in each evaluation text set are determined. The perturbation parameters include at least one of the following: entity perturbation rate, non-entity perturbation rate, and text perplexity.

For example, the rates are defined as follows:

- Text perturbation rate: the ratio of the number of perturbed characters in a text to the number of characters in the original text.
- Entity perturbation rate: the perturbation rate of the entity words, that is, the ratio of the number of perturbed characters belonging to entity words to the number of characters in the original text.
- Non-entity perturbation rate: the perturbation rate of the non-entity words, that is, the ratio of the number of perturbed characters belonging to non-entity words to the number of characters in the original text.

Suppose an original text sample contains 100 characters, of which 10 belong to entity words and 20 belong to non-entity words. If 15 characters are perturbed, the text perturbation rate of the perturbed text is 15/100 = 15%.
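A minimal sketch of the three rates follows. It assumes that the positions of perturbed characters in the original text are tracked during perturbation; how they are tracked is not specified in the text. The function name is hypothetical, and the split of the 15 perturbed characters into 3 entity and 12 non-entity characters is an assumed illustration.

```python
def perturbation_rates(n_chars, perturbed, entity_pos, non_entity_pos):
    """Hypothetical helper: character-level perturbation rates of one perturbed text.

    n_chars        -- number of characters in the original text sample
    perturbed      -- set of original character positions that were perturbed
    entity_pos     -- set of character positions belonging to entity words
    non_entity_pos -- set of character positions belonging to non-entity words
    """
    text_rate = len(perturbed) / n_chars
    entity_rate = len(perturbed & entity_pos) / n_chars
    non_entity_rate = len(perturbed & non_entity_pos) / n_chars
    return text_rate, entity_rate, non_entity_rate


# The example above: 100 characters, 10 in entity words, 20 in non-entity words,
# and 15 perturbed characters (assumed here: 3 entity + 12 non-entity).
entity = set(range(0, 10))
non_entity = set(range(10, 30))
perturbed = {0, 1, 2} | set(range(10, 22))
print(perturbation_rates(100, perturbed, entity, non_entity))  # (0.15, 0.03, 0.12)
```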

In an example, text perplexity is an evaluation metric for language models. In the embodiments of the present application, it is used to evaluate how natural a perturbed text is. If a perturbed text does not conform to natural-language expression, the noise added during perturbation is too large and the text is unsuitable for evaluation.

Suppose a text is represented as $W = (w_1, w_2, \dots, w_k)$, where $w_1$ is the first word in the text, $k$ is the number of words in the text, and $w_k$ is the $k$-th word. The text perplexity $PPL(W)$ can be expressed as the following formula (1):

$$PPL(W) = p(w_1, w_2, \dots, w_k)^{-\frac{1}{k}} = \left(\prod_{i=1}^{k} \frac{1}{p(w_i \mid w_1, \dots, w_{i-1})}\right)^{\frac{1}{k}} \tag{1}$$

where $w_i$ is the $i$-th word in the text and $p(w_i \mid w_1, \dots, w_{i-1})$ is the probability that the $i$-th word occurs given the preceding words.

The lower the perplexity, the more closely the text conforms to natural-language expression, and the more likely it is that the named entity recognition model can identify the named entities in the text.
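Formula (1) can be computed as follows from per-word conditional probabilities supplied by some language model. Which model is used is not specified in the text, and the probabilities below are made up. The function name is hypothetical. The product is evaluated in log space, an equivalent rearrangement of formula (1).

```python
import math


def perplexity(token_probs):
    """Hypothetical helper for formula (1):
    PPL(W) = (prod_i 1 / p(w_i | w_1..w_{i-1})) ** (1/k).

    token_probs holds the conditional probability a language model assigns to
    each of the k words. Log space avoids underflow on long texts.
    """
    k = len(token_probs)
    if k == 0:
        raise ValueError("perplexity is undefined for an empty text")
    log_prob = sum(math.log(p) for p in token_probs)
    return math.exp(-log_prob / k)


print(perplexity([0.25, 0.25, 0.25, 0.25]))  # approximately 4.0
```

A text in which every word has probability 1/4 has perplexity 4. Higher values flag perturbed texts that read less naturally.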

In step 602, when the perturbation mode is a random perturbation mode, the perturbation parameters of each perturbed text in each evaluation text set are determined. The perturbation parameters include at least one of the following: text perturbation rate, text edit distance, and text perplexity.

For example, see the explanation above for the text perturbation rate and text perplexity. The text edit distance is the edit distance between the original text sample and the perturbed text. It may be characterized as the number of edit operations performed (in this embodiment, perturbation operations). For example, if the original text sample contains 10 words and each word is perturbed once to obtain the perturbed text, the edit distance between the perturbed text and the original text sample is 10.
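The text counts edit operations as perturbation operations. A common way to measure this after the fact is the word-level Levenshtein distance. It is swapped in here as an assumption rather than taken from the source. Note that Levenshtein distance counts an exchange of two adjacent words as 2 substitutions rather than 1 operation. The function name is hypothetical.

```python
def edit_distance(a, b):
    """Hypothetical helper: word-level Levenshtein distance between two token lists.

    Counts the minimum number of single-word insertions, deletions and
    substitutions that turn a into b.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


original = [f"w{i}" for i in range(10)]
perturbed = [f"p{i}" for i in range(10)]   # every word perturbed once
print(edit_distance(original, perturbed))  # 10
```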

In step 603, any evaluation text set containing a perturbed text whose perturbation parameter exceeds the perturbation parameter threshold is deleted.

In an example, a perturbation parameter above the threshold indicates that the perturbed text contains too much noise to serve as a test sample.

For example, different thresholds may be set for different perturbation parameters:

- Text perturbation rate: the threshold may be 15%, to avoid excessive noise in the perturbed text.
- Entity and non-entity perturbation rates: their sum may be required to be at most 15%, or a separate threshold may be set for each.
- Edit distance: the threshold may be set according to the text length. The longer the text, the higher the edit distance threshold.
- Text perplexity: a single threshold may be used. A perturbed text whose perplexity exceeds it does not conform to natural-language expression and is too noisy.

In some embodiments, as soon as one perturbed text in an evaluation text set is found to have a perturbation parameter above the threshold, the whole evaluation text set is deleted. This assumes that other perturbed texts in the set that have not yet been checked may also have excessive perturbation parameters. Deleting evaluation text sets that do not meet the test requirements improves the accuracy of the test samples. It also saves the computing resources required for text quality detection.
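Step 603 and the early-exit rule above can be sketched as a simple filter. The record type and function names are hypothetical. The 15% rate threshold comes from the example in the text. The edit-distance rule of one allowed edit per five words and the perplexity ceiling of 200 are assumed values chosen only for illustration. The sketch uses the combined entity-plus-non-entity rule, one of the two options mentioned above.

```python
from dataclasses import dataclass


@dataclass
class PerturbedText:
    """Hypothetical record of one perturbed text's perturbation parameters."""
    text_rate: float        # text perturbation rate
    entity_rate: float      # entity perturbation rate
    non_entity_rate: float  # non-entity perturbation rate
    edit_distance: int
    n_words: int            # length of the original text sample
    perplexity: float


def exceeds_threshold(p, rate_max=0.15, words_per_edit=5, ppl_max=200.0):
    """True if any perturbation parameter is above its threshold.

    rate_max follows the 15% example; words_per_edit and ppl_max are assumed.
    The edit-distance threshold grows with text length.
    """
    return (
        p.text_rate > rate_max
        or p.entity_rate + p.non_entity_rate > rate_max
        or p.edit_distance > max(1, p.n_words // words_per_edit)
        or p.perplexity > ppl_max
    )


def keep_evaluation_set(texts):
    """Step 603: reject the whole set at the first failing text (any() short-circuits)."""
    return not any(exceeds_threshold(p) for p in texts)
```

Because `any()` stops at the first failing text, the remaining texts in a rejected set are never checked, which is where the saving in quality-detection work comes from.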

In some embodiments, perturbed texts whose perturbation parameters exceed the threshold may instead be removed from the evaluation text sets, while perturbed texts below the threshold are retained. Retained perturbed texts corresponding to the different text samples are then selected and combined into new evaluation text sets for testing, and the combined sets are kept. Recombining in this way saves computing resources and avoids discarding evaluation text sets that have already been generated.
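One simple way to carry out this recombination is shown below. It assumes the retained perturbed texts are grouped by the text sample they came from. The pairing strategy (taking the j-th retained text of every sample) and the function name are hypothetical, since the text does not say how texts are chosen.

```python
def recombine(pools):
    """Hypothetical helper: build new evaluation sets from retained perturbed texts.

    pools[s] lists the perturbed texts of text sample s that passed the quality
    check. Set j takes the j-th retained text of every sample, so position s of
    each set is always a perturbation of sample s (one-to-one correspondence).
    The number of sets is limited by the sample with the fewest retained texts.
    """
    if not pools:
        return []
    n_sets = min(len(pool) for pool in pools)
    return [[pool[j] for pool in pools] for j in range(n_sets)]


print(recombine([["a1", "a2"], ["b1", "b2", "b3"]]))  # [['a1', 'b1'], ['a2', 'b2']]
```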

In step 604, among the at least one evaluation text set corresponding to each perturbation mode, the evaluation text sets that were not deleted are counted. Each perturbed text in them is labeled based on its perturbation parameters, giving the first number of evaluation text sets.

Illustratively, the first number is obtained by counting the evaluation text sets that were not deleted, and may be large (e.g., 10⁴ or 10⁶). Because the evaluation text sets can be labeled automatically from the quality detection results, the time and labor of manual labeling are saved. The label of each perturbed text in an evaluation text set may include:

- the sample text corresponding to the perturbed text;
- its edit distance, perturbation rate, entity perturbation rate, non-entity perturbation rate, and text perplexity;
- the entity words and non-entity words in the perturbed text.

Labeling each perturbed text in this way yields the first number of evaluation text sets.

In the embodiments of the present application, text quality detection controls the quality of the evaluation text sets and thus improves the test accuracy for the named entity recognition model. Generating the evaluation text sets first and screening them afterwards reduces the computation required during perturbation and makes generating evaluation text sets more efficient. The screening also guarantees the quality of the evaluation text sets used in testing, which improves test accuracy.

In some embodiments, referring to fig. 7A, fig. 7A is a fifth flowchart illustrating a testing method of a named entity recognition model provided in an embodiment of the present application, before step 102, at least one perturbation mode for testing the named entity recognition model may be obtained through the following steps 701A to 704A.

In step 701A, a plurality of disturbance modes to be screened are obtained.

For example, the perturbation modes may be obtained by randomly combining several basic perturbation operations. Given a set of operations (such as synonym replacement, mask addition, word deletion, word addition, position exchange, and character replacement), at least one operation is randomly selected. The order of the selected operations is then randomly shuffled; no ordering is needed when only one operation is selected. Repeating this gives multiple perturbation modes to be screened, for example 30.
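Generating candidate modes by random selection and random ordering of basic operations can be sketched as follows. The operation labels, the function name, and the cap of three operations per mode (`max_ops`) are hypothetical. Each label only names an operation and would map to a real perturbation function in practice. `math.perm` requires Python 3.8 or later.

```python
import math
import random

# Hypothetical labels for the basic operations named in the text.
BASIC_OPS = ["synonym", "add_mask", "delete_word", "add_word", "swap", "replace_char"]


def sample_modes(n_modes, rng, max_ops=3):
    """Draw n_modes distinct perturbation modes.

    A mode is an ordered tuple of 1..max_ops distinct basic operations;
    rng.sample returns them in random order, which fixes the processing order.
    """
    possible = sum(math.perm(len(BASIC_OPS), k) for k in range(1, max_ops + 1))
    if n_modes > possible:
        raise ValueError(f"only {possible} distinct modes exist")
    modes = set()
    while len(modes) < n_modes:
        k = rng.randint(1, max_ops)
        modes.add(tuple(rng.sample(BASIC_OPS, k)))
    return sorted(modes)


modes = sample_modes(30, random.Random(0))
print(len(modes))  # 30
```

Storing modes in a set guarantees the 30 candidates are distinct. The up-front count check prevents an endless loop when more modes are requested than can exist (156 with six operations and at most three per mode).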

In step 702A, a text sample set is perturbed based on multiple perturbation modes to obtain a second number of perturbed text sets.

Here, each perturbation mode corresponds to at least one perturbation text set.

For example, the second number may be much smaller than the first number, e.g., 300. Suppose the text sample set is perturbed based on the 30 perturbation modes to obtain 300 perturbed text sets. Each perturbation mode corresponds to at least one perturbed text set, and different perturbation modes may correspond to the same or different numbers of perturbed text sets.

In step 703A, text prediction processing is performed on each of the disturbed text sets and the text sample sets, so as to obtain text prediction results corresponding to each of the disturbed text sets and the text sample sets.

For example, text prediction may be performed on each perturbed text set and on the text sample set to obtain the corresponding text prediction results. This can be done by the named entity recognition model under test or by another trained text recognition model. In the embodiments of the present application, the text prediction result is taken to be a model prediction score.

In step 704A, the perturbation effect index of each perturbation mode is determined based on the text prediction results of the perturbed text sets and of the text sample set. The perturbation modes are sorted in descending order of this index, and at least one top-ranked perturbation mode is selected.

In an example, the higher the perturbation effect index, the stronger the perturbation effect of the mode. Perturbing with such a mode adds more noise to the sample text and interferes more with the recognition process of the named entity recognition model.

Continuing the example above, the perturbation effect indexes of the 30 perturbation modes are sorted in descending order, and the top 5 perturbation modes may be selected.

For example, the perturbation effect index of each perturbation mode in step 704A may be obtained as follows. For each perturbation mode, the prediction difference of each perturbed text set corresponding to the mode is determined. The ratio of the sum of these prediction differences to the number of perturbed text sets corresponding to the mode is taken as the mode's perturbation effect index.

Here, the prediction difference is a difference between a prediction result corresponding to each of the disturbed text sets and a prediction result corresponding to the text sample set.

For example, denoting the perturbation effect index by $D$, $D = \frac{1}{K}\sum_{j=1}^{K}\left(sf(x'_j) - sf(x)\right)$, where:

- $K$ is the number of perturbed text sets corresponding to the perturbation mode;
- $sf(x'_j)$ is the model prediction score of the $j$-th perturbed text set;
- $sf(x)$ is the model prediction score of the text sample set;
- $sf(x'_j) - sf(x)$ is a prediction difference, and the summation gives the sum of the prediction differences.
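A sketch of the perturbation effect index and the top-M selection follows, implementing the formula above literally. The scores are made-up numbers, and the function names are hypothetical. The formula as written rewards an increase of sf over the baseline. So sf is assumed here to be a score that grows as the model is disturbed more, such as an error score. If sf were a confidence score that drops under perturbation, the difference would need to be negated for a larger D to mean a stronger effect.

```python
def effect_index(base_score, perturbed_scores):
    """Hypothetical helper: D = sum_j (sf(x'_j) - sf(x)) / K, sign convention as written.

    base_score       -- sf(x), score of the text sample set
    perturbed_scores -- sf(x'_j) for the K perturbed text sets of one mode
    """
    K = len(perturbed_scores)
    return sum(s - base_score for s in perturbed_scores) / K


def select_modes(base_score, scores_by_mode, m):
    """Return the m modes with the largest perturbation effect index D."""
    d = {mode: effect_index(base_score, s) for mode, s in scores_by_mode.items()}
    return sorted(d, key=d.get, reverse=True)[:m]


scores = {"A": [3.0, 5.0], "B": [2.0], "C": [1.0, 1.0]}
print(select_modes(1.0, scores, m=2))  # ['A', 'B']
```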

In step 103, a named entity recognition model is called to perform recognition processing on the text sample sets and the first number of evaluation text sets respectively, so as to obtain a first recognition result corresponding to each evaluation text set and a second recognition result corresponding to the text sample set.

For example, different recognition processes may be used for different test dimensions and different perturbation modes.

In some embodiments, step 103 may be implemented by: calling a named entity recognition model to recognize each perturbation text in each evaluation text set to obtain a recognition result corresponding to each perturbation text, and combining the recognition results of each perturbation text to obtain a first recognition result of each evaluation text set; and calling a named entity recognition model to perform recognition processing on each text sample in the text sample set to obtain a recognition result corresponding to each text sample, and combining the recognition results of each text sample to obtain a second recognition result of the text sample set.

For example, when the first number is large (e.g., greater than 10000), recognizing each evaluation text set once already yields a large number of test results. The robustness of the named entity recognition model can then be determined reliably from these results.

In some embodiments, when the perturbation mode is an entity perturbation mode, step 103 may be implemented as follows. For each perturbed text in each evaluation text set, the named entity recognition model is called to recognize the perturbed text multiple times, giving multiple recognition results for that perturbed text. The recognition results of the perturbed texts are combined into the first recognition result of the evaluation text set. The named entity recognition model is also called to recognize each text sample in the text sample set, and the recognition results of the text samples are combined into the second recognition result of the text sample set.

For example, when the first number is small (e.g., 2), recognizing each perturbed text multiple times still yields a large number of test results. The robustness of the named entity recognition model can then be determined reliably from these results.

In step 104, the probability that the first recognition result is the same as the second recognition result is determined, and the recognition success rate of the named entity recognition model is determined based on the probability.

In an example, recognition is successful when the first recognition result is the same as the second recognition result. When there are many recognition results, the recognition success rate can be determined along different dimensions.

In some embodiments, the recognition success rate may be obtained in the set dimension. Correspondingly, the first recognition result of each evaluation text set includes the recognition result of each perturbed text in that set. The second recognition result includes the recognition result of each text sample in the text sample set.

When there are multiple perturbation modes, step 104 may be implemented through steps 701B to 702B, which are described in detail below, referring to fig. 7B. Fig. 7B is a sixth flowchart illustrating a testing method for a named entity recognition model provided in the embodiment of the present application.

In step 701B, the recognition result of each text sample is compared with the recognition result of each perturbed text corresponding to that text sample, and the number of identical recognition results is counted.

In an example, two recognition results are identical when the named entities that the named entity recognition model recognizes in the perturbed text are the same as those it recognizes in the text sample. When the device running the named entity recognition model has sufficient computing resources, the text samples and perturbed texts can be recognized in parallel to improve test efficiency.

In step 702B, the ratio of the number of identical recognition results to the total number of texts in the first number of evaluation text sets is determined. This ratio is the probability that the first recognition result is the same as the second recognition result, and it is used as the recognition success rate of the named entity recognition model.

For example, the texts in each evaluation text set correspond one-to-one to the texts in the text sample set. The total number of texts in the first number of evaluation text sets is therefore the first number multiplied by the number of texts in the text sample set. The ratio of the number of identical recognition results to this total is the recognition success rate.
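The set-dimension computation of steps 701B and 702B can be sketched as follows. Each recognition result is assumed to be a set of (entity text, entity type) pairs, so that two results are the same exactly when the sets are equal. The function name and the data are hypothetical.

```python
def set_level_success_rate(sample_results, eval_results):
    """Steps 701B-702B (hypothetical helper).

    sample_results[i]  -- entities recognized in text sample i
    eval_results[j][i] -- entities recognized in the perturbed text of sample i
                          within evaluation set j
    """
    total = len(eval_results) * len(sample_results)  # first number x samples
    same = sum(
        1
        for per_set in eval_results
        for i, result in enumerate(per_set)
        if result == sample_results[i]
    )
    return same / total


samples = [{("Tower A", "location")}, {("Alice", "person")}]
evals = [
    [{("Tower A", "location")}, {("Alice", "person")}],  # both unchanged
    [{("Tower A", "location")}, set()],                  # entity lost in sample 2
]
print(set_level_success_rate(samples, evals))  # 0.75
```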

In some embodiments, the recognition success rate may be obtained in the sample dimension. Correspondingly, the first recognition result of each evaluation text set includes multiple recognition results for each perturbed text in that set. The second recognition result includes the recognition result of each text sample in the text sample set.

when the type of the perturbation mode is an entity perturbation mode, referring to fig. 7C, fig. 7C is a seventh flowchart schematic diagram of the testing method for the named entity recognition model provided in the embodiment of the present application, and step 104 may be implemented through steps 701C to 702C, which are described in detail below.

In step 701C, the following processing is performed for each of the perturbed texts: and comparing the plurality of recognition results of the disturbance text with the recognition results of the text samples corresponding to the disturbance text, and determining that the recognition is successful when at least one of the plurality of recognition results of the disturbance text is the same as the recognition result of the text sample corresponding to the disturbance text.

For example, steps 701C to 702C suit cases with few evaluation text sets. Suppose there is only one evaluation text set. Each perturbed text in it is recognized at most K times (e.g., K = 10). As soon as one recognition result of a perturbed text matches the recognition result of its corresponding text sample, recognition of that perturbed text stops and one recognition success is counted. This recognition and counting scheme reduces the time and number of recognitions required, further improving the overall test efficiency.

In step 702C, the ratio of the number of recognition success times corresponding to each evaluation text set to the number of sample texts in the sample text set is determined as the probability that the first recognition result is the same as the second recognition result, and the probability is used as the recognition success rate of the named entity recognition model.

For example, the number of sample texts in the sample text set is known, and the ratio of the number of recognition successes to the number of sample texts is the recognition success rate. Recognizing a perturbed text can also be viewed as an adversarial attack of the perturbed text on the named entity recognition model. Under this view, the recognition success rate is the model's defense success rate: if the recognition success rate is P, the attack success rate of the evaluation text set is 1 - P.
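Steps 701C and 702C, together with the attack-success interpretation above, can be sketched as follows. The model is assumed to be callable as a function whose output may vary between calls; otherwise repeated recognition would add nothing. `recognize`, the scripted outputs, and the function name are hypothetical.

```python
def sample_level_success_rate(perturbed_texts, sample_results, recognize, max_tries=10):
    """Steps 701C-702C for a single evaluation set (hypothetical helper).

    recognize      -- stand-in for one (possibly non-deterministic) model call
    sample_results -- recognition result of the text sample behind each text
    Returns (recognition success rate P, attack success rate 1 - P, model calls).
    """
    successes = 0
    calls = 0
    for text, expected in zip(perturbed_texts, sample_results):
        for _ in range(max_tries):
            calls += 1
            if recognize(text) == expected:
                successes += 1
                break  # stop at the first matching result
    p = successes / len(sample_results)
    return p, 1 - p, calls


# Scripted outputs make the toy model deterministic for the example.
outputs = {"t1": iter(["wrong", "A"]), "t2": iter(["B"] * 3)}
print(sample_level_success_rate(["t1", "t2"], ["A", "C"],
                                lambda t: next(outputs[t]), max_tries=3))
# (0.5, 0.5, 5)
```

Here "t1" matches on its second call and stops early, while "t2" uses all three attempts, so 5 model calls are made instead of 6.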

In step 105, the robustness of the named entity recognition model is determined based on the recognition success rate.

Here, the recognition success rate is positively correlated with the robustness.

For example, the same named entity recognition model may be trained into several optimized versions. Running the testing method provided by the embodiments of the present application once per version gives the robustness of each version, so the versions can be compared to find the most robust model. The testing method thus effectively improves test accuracy and effectiveness. A user (e.g., a technician) may then train or refine the model based on the test results.

In the embodiments of the present application, the method proceeds as follows:

1. The sample text set is perturbed based on at least one perturbation mode to obtain at least one evaluation text set of perturbed texts.
2. The named entity recognition model is called to recognize the sample text set and each evaluation text set.
3. The probability that the model's outputs on the sample text set and on the evaluation text sets are the same is determined.
4. The robustness of the model is determined from this probability.

Because the evaluation text sets are generated by perturbation, the sample-labeling workload is reduced, computing resources are saved, and both test efficiency and test accuracy are improved.

Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.

The testing method provided by the embodiments of the present application can be applied, for example, to an in-vehicle voice assistant that uses a named entity recognition model for speech recognition. Existing named entity recognition algorithms have many robustness problems that seriously affect natural language recognition. Text expressions in real scenarios vary widely, the voice assistant makes speech recognition errors, and entity recognition is often performed on noisy data, so recognition quality varies greatly. The named entity recognition model used by the in-vehicle voice assistant can therefore be tested with this method to determine whether it meets the requirements of real application scenarios.

Referring to fig. 8, fig. 8 is an eighth flowchart illustrating a testing method for a named entity recognition model according to an embodiment of the present application. The robustness result of the named entity recognition model, i.e. the detection result of robustness, is obtained through steps 801 to 806, which will be described in detail below.

In step 801, a set of text samples is obtained.

For example, an evaluation set with uniformly distributed data may be screened from an existing evaluation set using a text sampling algorithm based on probabilistic graph clustering. The texts are clustered with the Markov clustering algorithm, which divides the existing evaluation set into a plurality of clusters. Data are then sampled evenly from each cluster so that the screened evaluation set is more balanced.

After the text sample set is obtained, it is labeled with ground truth. The labels can be designed for different test task types, such as text classification or named entity recognition. For named entity recognition, the name, start index, and type of every entity can be labeled, and nested entities are allowed. For example, with "certain" standing in for the specific name of a bridge, the text "where a certain bridge is" may be labeled as follows (the start index refers to the original-language text): {"text": "where a certain bridge is", "result_list": [["certain bridge", 0, "place class"]]}
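The balanced per-cluster sampling step can be sketched as follows. This is a minimal Python sketch that assumes cluster labels have already been produced, for example by a Markov clustering run that is not shown. The function name `balanced_sample` and its `per_cluster` parameter are hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(texts, cluster_ids, per_cluster, seed=0):
    """Draw up to `per_cluster` texts from every cluster.

    cluster_ids[i] is the cluster label of texts[i]; the clustering itself
    (e.g. Markov clustering) is assumed to have been run beforehand.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for text, cid in zip(texts, cluster_ids):
        buckets[cid].append(text)
    sample = []
    for cid in sorted(buckets):
        members = buckets[cid]
        # Small clusters contribute everything they have.
        sample.extend(rng.sample(members, min(per_cluster, len(members))))
    return sample


texts = ["t1", "t2", "t3", "t4", "t5", "t6"]
clusters = [0, 0, 0, 0, 1, 1]
print(balanced_sample(texts, clusters, per_cluster=2))
```

Capping each cluster at the same size keeps a dominant topic from crowding out rarer ones in the screened set.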

After step 801, step 805 can be performed to recognize the text sample set. Alternatively, the named entity recognition model can be called to recognize the text sample set and the evaluation text sets at the same time.

In step 802, text perturbation processing is performed.

In an example, the text sample set can be perturbed with various random perturbation modes. To improve the perturbation effect, several perturbation modes can first be tried in a pre-perturbation pass. The modes with the best perturbation effect are then selected as the perturbation modes used for testing.

For example, N (e.g., 30) text perturbation modes are defined. The text sample set is perturbed in a fixed way with each mode to obtain a second number K (for example, 100-...) of perturbation text sets to be screened.

The named entity recognition model under test is called through its interface on the text sample set to obtain the model prediction score sf(x). It is also called on each perturbation text set to be screened to obtain the model score sf(x'). Since each perturbation mode yields K perturbation text sets, its perturbation effect index is defined as D = Sum(sf(x') - sf(x))/K, where the sum runs over those K sets.

The N values of D are sorted, and the M (e.g., 5) perturbation modes with the largest perturbation effect index are selected as the modes used for testing. The M modes are then used to perturb the text sample set and obtain a first number (e.g., 10000) of evaluation sets. The first number is far larger than the second number.
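Below is a minimal sketch of the perturbation-mode screening. It assumes that each model call on a text set returns a single scalar score: sf(x) for the sample set, and one sf(x') per perturbation text set. The sign convention follows the formula D = Sum(sf(x') - sf(x))/K as written above. The function names and scores are illustrative only.

```python
def perturbation_effect(score_original, scores_perturbed):
    """D = Sum(sf(x') - sf(x)) / K over the K perturbation text sets of one mode."""
    k = len(scores_perturbed)
    return sum(s - score_original for s in scores_perturbed) / k


def select_modes(score_original, scores_by_mode, m):
    """Sort modes by D in descending order and keep the first m."""
    effects = {mode: perturbation_effect(score_original, scores)
               for mode, scores in scores_by_mode.items()}
    ranked = sorted(effects, key=effects.get, reverse=True)
    return ranked[:m], effects


# Hypothetical scores: sf(x) = 0.9 on the sample set, K = 2 sets per mode.
top, effects = select_modes(
    0.9,
    {"swap": [0.7, 0.8], "mask": [0.9, 0.9], "numeral": [1.0, 0.95]},
    m=2,
)
print(top)  # ['numeral', 'mask']
```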

Illustratively, the perturbation process includes the following ways.

1. Word addition based on a masked language model: a position in the original text is selected at random, and 1 to 3 [mask] tokens are added after the word at that position. For example, "are you happy today" becomes "are you happy today [mask]" or "are you happy today [mask] [mask]".

2. Next-word prediction (Next Word Predict) with the BERT (Bidirectional Encoder Representations from Transformers) language representation model: the word with the highest predicted probability is selected and inserted to form the perturbed text, for example perturbed variants of "you are happy today".

3. Synonym replacement: words are represented as vectors by a pre-trained word-representation model such as GloVe (Global Vectors for Word Representation) or Word2vec (word to vector). The Euclidean distances between the vectors are computed, and a word of the original text is replaced with a word at a short Euclidean distance from it. The word to be replaced can be selected at random. For example, a word in "you are happy today" is replaced with its synonym.

4. Adjacent word swap: a word is exchanged with its neighboring word, and the word can be selected at random. For example, two adjacent words in "are you happy today" exchange positions.

5. Numerical replacement, for example: "one month and one day" is converted to "1 month and 1 day".

6. Character addition and deletion: for example, characters are added to or deleted from "are you happy today", yielding variants such as "are you very happy".
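Several of the perturbation modes above can be sketched on tokenized text. This is an illustrative sketch only. The embedding table, the number-word mapping, and all function names are hypothetical. Mode 2 (BERT next-word prediction) is omitted because it requires a pre-trained model.

```python
import math
import random

MASK = "[mask]"
# Hypothetical number-word table for mode 5; extend as needed.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}


def add_masks(tokens, rng):
    """Mode 1: insert 1-3 mask tokens after a randomly chosen word."""
    pos = rng.randrange(len(tokens))
    return tokens[:pos + 1] + [MASK] * rng.randint(1, 3) + tokens[pos + 1:]


def replace_synonym(tokens, embeddings, rng):
    """Mode 3: replace a random word with its nearest other word (Euclidean) in `embeddings`."""
    positions = [i for i, t in enumerate(tokens) if t in embeddings]
    if not positions or len(embeddings) < 2:
        return list(tokens)
    i = rng.choice(positions)
    word = tokens[i]
    nearest = min((w for w in embeddings if w != word),
                  key=lambda w: math.dist(embeddings[w], embeddings[word]))
    return tokens[:i] + [nearest] + tokens[i + 1:]


def swap_adjacent(tokens, rng):
    """Mode 4: exchange a random word with its right-hand neighbour."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def replace_numerals(tokens):
    """Mode 5: rewrite spelled-out numbers as digits."""
    return [NUMBER_WORDS.get(t, t) for t in tokens]


def delete_char(text, rng):
    """Mode 6 (deletion case): drop one random character."""
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:]


rng = random.Random(0)
print(add_masks("are you happy today".split(), rng))
print(replace_numerals("one month and one day".split()))  # ['1', 'month', 'and', '1', 'day']
```

A seeded `random.Random` instance is passed in explicitly so that a perturbation run can be reproduced exactly.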

In step 803, the text quality is checked.

For example, the text quality of the perturbed text may be evaluated by the following parameters.

1. Perturbation rate: perturbation rate = (number of perturbed characters)/(number of characters in the original text). A perturbation rate threshold can be set, e.g. 15%. When the perturbation rate of a text exceeds the threshold, the noise is too large and the text is unsuitable for testing.

2. Edit distance between the original text and the perturbed text: the edit distance can be measured as the number of editing operations (perturbation operations) performed, with each edited character counting as one operation. For example, "are you happy today" and "are you happy" differ by one editing operation, so their edit distance is 1. The edit distance threshold may be set according to the text length, which is positively correlated with the threshold. For example, the text "are you happy" is 5 characters long in the original language, and its edit distance threshold may be set to 2.

3. Text perplexity: perplexity is an evaluation metric for language models. In the embodiment of the present application, it is used to evaluate the naturalness of the perturbed text. If the perturbed text does not conform to natural language expression, the noise added during perturbation is too large and the text is unsuitable for evaluation.

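Suppose that a piece of text is represented as

$$S = w_1 w_2 \cdots w_k$$

Correspondingly, the text perplexity PP(S) can be expressed as the following formula (1):

$$PP(S) = p(w_1 w_2 \cdots w_k)^{-\frac{1}{k}} = \sqrt[k]{\prod_{i=1}^{k} \frac{1}{p(w_i \mid w_1 w_2 \cdots w_{i-1})}} \qquad (1)$$

where $w_i$ is the i-th word in the text, k is the number of words in the text, $w_k$ is the k-th word in the text, and $p(w_i \mid w_1 w_2 \cdots w_{i-1})$ is the probability distribution of the i-th word occurring given the preceding words.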
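The three quality parameters can be computed as in the following sketch. It makes two assumptions. First, the edit distance is measured as character-level Levenshtein distance. Second, the per-word conditional probabilities p(w_i | w_1...w_{i-1}) are supplied by some language model. The helper names are hypothetical.

```python
import math


def perturbation_rate(num_perturbed_chars, original_text):
    """Perturbed characters divided by the length of the original text."""
    return num_perturbed_chars / len(original_text)


def edit_distance(a, b):
    """Character-level Levenshtein distance; each insert/delete/substitute costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def perplexity(cond_probs):
    """Formula (1): PP(S) = prod_i p(w_i | w_1..w_{i-1}) ** (-1/k), evaluated in log space."""
    k = len(cond_probs)
    return math.exp(-sum(math.log(p) for p in cond_probs) / k)


print(edit_distance("kitten", "sitting"))  # 3
print(perplexity([0.5, 0.5]))  # about 2.0
```

Computing the product in log space avoids numeric underflow when k is large.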

In step 804, a plurality of evaluation text sets are obtained.

Illustratively, a comprehensive quality check is performed using the perturbation rate, the edit distance, and the perplexity. This ensures that the noise added to each perturbed text is reasonable and that the perturbed text conforms to natural language expression. When any of these parameters exceeds its threshold, the perturbed text does not meet the test standard. The evaluation text set containing it is deleted, and the remaining evaluation text sets are kept. The remaining sets are labeled based on the quality evaluation results to obtain the plurality of evaluation text sets.
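The filtering rule of step 804 might look like the following sketch. It assumes that each perturbed text already carries its quality parameters in a dictionary. For simplicity it applies one fixed edit distance threshold, whereas the text ties that threshold to text length. The data layout, names, and threshold values are hypothetical.

```python
def filter_evaluation_sets(eval_sets, thresholds):
    """Keep only evaluation sets in which every perturbed text passes every threshold.

    eval_sets:  {set_id: [metrics_of_text_1, metrics_of_text_2, ...]}
                where each metrics dict maps a parameter name to its value.
    thresholds: {parameter_name: maximum_allowed_value}
    """
    kept = {}
    for set_id, texts in eval_sets.items():
        if all(metrics[name] <= limit
               for metrics in texts
               for name, limit in thresholds.items()):
            kept[set_id] = texts
    return kept


thresholds = {"perturbation_rate": 0.15, "edit_distance": 2, "perplexity": 200.0}
sets = {
    "set_a": [{"perturbation_rate": 0.10, "edit_distance": 1, "perplexity": 80.0}],
    "set_b": [{"perturbation_rate": 0.25, "edit_distance": 1, "perplexity": 80.0}],
}
print(list(filter_evaluation_sets(sets, thresholds)))  # ['set_a']
```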

In step 805, the model is called to recognize the evaluation text sets and the text sample set respectively.

By way of example, "model" below is short for the named entity recognition model. The model under test is called to obtain the recognition result of each evaluation text set and the recognition result of the text sample set. To improve test efficiency, the recognition result of the text sample set can be obtained while the evaluation text sets are being generated.

In step 806, the recognition results are compared.

By way of example, the recognition success rate is computed as: recognition success rate = Sum(f(x) == f(x'))/Sum(x). Here x is an original text, x' is the corresponding perturbed text, and f is the label result predicted by the model. The recognition success rate is therefore the ratio of two counts. The numerator is the number of texts whose predicted label result for the original text matches the detection result for the perturbed text. The denominator is the total number of texts. The lower the recognition success rate, the less robust the model.

A robustness index can also be calculated. Robustness can be understood as the model's ability to resist interference from data noise, where the noise is the perturbation introduced during text perturbation. Robustness index = Sum(sf(x') == sf(x))/Sum(x), where sf is the score predicted by the model. The robustness index is thus the ratio of the number of texts whose predicted score for the original text equals the predicted score for the perturbed text to the total number of texts.
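Both ratios from step 806 can be sketched as follows. The sketch assumes that each model prediction is a set of (entity, start index, type) tuples, so entity order does not matter. The tolerance parameter for comparing float scores is an added assumption, since the text compares scores for exact equality.

```python
def recognition_success_rate(preds_original, preds_perturbed):
    """Sum(f(x) == f(x')) / Sum(x): share of texts whose predicted entities are unchanged."""
    if len(preds_original) != len(preds_perturbed):
        raise ValueError("each original text needs exactly one perturbed counterpart")
    same = sum(a == b for a, b in zip(preds_original, preds_perturbed))
    return same / len(preds_original)


def robustness_index(scores_original, scores_perturbed, tol=0.0):
    """Sum(sf(x') == sf(x)) / Sum(x); `tol` relaxes exact float equality (assumption)."""
    if len(scores_original) != len(scores_perturbed):
        raise ValueError("score lists must have the same length")
    same = sum(abs(a - b) <= tol for a, b in zip(scores_original, scores_perturbed))
    return same / len(scores_original)


original = [{("certain bridge", 0, "place")}, {("Zhou X", 2, "person")}, set()]
perturbed = [{("certain bridge", 0, "place")}, set(), set()]
print(recognition_success_rate(original, perturbed))  # 2 of 3 texts unchanged
```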

For example, in addition to evaluating the model at the set level, the model may also be evaluated at the sample level. Referring to fig. 9, fig. 9 is a ninth flowchart of the testing method for the named entity recognition model provided in the embodiment of the present application. The robustness result of the named entity recognition model, that is, the robustness test result, is obtained through steps 901 to 908, which are described in detail below.

In step 901, a set of text samples is obtained.

Step 801 may be referred to for implementation of step 901, which is not described herein again.

In step 902, salient words are screened.

For example, a text includes entity words and non-entity words. A salient word is a non-entity word that is more important than the other non-entity words. The importance of a word is judged by how much removing it changes the meaning of the text.

The importance of the non-entity words in a text may be obtained as follows. The original text is segmented into words, and the words are deleted one at a time from the segmentation result to obtain a plurality of candidate sentences (Candidates). The original text and each candidate sentence are passed through a white-box model for inference. This yields the predicted named entity recognition result (Results) and its score; the score is 0 if the white-box model recognizes no named entity.

A word's importance index is the difference between the score of the original text and the score of the candidate sentence that omits that word. The larger the difference, the higher the importance. A word whose absence leaves the white-box model's prediction unchanged, or changes the score only slightly, is unimportant. The words can be sorted by importance, and at least one word at the head of the descending order is selected for perturbation.

For example, take the text "do you know which province a certain bridge is in", where "a certain" stands for a specific name.

The segmentation result is "you, know, certain bridge, in, which, province, Do".

Deleting the words one at a time yields the following candidate sentences.

Candidate sentence A "know, certain bridge, in, which, province, Do"

Candidate sentence B "you, certain bridge, in, which, province, Do"

Candidate sentence C "you, know, in, which, province, Do"

Candidate sentence D "you, know, certain bridge, which, province, Do"

Candidate sentence E "you, know, certain bridge, in, province, Do"

Candidate sentence F "you, know, certain bridge, in, which, Do"

Candidate sentence G "you, know, certain bridge, in, which, province"

For example, candidate sentence D lacks "in", yet its meaning differs little from that of the original text, so "in" has low importance in the original text.
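The leave-one-out importance ranking of step 902 can be sketched as follows, using the segmented example above. The `toy_score` function is a hypothetical stand-in for the white-box model's score. It is built so that deleting "province" matters more than deleting "in". A real test would call the actual model.

```python
def word_importance(tokens, entity_words, score_fn):
    """Leave-one-out importance of each non-entity word, sorted most important first.

    score_fn(tokens) -> float stands in for the white-box model's score for the
    recognised entities (0 when nothing is recognised).
    """
    base = score_fn(tokens)
    ranked = []
    for i, tok in enumerate(tokens):
        if tok in entity_words:
            continue  # only non-entity words are ranked here
        candidate = tokens[:i] + tokens[i + 1:]      # candidate sentence without tok
        ranked.append((base - score_fn(candidate), i, tok))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


def toy_score(tokens):
    """Hypothetical scorer: entity found -> 0.9, minus penalties for missing context."""
    if "certain bridge" not in tokens:
        return 0.0
    score = 0.9
    if "province" not in tokens:
        score -= 0.3
    if "in" not in tokens:
        score -= 0.05
    return score


tokens = ["you", "know", "certain bridge", "in", "which", "province", "Do"]
for delta, _, word in word_importance(tokens, {"certain bridge"}, toy_score)[:2]:
    print(word, round(delta, 2))
```

Each result keeps the word's position alongside the word itself, so repeated words in one sentence are ranked separately.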

In step 903, text perturbation is performed.

For example, the perturbation of non-entity words may follow step 802, which is not repeated here. The perturbation of entity words can be implemented as follows.

1. Named entity replacement: as in synonym replacement, words are represented as vectors by a pre-trained word-representation model such as GloVe (Global Vectors for Word Representation) or Word2vec (word to vector). The Euclidean distances between the vectors are computed, and the entity word is replaced with a word at a short Euclidean distance. Named entity words can also be replaced with their short names, aliases, and the like. For example, the person name "Zhou X" in "do you know how many years of Zhou X" can be replaced in this way.

2. Named entity perturbation: the named entity is perturbed by deleting or adding characters. For example, characters are deleted from or added to the entity "certain bridge" in "do you know which province a certain bridge is in".
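Entity-level perturbation first needs the entity spans. Below is a minimal sketch that assumes plain character-level "B"/"I"/"O" tags without type suffixes. It shows only the deletion half of named entity perturbation. The function names are hypothetical.

```python
import random


def entity_spans(tags):
    """(start, end) pairs of entities in a BIO tag sequence; `end` is exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I" and start is not None:
            continue  # still inside the current entity
        else:  # "O", or an "I" with no open entity
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def delete_entity_char(chars, tags, rng):
    """Delete one character inside a random multi-character entity and fix the tags."""
    spans = [(s, e) for s, e in entity_spans(tags) if e - s > 1]
    if not spans:
        return list(chars), list(tags)
    start, end = rng.choice(spans)
    i = rng.randrange(start, end)
    new_chars = chars[:i] + chars[i + 1:]
    new_tags = tags[:i] + tags[i + 1:]
    if i == start:
        new_tags[start] = "B"  # the following character now begins the entity
    return new_chars, new_tags


chars, tags = list("xAByz"), ["O", "B", "I", "O", "O"]
print(delete_entity_char(chars, tags, random.Random(0)))
```

Single-character entities are skipped, so a deletion never removes an entity entirely.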

In step 904, the text quality is checked.

For example, in addition to the perplexity, the text quality can be checked with the following parameters. These parameters distinguish perturbation of entity words from perturbation of non-entity words. Text datasets for named entity recognition are mostly labeled with the BIO or BIOES scheme. For example, "do you know how many years of Zhou X" is labeled character by character as "you/O know/O Zhou/B X/I many/O few/O year/O Do/O". The tags have the following meanings:

- B: the character begins an entity (Begin).
- I: the character is inside an entity (Inside).
- O: the character is outside any entity (Outside).
- E: the character ends an entity (End).
- S: the character alone forms an entity (Single).

The embodiments of the present application are explained taking the BIO labeling scheme as an example.

1. Entity perturbation rate: the proportion of named entity characters that are perturbed. Entity perturbation rate = (number of B/I characters perturbed)/(number of characters in the original text).

2. Non-entity perturbation rate: the proportion of non-entity characters that are perturbed. Non-entity perturbation rate = (number of O characters perturbed)/(number of characters in the original text).
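Both rates can be computed from the BIO tags of the original text and the positions the perturbation touched, as in this sketch. The `perturbed_positions` input is an assumed representation, and the function name is hypothetical.

```python
def entity_perturbation_rates(tags, perturbed_positions):
    """Return (entity rate, non-entity rate) for one perturbed text.

    tags: BIO tag of every character in the original text.
    perturbed_positions: indices of original characters changed by the perturbation.
    Both rates use the original text length as the denominator, as in the formulas above.
    """
    positions = set(perturbed_positions)  # count each character once
    entity = sum(1 for i in positions if tags[i] in ("B", "I"))
    non_entity = sum(1 for i in positions if tags[i] == "O")
    return entity / len(tags), non_entity / len(tags)


tags = ["O", "O", "B", "I", "O", "O", "O", "O"]
print(entity_perturbation_rates(tags, [2, 5]))  # (0.125, 0.125)
```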

In step 905, an evaluation text set is generated.

In the example, a comprehensive quality check is performed using the entity perturbation rate, the non-entity perturbation rate, and the perplexity. This ensures that the noise added to each perturbed text is reasonable and that the perturbed text conforms to natural language expression. It also controls the perturbation effect, yielding evaluation text sets that meet the test standard.

In step 906, the model is called to recognize the text sample set, and each perturbed text in the evaluation text set is used to attack the model.

For example, assume a text sample set A and an evaluation text set B. The text sample set A is input into the named entity recognition model for inference to obtain the result R. Each perturbed text in the evaluation text set B is used to attack the model on its corresponding sample text a_i of set A, and the number of attacks per perturbed text is capped at K. In each attack, the perturbed text b_i in B is input into the named entity recognition model under test for inference to obtain the result R_i'.

In step 907, it is determined whether the attack succeeded or the number of attacks has exceeded K. If so, go to step 908.

For example, continuing the above example, the output for each text sample is compared with the output for its corresponding perturbed text, that is, R_i' is compared with R_i. This determines whether the attack on the named entity recognition model succeeded. If the two results contain the same named entities, the attack fails; otherwise, the attack succeeds.

In step 908, the number of successful attacks is counted.

Illustratively, the number of successful attacks is counted and the attack success rate is calculated: attack success rate = (number of perturbed texts b_i in the evaluation text set B that attack successfully)/(number of samples in the evaluation text set B). A lower attack success rate indicates a more robust named entity recognition model. Conversely, a failed attack can be regarded as a successful defense by the named entity recognition model, that is, a successful recognition. The defense success rate (recognition success rate) of the named entity recognition model is positively correlated with its robustness.
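Steps 906 to 908 can be sketched as the loop below. It assumes that `model` is a callable returning the recognized entities for a text. The model may be non-deterministic, which is one reason to retry up to K times. The names are hypothetical. Under this loop, the defense rate is simply the complement of the attack success rate.

```python
def attack_success_rate(samples, perturbed, model, k):
    """Attack the model with each perturbed text b_i up to k times (steps 906-908).

    samples[i] is the text sample a_i and perturbed[i] its perturbed text b_i.
    model(text) returns the recognised entities, e.g. a frozenset of tuples.
    """
    if len(samples) != len(perturbed):
        raise ValueError("samples and perturbed texts must correspond one-to-one")
    successes = 0
    for a, b in zip(samples, perturbed):
        reference = model(a)                # R_i for the original sample
        for _ in range(k):                  # at most k attacks per b_i
            if model(b) != reference:       # R_i' differs from R_i: attack succeeded
                successes += 1
                break
    rate = successes / len(perturbed)
    return rate, 1.0 - rate                 # (attack success rate, defense success rate)


def toy_model(text):
    """Hypothetical deterministic recogniser: capitalised words are 'entities'."""
    return frozenset(w for w in text.split() if w.istitle())


print(attack_success_rate(["go to Paris", "see Rome"],
                          ["go to paris", "see Rome now"], toy_model, k=3))  # (0.5, 0.5)
```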

In the embodiment of the present application, perturbation is performed in multiple perturbation modes. This ensures the diversity of the perturbed texts and makes the test results more reliable. The recognition performance of the named entity recognition model under different kinds of noise can be obtained, which improves test precision. Because evaluation is automated, the time and effort of manually generating evaluation data are saved.

Continuing with the exemplary structure of the named entity recognition model testing device 455 provided by the embodiments of the present application, implemented as software modules: in some embodiments, as shown in fig. 2, the software modules of the device 455 stored in the memory 440 may include the following.

- A sample obtaining module 4551, configured to obtain a text sample set and a named entity recognition model to be tested.
- A sample perturbation module 4552, configured to perturb the text sample set based on at least one perturbation mode to obtain a first number of evaluation text sets, where each perturbation mode corresponds to at least one evaluation text set.
- A model testing module 4553, configured to invoke the named entity recognition model to recognize the text sample set and the first number of evaluation text sets. This yields a first recognition result corresponding to each evaluation text set and a second recognition result corresponding to the text sample set.

The model testing module 4553 is further configured to determine the probability that the first recognition result is the same as the second recognition result, and to determine the recognition success rate of the named entity recognition model based on that probability. It is also configured to determine the robustness of the named entity recognition model based on the recognition success rate, where the recognition success rate is positively correlated with the robustness.

In some embodiments, the sample perturbation module 4552 is configured to perform the following for each perturbation mode of the at least one perturbation mode: carrying out disturbance processing on the text sample set based on the disturbance mode to obtain at least one evaluation text set corresponding to the disturbance mode; and performing text quality detection on at least one evaluation text set corresponding to each disturbance mode, and determining a first number of evaluation text sets based on the obtained text quality detection result.

In some embodiments, the types of perturbation modes include: a random disturbance mode; when the type of the disturbance mode is a random disturbance mode, the sample disturbance module 4552 is configured to perform the following processing on the text sample set at least once to obtain at least one evaluation text set: performing the following processing on each text sample in the text sample set: selecting at least one target word from the text sample, and performing disturbance processing on each target word to obtain a disturbance text; and combining each perturbation text into an evaluation text set, wherein each perturbation text in the evaluation text set corresponds to each text sample in the text sample set one by one.

In some embodiments, the types of perturbation modes include an entity perturbation mode. When the perturbation mode is the entity perturbation mode, the sample perturbation module 4552 is configured to perform the following processing on the text sample set at least once to obtain at least one evaluation text set. For each text sample in the text sample set, it acquires the entity words and non-entity words in the text sample. It sorts the non-entity words in descending order of importance index, selects at least one top-ranked target word, and perturbs it. It also perturbs each entity word based on its word meaning, obtaining a perturbed text. The perturbed texts are combined into an evaluation text set, where the perturbed texts correspond one-to-one to the text samples in the text sample set.

In some embodiments, the sample perturbation module 4552 is configured to perform word segmentation processing on the text sample to obtain a word segmentation result of the text sample; for each non-entity word, the following is performed: deleting the non-entity words from the word segmentation result of the text sample to obtain candidate sentences corresponding to the non-entity words; conducting named entity recognition on each candidate sentence and the text sample to obtain a recognition result of each candidate sentence and a recognition result of the text sample; and acquiring a difference value between the recognition result of each candidate sentence and the recognition result of the text sample, and determining the importance index of the non-entity word corresponding to the candidate sentence based on the difference value, wherein the difference value is positively correlated with the importance index of the non-entity word in the text sample.

In some embodiments, the sample perturbation module 4552 is configured to perform at least one of the following processing on each entity word: performing character increase and decrease on the entity words based on the word senses of the entity words; and acquiring synonyms of the entity words, and replacing the entity words with the synonyms.

In some embodiments, the sample perturbation module 4552 is configured to perform perturbation processing on each target word, and includes at least one of the following: adding a mask between the target word and the neighbor word of the target word; predicting the next word of the target word, and adding the predicted word to the target word; replacing the target word with a synonym of the target word; exchanging positions of the target word and the neighbor words of the target word in the text sample; increasing or decreasing characters in the target word; and replacing the target word with other corresponding characters.

In some embodiments, when the type of the perturbation mode is an entity perturbation mode, the sample perturbation module 4552 is configured to determine a perturbation parameter of each perturbation text in each evaluation text set, where the perturbation parameter includes at least one of: entity disturbance rate, non-entity disturbance rate and text confusion degree; when the type of the perturbation mode is a random perturbation mode, the sample perturbation module 4552 is configured to determine a perturbation parameter of each perturbation text in each evaluation text set, where the perturbation parameter includes at least one of the following: text disturbance rate, text editing distance and text confusion degree; the sample disturbance module 4552 is configured to delete the evaluation text set corresponding to the disturbance text whose disturbance parameter is greater than the disturbance parameter threshold; and counting the number of the undeleted evaluation text sets in at least one evaluation text set corresponding to each disturbance mode, and labeling each disturbance text in the undeleted evaluation text sets based on the disturbance parameters corresponding to each disturbance text to obtain a first number of evaluation text sets.

In some embodiments, the sample perturbation module 4552 is configured to:

- obtain a plurality of perturbation modes to be screened;
- perturb the text sample set based on the plurality of perturbation modes to obtain a second number of perturbation text sets, where each perturbation mode corresponds to at least one perturbation text set;
- perform text prediction on each perturbation text set and on the text sample set to obtain their respective text prediction results;
- determine a perturbation effect index of each perturbation mode based on those text prediction results, sort the modes in descending order of perturbation effect index, and select at least one top-ranked perturbation mode.

In some embodiments, the model testing module 4553 is configured to perform the following for each perturbation mode. It determines the prediction difference corresponding to each perturbation text set of the perturbation mode, where the prediction difference is the difference between the prediction result of that perturbation text set and the prediction result of the text sample set. It then takes the ratio of the sum of these prediction differences to the number of perturbation text sets of the perturbation mode as that mode's perturbation effect index.

In some embodiments, the model testing module 4553 is configured to invoke the named entity recognition model to recognize each perturbed text in each evaluation text set, obtaining the recognition result of each perturbed text. Those recognition results are combined into the first recognition result of each evaluation text set. The module also invokes the named entity recognition model to recognize each text sample in the text sample set, obtaining the recognition result of each text sample. Those recognition results are combined into the second recognition result of the text sample set.

In some embodiments, the first recognition result of each evaluation text set includes the recognition result of each perturbed text in that evaluation text set. The second recognition result includes the recognition result of each text sample in the text sample set. When the at least one perturbation mode comprises a plurality of perturbation modes, the model testing module 4553 compares the recognition result of each text sample with that of its corresponding perturbed text to obtain the number of identical recognition results. It divides that number by the total number of texts in the first number of evaluation text sets. The resulting ratio is the probability that the first recognition result is the same as the second recognition result, and this probability is taken as the recognition success rate of the named entity recognition model.

In some embodiments, when the type of the perturbation mode is the entity perturbation mode, the model testing module 4553 is configured to perform the following processing for each perturbed text in each evaluation text set. It invokes the named entity recognition model to recognize the perturbed text multiple times, obtaining a plurality of recognition results for the perturbed text. Those recognition results are combined into the first recognition result of the evaluation text set. The module also invokes the named entity recognition model to recognize each text sample in the text sample set, obtaining the recognition result of each text sample. Those recognition results are combined into the second recognition result of the text sample set.

In some embodiments, the first recognition result of each evaluation text set includes a plurality of recognition results for each perturbed text in that evaluation text set. The second recognition result includes the recognition result of each text sample in the text sample set. When the type of the perturbation mode is the entity perturbation mode, the model testing module 4553 performs the following processing for each perturbed text. It compares the plurality of recognition results of the perturbed text with the recognition result of the corresponding text sample. Recognition is successful when at least one of those recognition results is the same as the recognition result of the corresponding text sample. The module then divides the number of recognition successes for each evaluation text set by the number of text samples in the text sample set. This ratio is the probability that the first recognition result is the same as the second recognition result, and it is taken as the recognition success rate of the named entity recognition model.

Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read by a processor of the computer device from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the test method of the named entity recognition model described in the embodiment of the present application.

Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to perform a method for testing a named entity recognition model provided by embodiments of the present application, for example, the method for testing a named entity recognition model shown in fig. 3.

In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.

In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

By way of example, executable instructions may, but need not, correspond to files in a file system. They may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document. They may also be stored in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).

By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.

In summary, through the embodiment of the present application, the sample text set is perturbed based on at least one perturbation mode to obtain at least one evaluation text set including perturbed texts. The named entity recognition model is called to recognize the sample text set and the at least one evaluation text set respectively. The probability that the model's output on the sample text set is the same as its output on the evaluation text sets is determined, and the robustness of the named entity recognition model is determined based on that probability. Because the evaluation text sets are generated by perturbation, the workload of labeling samples is reduced, computing resources are saved, and test efficiency is improved.

The above descriptions are merely embodiments of the present application and are not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present application shall fall within the protection scope of the present application.

Claims

1. A method for testing a named entity recognition model, the method comprising:

determining the robustness of the named entity recognition model based on the recognition success rate, wherein the recognition success rate is positively correlated with the robustness;

wherein, when the at least one perturbation mode comprises only an entity perturbation mode, the first recognition result comprises a plurality of recognition results corresponding to each perturbed text in each evaluation text set, and the second recognition result comprises the recognition result corresponding to each text sample in the text sample set;

determining that a perturbed text is successfully recognized when at least one of its recognition results is the same as the recognition result of the text sample corresponding to that perturbed text;

and determining the ratio of the number of successful recognitions corresponding to each evaluation text set to the number of text samples in the text sample set as the probability that the first recognition result is the same as the second recognition result, and taking that probability as the recognition success rate of the named entity recognition model in the sample dimension.
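The sample-dimension success rate of claim 1 can be sketched as follows. This is one reading of the claim, assuming that a recognition result is any comparable value (for example a frozenset of (entity text, entity type) pairs) and that, for each text sample, the recognition results of all perturbed texts derived from it are collected in a list. The function name `sample_success_rate` is hypothetical.

```python
from typing import Hashable, Sequence


def sample_success_rate(
    sample_results: Sequence[Hashable],
    perturbed_results: Sequence[Sequence[Hashable]],
) -> float:
    """Sample-dimension recognition success rate (one reading of claim 1).

    sample_results[i]    -- recognition result on text sample i
    perturbed_results[i] -- recognition results on the perturbed texts of sample i
    A sample counts as a success when at least one perturbed result matches.
    """
    if len(sample_results) != len(perturbed_results):
        raise ValueError("each text sample needs its own list of perturbed results")
    if not sample_results:
        raise ValueError("the text sample set is empty")
    successes = sum(
        1
        for original, variants in zip(sample_results, perturbed_results)
        if any(variant == original for variant in variants)
    )
    return successes / len(sample_results)
```

For example, with two samples where only the first has a perturbed variant that reproduces its entities, the rate is 0.5; a higher rate is read as higher robustness.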

2. The method of claim 1, wherein perturbing the text sample set based on at least one perturbation mode to obtain a first number of evaluation text sets comprises:

for each of the at least one perturbation mode, performing the following processing: perturbing the text sample set based on that perturbation mode to obtain at least one evaluation text set corresponding to the perturbation mode;

and performing text quality detection on the at least one evaluation text set corresponding to each perturbation mode, and determining the first number of evaluation text sets based on the obtained text quality detection results.

3. The method of claim 2, wherein the types of perturbation mode comprise a random perturbation mode;

when the type of the perturbation mode is the random perturbation mode, perturbing the text sample set based on the perturbation mode to obtain at least one evaluation text set corresponding to the perturbation mode comprises:

performing the following processing on the text sample set at least once to obtain at least one evaluation text set:

performing the following processing on each text sample in the text sample set: selecting at least one target word from the text sample, and perturbing each target word to obtain a perturbed text;

and combining the perturbed texts into an evaluation text set, wherein the perturbed texts in the evaluation text set correspond one-to-one to the text samples in the text sample set.
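A minimal sketch of the random perturbation mode of claim 3, assuming pre-tokenized text samples and a hypothetical character-level operation (drop one character, or duplicate a single-character word) standing in for whichever perturbation of claim 7 is chosen. Each run over the sample set yields one evaluation text set whose j-th perturbed text corresponds to the j-th text sample. All function names are illustrative.

```python
import random
from typing import List


def perturb_word(word: str, rng: random.Random) -> str:
    """Hypothetical perturbation: drop one random character,
    or duplicate the word if it has a single character."""
    if len(word) > 1:
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    return word * 2


def perturb_sample(tokens: List[str], rng: random.Random, n_targets: int = 1) -> List[str]:
    """Randomly select target words in one text sample and perturb each of them."""
    out = list(tokens)  # never mutate the original sample
    k = min(n_targets, len(out))
    for idx in rng.sample(range(len(out)), k):
        out[idx] = perturb_word(out[idx], rng)
    return out


def build_random_evaluation_sets(
    samples: List[List[str]], n_sets: int, seed: int = 0, n_targets: int = 1
) -> List[List[List[str]]]:
    """Repeat the random perturbation n_sets times; within each evaluation set
    the perturbed texts are aligned one-to-one with the text samples."""
    rng = random.Random(seed)
    return [
        [perturb_sample(sample, rng, n_targets) for sample in samples]
        for _ in range(n_sets)
    ]
```

Fixing the seed makes the generated evaluation sets reproducible, which is convenient when the same test has to be rerun against a retrained model.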

4. The method according to claim 2, wherein, when the perturbation mode is the entity perturbation mode, perturbing the text sample set based on the perturbation mode to obtain at least one evaluation text set corresponding to the perturbation mode comprises:

performing the following processing on the text sample set at least once to obtain at least one evaluation text set:

performing the following processing on each text sample in the text sample set: obtaining the entity words and non-entity words in the text sample, sorting the non-entity words in descending order of their importance indices, and selecting at least one top-ranked target word for perturbation processing; and

perturbing each entity word based on its word sense to obtain a perturbed text.
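The entity perturbation mode of claim 4 can be sketched as below, assuming the entity positions and per-word importance indices are already known (claim 5 describes one way to obtain the importances). The two perturbation callables are hypothetical placeholders: one for non-entity target words (any operation of claim 7) and one sense-preserving operation for entity words (claim 6).

```python
from typing import Callable, Dict, List, Sequence, Set


def entity_perturb_sample(
    tokens: Sequence[str],
    entity_positions: Set[int],
    importance: Dict[int, float],
    top_k: int,
    perturb_non_entity: Callable[[str], str],
    perturb_entity: Callable[[str], str],
) -> List[str]:
    """One reading of the entity perturbation mode:
    - rank non-entity words by importance (descending) and perturb the top_k;
    - perturb every entity word with a sense-preserving operation."""
    out = list(tokens)
    non_entity = [i for i in range(len(tokens)) if i not in entity_positions]
    ranked = sorted(non_entity, key=lambda i: importance.get(i, 0.0), reverse=True)
    for i in ranked[:top_k]:
        out[i] = perturb_non_entity(out[i])
    for i in sorted(entity_positions):
        out[i] = perturb_entity(out[i])
    return out
```

With tokens `["Alice", "flew", "to", "Beijing", "yesterday"]`, entities at positions 0 and 3, and "flew" the most important non-entity word, `top_k=1` perturbs only "flew" among the non-entity words while both entities receive the entity operation.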

5. The method of claim 4, wherein, before sorting the non-entity words in descending order of importance index and selecting at least one top-ranked target word for perturbation processing, the method further comprises:

performing word segmentation on the text sample to obtain a word segmentation result of the text sample;

for each non-entity word, performing the following processing: deleting the non-entity word from the word segmentation result of the text sample to obtain a candidate sentence corresponding to that non-entity word;

performing named entity recognition on each candidate sentence and on the text sample to obtain a recognition result of each candidate sentence and a recognition result of the text sample;

and obtaining a difference value between the recognition result of each candidate sentence and the recognition result of the text sample, and determining, based on the difference value, the importance index of the non-entity word corresponding to the candidate sentence, wherein the difference value is positively correlated with the importance index of the non-entity word in the text sample.
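The leave-one-out importance of claim 5 can be sketched as follows. The claim does not fix how the "difference value" between two recognition results is computed; this sketch assumes it is the size of the symmetric difference between the two sets of recognized entities, and `recognize` is a hypothetical callable standing in for the model under test.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Set, Tuple

Entity = Tuple[str, str]  # assumed representation: (entity text, entity type)
Recognizer = Callable[[Sequence[str]], FrozenSet[Entity]]


def non_entity_importance(
    tokens: Sequence[str],
    entity_positions: Set[int],
    recognize: Recognizer,
) -> Dict[int, float]:
    """Importance of each non-entity word: delete it, re-run recognition on the
    candidate sentence, and measure how much the entity set changes."""
    baseline = recognize(tokens)
    scores: Dict[int, float] = {}
    for i in range(len(tokens)):
        if i in entity_positions:
            continue
        candidate = list(tokens[:i]) + list(tokens[i + 1:])  # delete word i
        scores[i] = float(len(recognize(candidate) ^ baseline))
    return scores
```

A larger change in the recognized entities yields a larger score, matching the positive correlation stated in the claim. The cost is one extra model call per non-entity word.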

6. The method of claim 4, wherein perturbing each entity word based on its word sense comprises:

performing at least one of the following processes on each entity word:

adding or deleting characters in the entity word based on its word sense;

and obtaining a synonym of the entity word and replacing the entity word with the synonym.
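A sketch of the two sense-preserving entity operations of claim 6. The resources are hypothetical: a table of generic suffixes whose removal does not change what the entity refers to (for example 市, "city"), a table of expansions that add such characters, and a synonym table. A real system would draw these from a lexicon or knowledge base; the sketch only shows how candidate variants are produced.

```python
from typing import Dict, List, Sequence

# Hypothetical resources, for illustration only.
GENERIC_SUFFIXES: Sequence[str] = ("市", "公司", " City")
EXPANSIONS: Dict[str, str] = {"北京": "北京市", "腾讯": "腾讯公司"}
SYNONYMS: Dict[str, List[str]] = {"北京": ["京城"], "Beijing": ["Peking"]}


def entity_variants(entity: str) -> List[str]:
    """Candidate perturbations of an entity word that keep its meaning."""
    variants: List[str] = []
    # character deletion: drop a generic suffix that does not change the referent
    for suffix in GENERIC_SUFFIXES:
        if entity.endswith(suffix) and len(entity) > len(suffix):
            variants.append(entity[: -len(suffix)])
    # character addition: expand to a longer form with the same referent
    if entity in EXPANSIONS:
        variants.append(EXPANSIONS[entity])
    # synonym replacement
    variants.extend(SYNONYMS.get(entity, []))
    return variants
```

Restricting additions to a known-expansion table avoids appending a suffix such as 市 to, say, a person's name, which would change the entity type.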

7. The method of claim 3 or 4, wherein perturbing each target word comprises at least one of:

adding a mask between the target word and a neighboring word of the target word;

predicting the word that follows the target word, and appending the predicted word to the target word;

replacing the target word with a synonym of the target word;

swapping the positions of the target word and a neighboring word of the target word in the text sample;

adding or deleting characters in the target word;

and replacing the target word with other corresponding characters.
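The six target-word operations of claim 7 can be sketched on a token list as below. Assumptions, all hypothetical: the mask token is `"[MASK]"`, `predict_next` stands in for a language model, and the character-replacement table maps look-alike or homophone characters. Each function returns a new list and leaves the input untouched.

```python
import random
from typing import Callable, Dict, List

MASK = "[MASK]"  # assumed mask token; the claim does not name one


def add_mask(tokens: List[str], i: int) -> List[str]:
    """Insert a mask between the target word and its right-hand neighbour."""
    return tokens[: i + 1] + [MASK] + tokens[i + 1:]


def add_predicted_word(
    tokens: List[str], i: int, predict_next: Callable[[List[str]], str]
) -> List[str]:
    """Append the word predicted to follow the target (hypothetical predictor)."""
    return tokens[: i + 1] + [predict_next(tokens[: i + 1])] + tokens[i + 1:]


def replace_with_synonym(tokens: List[str], i: int, synonyms: Dict[str, List[str]]) -> List[str]:
    """Replace the target word with its first listed synonym, if any."""
    out = list(tokens)
    candidates = synonyms.get(tokens[i], [])
    if candidates:
        out[i] = candidates[0]
    return out


def swap_with_neighbour(tokens: List[str], i: int) -> List[str]:
    """Swap the target with its right neighbour (left one if it is last)."""
    out = list(tokens)
    j = i + 1 if i + 1 < len(tokens) else i - 1
    if j >= 0:
        out[i], out[j] = out[j], out[i]
    return out


def edit_characters(tokens: List[str], i: int, rng: random.Random) -> List[str]:
    """Randomly delete one character of the target word or duplicate one."""
    out = list(tokens)
    word = tokens[i]
    if not word:
        return out
    k = rng.randrange(len(word))
    if len(word) > 1 and rng.random() < 0.5:
        out[i] = word[:k] + word[k + 1:]  # delete character k
    else:
        out[i] = word[: k + 1] + word[k:]  # duplicate character k
    return out


def replace_characters(tokens: List[str], i: int, table: Dict[str, str]) -> List[str]:
    """Replace characters of the target word via a look-alike/homophone table."""
    out = list(tokens)
    out[i] = "".join(table.get(ch, ch) for ch in tokens[i])
    return out
```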

8. The method according to claim 2, wherein performing text quality detection on the at least one evaluation text set corresponding to each perturbation mode, and determining the first number of evaluation text sets based on the obtained text quality detection results, comprises:

when the type of the perturbation mode is an entity perturbation mode, determining perturbation parameters of each perturbed text in each evaluation text set, wherein the perturbation parameters comprise at least one of the following: entity perturbation rate, non-entity perturbation rate, and text perplexity;

when the type of the perturbation mode is a random perturbation mode, determining perturbation parameters of each perturbed text in each evaluation text set, wherein the perturbation parameters comprise at least one of the following: text perturbation rate, text edit distance, and text perplexity;

deleting any evaluation text set containing a perturbed text whose perturbation parameter is greater than the perturbation parameter threshold;

and counting the number of undeleted evaluation text sets among the at least one evaluation text set corresponding to each perturbation mode, and labeling each perturbed text in the undeleted evaluation text sets with its corresponding perturbation parameters, to obtain the first number of evaluation text sets.
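A sketch of the quality filter of claim 8 for the random perturbation mode. Only the parameters that need no model are computed here: the character-level edit distance (Levenshtein) and a text perturbation rate, which is assumed to be the edit distance divided by the length of the original text. Text perplexity would require a language model and is omitted. An evaluation set is dropped as soon as one of its perturbed texts exceeds a threshold; the surviving sets, with each text labeled by its parameters, give the first number of evaluation text sets. Function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance; works on strings or on token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def perturbation_rate(original: str, perturbed: str) -> float:
    """Assumed definition: edit distance normalised by the original length."""
    return edit_distance(original, perturbed) / max(len(original), 1)


def filter_evaluation_sets(
    samples: Sequence[str],
    eval_sets: Sequence[Sequence[str]],
    thresholds: Dict[str, float],  # keys: "edit_distance" and/or "perturbation_rate"
) -> List[List[Tuple[str, Dict[str, float]]]]:
    """Delete sets containing an over-threshold text; label texts in kept sets."""
    kept = []
    for eval_set in eval_sets:
        labelled = []
        for original, perturbed in zip(samples, eval_set):
            params = {
                "edit_distance": float(edit_distance(original, perturbed)),
                "perturbation_rate": perturbation_rate(original, perturbed),
            }
            if any(params[name] > limit for name, limit in thresholds.items()):
                break  # one bad text disqualifies the whole evaluation set
            labelled.append((perturbed, params))
        else:
            kept.append(labelled)
    return kept
```

`len(filter_evaluation_sets(...))` is then the count of undeleted sets for that perturbation mode.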

9. The method of claim 1, wherein, before perturbing the text sample set based on at least one perturbation mode to obtain the first number of evaluation text sets, the method further comprises:

acquiring a plurality of perturbation modes to be screened;

perturbing the text sample set based on the plurality of perturbation modes to obtain a second number of perturbed text sets, wherein each perturbation mode corresponds to at least one perturbed text set;

performing text prediction processing on each perturbed text set and on the text sample set respectively, to obtain text prediction results corresponding to each perturbed text set and to the text sample set;

and determining a perturbation effect index of each perturbation mode based on the text prediction results corresponding to each perturbed text set and to the text sample set, sorting the perturbation modes in descending order of perturbation effect index, and selecting at least one top-ranked perturbation mode.

10. The method of claim 9, wherein determining a perturbation effect index of each perturbation mode based on the text prediction results corresponding to each perturbed text set and to the text sample set comprises:

for each perturbation mode, performing the following:

determining a prediction difference value for each perturbed text set corresponding to the perturbation mode, wherein the prediction difference value is the difference between the prediction result of that perturbed text set and the prediction result of the text sample set;

and determining the ratio of the sum of the prediction difference values of the perturbed text sets to the number of perturbed text sets corresponding to the perturbation mode as the perturbation effect index of that perturbation mode.
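The screening of claims 9 and 10 can be sketched as follows. The claims leave the "prediction difference value" open; this sketch assumes it is the fraction of texts whose prediction changed relative to the corresponding text sample. The effect index of a mode is the mean of those differences over its perturbed text sets, and the modes are ranked in descending order of that index. Names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def prediction_difference(base: Sequence[Hashable], perturbed: Sequence[Hashable]) -> float:
    """Assumed difference measure: share of texts whose prediction changed."""
    if len(base) != len(perturbed) or not base:
        raise ValueError("prediction lists must be non-empty and aligned")
    return sum(b != p for b, p in zip(base, perturbed)) / len(base)


def perturbation_effect_index(
    base: Sequence[Hashable], perturbed_sets: List[Sequence[Hashable]]
) -> float:
    """Sum of per-set prediction differences divided by the number of sets."""
    if not perturbed_sets:
        raise ValueError("a perturbation mode needs at least one perturbed text set")
    return sum(prediction_difference(base, s) for s in perturbed_sets) / len(perturbed_sets)


def select_modes(
    base: Sequence[Hashable],
    predictions_by_mode: Dict[str, List[Sequence[Hashable]]],
    top_k: int,
) -> List[str]:
    """Rank modes by effect index (descending) and keep the top_k."""
    index = {m: perturbation_effect_index(base, sets) for m, sets in predictions_by_mode.items()}
    return sorted(index, key=lambda m: index[m], reverse=True)[:top_k]
```

A mode whose perturbations never change the model's predictions gets an index of 0 and is screened out first, since it adds little to a robustness test.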

11. The method of claim 1, wherein,

when the at least one perturbation mode comprises a plurality of perturbation modes, the first recognition result comprises the recognition result corresponding to each perturbed text in each evaluation text set, and the second recognition result comprises the recognition result corresponding to each text sample in the text sample set;

the method further comprises:

comparing the recognition result of each text sample with the recognition result of each perturbed text corresponding to that text sample, to obtain the number of identical recognition results;

and determining the ratio of the number of identical recognition results to the total number of texts in the first number of evaluation text sets as the probability that the first recognition result is the same as the second recognition result, and taking that probability as the recognition success rate of the named entity recognition model in the set dimension.
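The set-dimension success rate of claim 11 differs from the sample-dimension rate of claim 1 in its denominator: every perturbed text in every evaluation set is counted individually. A minimal sketch, assuming recognition results are comparable values and each evaluation set is aligned one-to-one with the text samples; the function name is hypothetical.

```python
from typing import Hashable, Sequence


def set_success_rate(
    sample_results: Sequence[Hashable],
    eval_set_results: Sequence[Sequence[Hashable]],
) -> float:
    """Identical results divided by the total number of perturbed texts."""
    same = 0
    total = 0
    for set_results in eval_set_results:
        if len(set_results) != len(sample_results):
            raise ValueError("each evaluation set must align with the text samples")
        for original, perturbed in zip(sample_results, set_results):
            total += 1
            same += original == perturbed  # bool counts as 0 or 1
    if total == 0:
        raise ValueError("no perturbed texts to compare")
    return same / total
```

With two samples and two evaluation sets where three of the four perturbed texts keep their sample's result, the rate is 3 / 4 = 0.75.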

12. A test apparatus for a named entity recognition model, the test apparatus comprising:

the model testing module is further configured to determine the robustness of the named entity recognition model based on the recognition success rate, wherein the recognition success rate is positively correlated with the robustness;

the model testing module is further configured to determine that a perturbed text is successfully recognized when at least one of its recognition results is the same as the recognition result of the text sample corresponding to that perturbed text; and to determine the ratio of the number of successful recognitions corresponding to each evaluation text set to the number of text samples in the text sample set as the probability that the first recognition result is the same as the second recognition result, and take that probability as the recognition success rate of the named entity recognition model in the sample dimension.

13. An electronic device, characterized in that the electronic device comprises:

a memory for storing executable instructions;

a processor for implementing a method of testing a named entity recognition model according to any one of claims 1 to 11 when executing executable instructions stored in the memory.

14. A computer-readable storage medium storing executable instructions, wherein the executable instructions, when executed by a processor, implement a method for testing a named entity recognition model according to any one of claims 1 to 11.