CN114138972A - Text type identification method and device

Info

Publication number
CN114138972A
Authority
CN
China
Prior art keywords
category
text
labeled
standard
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111440947.0A
Other languages
Chinese (zh)
Other versions
CN114138972B (en)
Inventor
武文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Jizhi Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jizhi Digital Technology Co Ltd filed Critical Shenzhen Jizhi Digital Technology Co Ltd
Priority to CN202111440947.0A priority Critical patent/CN114138972B/en
Publication of CN114138972A publication Critical patent/CN114138972A/en
Application granted granted Critical
Publication of CN114138972B publication Critical patent/CN114138972B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to the technical field of artificial intelligence, and provides a text type identification method and device. The method comprises the following steps: in the text to be labeled of each category, determining a standard sub-category corresponding to the text to be labeled of each category according to a first similarity between the standard question and each similar question; in the text to be labeled of each category, determining a non-standard sub-category corresponding to the text to be labeled of each category according to a second similarity between any one similar question and each of the other similar questions; determining a category set according to the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category; when a second text to be labeled is detected, updating the category set according to the second text to be labeled; and when a text to be recognized is detected, determining the category corresponding to the text to be recognized from the category set by using a nearest neighbor algorithm.

Description

Text type identification method and device
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to a text type identification method and device.
Background
In identifying text categories, the prior art usually labels the text once and performs text recognition based on the labeled text. However, in some text recognition scenarios, text categories must be recognized multiple times, or the labeled text must be updated multiple times, to ensure that recognition accuracy does not fall below expectations. The prior art has not yet solved this problem.
In the course of implementing the disclosed concept, the inventors found that there are at least the following technical problems in the related art: the labeled text cannot be updated in real time, so that the accuracy rate of identifying the text category is low.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a method and an apparatus for identifying a text category, an electronic device, and a computer-readable storage medium, so as to solve the problem in the prior art that the accuracy of identifying a text category is low because a labeled text cannot be updated in real time.
In a first aspect of the embodiments of the present disclosure, a method for recognizing a text category is provided, including: acquiring a first text to be annotated, wherein the first text to be annotated comprises a plurality of categories of texts to be annotated, and the text to be annotated of each category comprises a standard question and a plurality of similar questions; in the text to be annotated of each category, determining a standard subcategory corresponding to the text to be annotated of each category according to the standard questions and the first similarity of each similar question; in the text to be labeled of each category, determining a nonstandard sub-category corresponding to the text to be labeled of each category according to the second similarity between any one similar question and each of the other similar questions; determining a category set according to a standard sub-category and a plurality of non-standard sub-categories corresponding to the text to be labeled of each category; when a second text to be labeled is detected, updating the category set according to the second text to be labeled; and when the text to be recognized is detected, determining the category corresponding to the text to be recognized from the category set by using a nearest neighbor algorithm.
In a second aspect of the embodiments of the present disclosure, there is provided an apparatus for recognizing a text category, including: an acquisition module configured to acquire a first text to be annotated, wherein the first text to be annotated comprises a plurality of categories of texts to be annotated, and each category of texts to be annotated comprises a standard question and a plurality of similar questions; a first determining module configured to determine, in the text to be labeled of each category, a standard sub-category corresponding to the text to be labeled of each category according to a first similarity between the standard question and each similar question; a second determining module configured to determine, in the text to be labeled of each category, a non-standard sub-category corresponding to the text to be labeled of each category according to a second similarity between any one similar question and each of the other similar questions; a third determining module configured to determine a category set according to the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category; an updating module configured to update the category set according to a second text to be labeled when the second text to be labeled is detected; and a recognition module configured to determine, when a text to be recognized is detected, the category corresponding to the text to be recognized from the category set by using a nearest neighbor algorithm.
In a third aspect of the embodiments of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, which stores a computer program, which when executed by a processor, implements the steps of the above-mentioned method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects: in the text to be labeled of each category, a standard sub-category corresponding to the text to be labeled of each category is determined according to a first similarity between the standard question and each similar question; in the text to be labeled of each category, a non-standard sub-category corresponding to the text to be labeled of each category is determined according to a second similarity between any one similar question and each of the other similar questions; a category set is determined according to the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category; when a second text to be labeled is detected, the category set is updated according to the second text to be labeled; and when a text to be recognized is detected, the category corresponding to the text to be recognized is determined from the category set by using a nearest neighbor algorithm. These technical means can solve the problem in the prior art that the accuracy of text category recognition is low because the labeled text cannot be updated in real time, so that the labeled text can be updated in real time and the text category then recognized.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without inventive efforts.
Fig. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure;
Fig. 2 is a flowchart illustrating a text category identification method according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of an apparatus for recognizing text categories according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
A text category identification method and apparatus according to an embodiment of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure. The application scenario may include terminal devices 1, 2, and 3, server 4, and network 5.
The terminal devices 1, 2, and 3 may be hardware or software. When the terminal devices 1, 2 and 3 are hardware, they may be various electronic devices having a display screen and supporting communication with the server 4, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like; when the terminal devices 1, 2, and 3 are software, they may be installed in the electronic devices as above. The terminal devices 1, 2 and 3 may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not limited by the embodiments of the present disclosure. Further, the terminal devices 1, 2, and 3 may have various applications installed thereon, such as a data processing application, an instant messaging tool, social platform software, a search-type application, a shopping-type application, and the like.
The server 4 may be a server providing various services, for example, a backend server receiving a request sent by a terminal device establishing a communication connection with the server, and the backend server may receive and analyze the request sent by the terminal device and generate a processing result. The server 4 may be one server, may also be a server cluster composed of a plurality of servers, or may also be a cloud computing service center, which is not limited in this disclosure.
The server 4 may be hardware or software. When the server 4 is hardware, it may be various electronic devices that provide various services to the terminal devices 1, 2, and 3. When the server 4 is software, it may be a plurality of software or software modules providing various services for the terminal devices 1, 2, and 3, or may be a single software or software module providing various services for the terminal devices 1, 2, and 3, which is not limited by the embodiment of the present disclosure.
The network 5 may be a wired network connected by coaxial cable, twisted pair, or optical fiber, or a wireless network that interconnects communication devices without wiring, for example Bluetooth, Near Field Communication (NFC), or infrared, which is not limited in the embodiment of the present disclosure.
A user can establish a communication connection with the server 4 via the network 5 through the terminal devices 1, 2, and 3 to receive or transmit information or the like. It should be noted that the specific types, numbers and combinations of the terminal devices 1, 2 and 3, the server 4 and the network 5 may be adjusted according to the actual requirements of the application scenarios, and the embodiment of the present disclosure does not limit this.
Fig. 2 is a flowchart illustrating a text category identification method according to an embodiment of the present disclosure. The recognition method of the text category of fig. 2 may be performed by the server of fig. 1. As shown in fig. 2, the method for identifying a text category includes:
S201, acquiring a first text to be annotated, wherein the first text to be annotated comprises a plurality of categories of texts to be annotated, and the text to be annotated of each category comprises a standard question and a plurality of similar questions;
S202, in the text to be labeled of each category, determining a standard sub-category corresponding to the text to be labeled of each category according to a first similarity between the standard question and each similar question;
S203, in the text to be labeled of each category, determining a non-standard sub-category corresponding to the text to be labeled of each category according to a second similarity between any one similar question and each of the other similar questions;
S204, determining a category set according to the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category;
S205, when a second text to be labeled is detected, updating the category set according to the second text to be labeled;
S206, when a text to be recognized is detected, determining the category corresponding to the text to be recognized from the category set by using a nearest neighbor algorithm.
It should be noted that determining the standard sub-category and the non-standard sub-categories corresponding to the text to be labeled of each category, and determining the category set, may be understood as labeling the first text to be labeled, which constitutes the first recognition of text categories. Because labeling the first text to be labeled is the first recognition of text categories, the standard sub-category, the non-standard sub-categories, and the category set must be determined in that order. Updating the category set according to the second text to be labeled means labeling the second text to be labeled and adding the labeling result to the category set, which constitutes a recognition of text categories that is not the first. Because labeling the second text to be labeled is not the first recognition of text categories, it is unnecessary to determine the standard sub-category, the non-standard sub-categories, and the category set in sequence; the second text only needs to be compared with the category set. The similarity may be cosine similarity, text similarity, or the like.
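As a non-limiting illustration, the cosine similarity mentioned above might be computed as in the following Python sketch. The function name `cosine_similarity` and the convention of returning 0.0 when either vector is all zeros are assumptions made for illustration and are not part of the disclosure.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed convention: a zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0, so a preset threshold such as 0.7 separates close representations from distant ones.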
According to the technical solution provided by the embodiment of the disclosure, in the text to be labeled of each category, a standard sub-category corresponding to the text to be labeled of each category is determined according to a first similarity between the standard question and each similar question; in the text to be labeled of each category, a non-standard sub-category corresponding to the text to be labeled of each category is determined according to a second similarity between any one similar question and each of the other similar questions; a category set is determined according to the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category; when a second text to be labeled is detected, the category set is updated according to the second text to be labeled; and when a text to be recognized is detected, the category corresponding to the text to be recognized is determined from the category set by using a nearest neighbor algorithm. These technical means can solve the problem in the prior art that the accuracy of text category recognition is low because the labeled text cannot be updated in real time, so that the labeled text can be updated in real time and the text category then recognized.
In step S202, in the text to be labeled of each category, determining a standard sub-category corresponding to the text to be labeled of each category according to a first similarity between the standard question and each similar question includes: in the text to be labeled of each category: calculating a first similarity between the standard question and each similar question; and when the first similarity is greater than a preset threshold, adding the similar question corresponding to that first similarity to the standard sub-category, and deleting from the text to be labeled the similar questions that have been added to the standard sub-category.
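Step S202 can be sketched as follows, under the assumption that the standard question and the similar questions have already been encoded as vectors and that cosine similarity is the chosen measure. The names `cosine` and `split_standard_subcategory` and the default threshold of 0.7 are hypothetical choices for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 for a zero vector (assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def split_standard_subcategory(standard_vec, similar_vecs, threshold=0.7):
    """Return (members, remaining) as lists of indices into similar_vecs.

    members: similar questions whose first similarity to the standard
             question exceeds the threshold (the standard sub-category).
    remaining: similar questions left in the text to be labeled.
    """
    members, remaining = [], []
    for i, vec in enumerate(similar_vecs):
        if cosine(standard_vec, vec) > threshold:
            members.append(i)
        else:
            remaining.append(i)
    return members, remaining
```

Returning the remaining indices separately mirrors the deletion described above: only the similar questions not absorbed into the standard sub-category are passed on to the next step.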
The standard question of the text to be labeled of a category can be understood as the average value of the text representation of that text, or as the most central part or data of that text representation; the similar questions of the text to be labeled of a category can be understood as the other parts or data of the text representation apart from the standard question. To obtain the standard question and the similar questions of a text, the text may first be passed through a text encoder to obtain its text representation; the most central part or data of the text representation is then used as the standard question of the text, and the other parts or data are used as the similar questions of the text.
The category set in the embodiment of the present disclosure includes one standard sub-category and multiple non-standard sub-categories of text to be labeled of each category, and actually represents the text to be labeled of each category as multiple vectors, and each sub-category corresponds to one vector. Because the embodiment of the present disclosure represents the text to be labeled of each category as a plurality of vectors, the text recognition according to the category set is more accurate.
In step S203, in the text to be labeled of each category, determining a non-standard sub-category corresponding to the text to be labeled of each category according to a second similarity between any one similar question and each of the other similar questions includes: in the text to be labeled of each category: calculating a second similarity between any one of the similar questions and each of the other similar questions; and when the second similarity is greater than a preset threshold, adding the similar question corresponding to that second similarity to the non-standard sub-category corresponding to said any one of the similar questions, and deleting from the text to be labeled the similar questions that have been added to the non-standard sub-category.
Because the standard sub-category is determined first and the non-standard sub-categories are determined afterwards when text categories are identified, the text to be labeled in this step is the text to be labeled after the similar questions already added to the standard sub-category have been deleted. For example, in one category, after the standard sub-category is determined, 10 similar questions remain in the text to be labeled. The second similarity between the 1st similar question and each of the other 9 similar questions is calculated; the second similarities between the 1st similar question and the 5th and 7th similar questions are greater than the preset threshold, so the 5th and 7th similar questions are added to the non-standard sub-category corresponding to the 1st similar question. Then the second similarity between the 2nd similar question and each of the other 9 similar questions is calculated, and the non-standard sub-category corresponding to the 2nd similar question is determined; and so on, so that each similar question corresponds to a non-standard sub-category. A non-standard sub-category may be understood as a queue, which is initially empty.
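The grouping in step S203 might be sketched as a greedy pass over the remaining similar questions. This is only one possible reading: the sketch assumes that the seed question is itself counted as a member of its own non-standard sub-category and that questions already absorbed by an earlier sub-category are not used as seeds again. The names `cosine` and `group_non_standard` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 for a zero vector (assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_non_standard(vectors, threshold=0.7):
    """Greedily partition vectors into non-standard sub-categories.

    Returns a list of groups; each group is a list of indices into vectors,
    with the seed question first.
    """
    remaining = list(range(len(vectors)))
    groups = []
    while remaining:
        seed = remaining.pop(0)          # next unassigned similar question
        group = [seed]                   # assumption: the seed joins its own group
        still_unassigned = []
        for j in remaining:
            # Second similarity between the seed and another similar question.
            if cosine(vectors[seed], vectors[j]) > threshold:
                group.append(j)          # added to the seed's sub-category
            else:
                still_unassigned.append(j)
        remaining = still_unassigned     # "delete" the questions just added
        groups.append(group)
    return groups
```

With this reading, a similar question that matches nothing ends up in a sub-category containing only itself, which is consistent with each similar question corresponding to a non-standard sub-category.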
In step S204, determining a category set according to the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category includes: for the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category: calculating the arithmetic mean of all similar questions in each non-standard sub-category to obtain the non-standard sub-category representation corresponding to that non-standard sub-category; calculating the arithmetic mean of the standard question and all similar questions in the standard sub-category to obtain the standard sub-category representation corresponding to the standard sub-category; and calculating the arithmetic mean of all non-standard sub-category representations and the standard sub-category representation to obtain the parent category representation corresponding to the text to be labeled of that category; and determining the category set according to the plurality of non-standard sub-category representations, the standard sub-category representation, and the parent category representation corresponding to the text to be labeled of each category.
The parent category representation corresponding to the text to be labeled of each category is the category representation of that category. Each category is a parent category, hence the term parent category representation; a plurality of sub-categories are distinguished under each category, and the text representation of each sub-category is called a sub-category representation. In the embodiment of the present disclosure, the non-standard sub-category representation may be obtained by first obtaining the vectors corresponding to all similar questions in each non-standard sub-category using known text representation techniques, then calculating the arithmetic mean of those vectors and using that mean as the non-standard sub-category representation. Alternatively, the weighted sum of all similar questions in each non-standard sub-category may be calculated and averaged, and the averaged value used as the non-standard sub-category representation. The standard sub-category representation and the parent category representation are computed in a similar way.
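The arithmetic-mean computation of step S204 can be illustrated with the sketch below, which assumes every question is already a vector of the same dimension. The function names and the dictionary layout of the result are hypothetical and chosen only for readability.

```python
def mean_vector(vectors):
    """Element-wise arithmetic mean of a non-empty list of vectors."""
    if not vectors:
        raise ValueError("cannot average an empty list of vectors")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def category_representations(standard_vec, standard_members, non_standard_groups):
    """Compute the representations of one category.

    standard_vec: vector of the standard question.
    standard_members: vectors of the similar questions in the standard sub-category.
    non_standard_groups: one list of vectors per non-standard sub-category.
    """
    # Standard sub-category: mean of the standard question and its members.
    standard_repr = mean_vector([standard_vec] + list(standard_members))
    # Each non-standard sub-category: mean of its similar questions.
    non_standard_reprs = [mean_vector(group) for group in non_standard_groups]
    # Parent category: mean of all sub-category representations.
    parent_repr = mean_vector(non_standard_reprs + [standard_repr])
    return {
        "parent": parent_repr,
        "standard": standard_repr,
        "non_standard": non_standard_reprs,
    }
```

Note that the parent representation is a mean of sub-category means, so each sub-category contributes equally regardless of how many similar questions it contains.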
In step S205, when the second text to be labeled is detected, updating the category set according to the second text to be labeled includes: inputting the second text to be labeled into a text encoder to obtain a first text representation corresponding to the second text to be labeled; calculating a third similarity between the first text representation and each parent category representation in the category set; when the third similarity is greater than a preset threshold: calculating a fourth similarity between the first text representation and the standard sub-category representation corresponding to that parent category representation, and, when the fourth similarity is greater than a preset threshold, adding the second text to be labeled to the standard sub-category and updating the standard sub-category representation; when the fourth similarity is smaller than the preset threshold, calculating a fifth similarity between the first text representation and each non-standard sub-category representation corresponding to that parent category representation, and, when the fifth similarity is greater than the preset threshold, adding the second text to be labeled to the corresponding non-standard sub-category and updating the corresponding non-standard sub-category representation; and when the third similarity is smaller than the preset threshold, adding the first text representation to the category set as a new parent category representation.
The text encoder may be a BERT model. The text encoder has been trained and has learned and stored the correspondence between texts to be labeled and text representations.
A third similarity greater than the preset threshold indicates that the second text to be labeled belongs to the parent category corresponding to that third similarity, and therefore belongs either to the standard sub-category of that parent category or to one of its non-standard sub-categories. Accordingly, when the fourth similarity is smaller than the preset threshold, the fifth similarity must be greater than the preset threshold. The text to be labeled, the parent category, and the sub-category can be understood as text information, while the text representation, the sub-category representation, and the parent category representation can be understood as vector information. A third similarity smaller than the preset threshold indicates that the second text to be labeled does not belong to any parent category in the category set, so the second text to be labeled is added to the category set as a new parent category. The category set comprises, or may be understood to comprise, the parent categories, the sub-categories, the parent category representations, and the sub-category representations, where each parent category representation is associated with its parent category and each sub-category representation is associated with its sub-category. When the second text to be labeled is added to the standard sub-category and the standard sub-category representation is updated, updating the standard sub-category representation means adding the first text representation to the standard sub-category representation. Further, the second text to be labeled can be added to the category set as a new sub-category of a new parent category, and the new parent category is refined or updated when other texts to be labeled are subsequently detected.
If the similarity between a third text to be labeled and the second text to be labeled is greater than the preset threshold but smaller than a second preset threshold, the third text to be labeled can be used as another new sub-category of the new parent category, in addition to the second text to be labeled.
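The update flow of step S205 might look like the following sketch. Several details are assumptions not stated in the disclosure: the best-matching parent category is the one examined, a running mean is used to fold the new representation into a sub-category representation, a new parent category starts with the text as its standard sub-category, a new non-standard sub-category is opened as a fallback when no sub-category matches, and the parent category representation is left unchanged. The dictionary layout and all names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 for a zero vector (assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def running_mean(mean, count, new_vec):
    """Mean of count + 1 vectors, given the mean of the first count vectors."""
    return [(m * count + x) / (count + 1) for m, x in zip(mean, new_vec)]


def update_category_set(category_set, text_repr, threshold=0.7):
    """Fold one encoded text into category_set in place; return what happened."""
    # Third similarity: pick the parent category closest to the text.
    best = max(category_set,
               key=lambda c: cosine(text_repr, c["parent"]), default=None)
    if best is None or cosine(text_repr, best["parent"]) <= threshold:
        # No parent category is close enough: open a new parent category.
        category_set.append({
            "parent": list(text_repr),
            "standard": {"repr": list(text_repr), "count": 1},
            "non_standard": [],
        })
        return "new_parent"
    # Fourth similarity: compare with the standard sub-category representation.
    std = best["standard"]
    if cosine(text_repr, std["repr"]) > threshold:
        std["repr"] = running_mean(std["repr"], std["count"], text_repr)
        std["count"] += 1
        return "standard"
    # Fifth similarity: compare with each non-standard sub-category.
    for sub in best["non_standard"]:
        if cosine(text_repr, sub["repr"]) > threshold:
            sub["repr"] = running_mean(sub["repr"], sub["count"], text_repr)
            sub["count"] += 1
            return "non_standard"
    # Fallback (assumption): no sub-category matched, so open a new one.
    best["non_standard"].append({"repr": list(text_repr), "count": 1})
    return "new_non_standard"
```

Keeping a member count alongside each representation lets the mean be updated incrementally, so previously labeled texts do not need to be re-encoded when a new text arrives.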
After step S205 is executed, that is, after the category set is updated according to the second text to be labeled when the second text to be labeled is detected, the method further includes: when a text recognition instruction is received, acquiring the text to be recognized and the category set; inputting the text to be recognized into a text encoder to obtain a second text representation corresponding to the text to be recognized; calculating a sixth similarity between the second text representation and any one of the following category representations: each parent category representation in the category set, the standard sub-category representation corresponding to each parent category representation, and any one non-standard sub-category representation corresponding to each parent category representation; and when the sixth similarity is greater than the preset threshold, identifying the text to be recognized as belonging to the category corresponding to that sixth similarity.
The text recognition method provided by the embodiment of the disclosure can recognize both the parent category and the sub-category of the text to be recognized, where the sub-category may be a standard sub-category or a non-standard sub-category. The preset thresholds that appear multiple times in the present disclosure may be the same threshold, for example 0.7 in every case, or the preset thresholds in different embodiments of the present disclosure may differ. The category identified when the sixth similarity is greater than the preset threshold may be any parent category in the category set or any sub-category under any parent category.
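The threshold-based recognition described above can be sketched as a scan over all parent and sub-category representations. Returning the single highest-scoring label, rather than every label above the threshold, is an assumption of this sketch, as are the names `cosine` and `recognize_by_threshold` and the label strings used in the usage note.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 for a zero vector (assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recognize_by_threshold(text_repr, labeled_reprs, threshold=0.7):
    """Return the label whose representation is most similar to text_repr.

    labeled_reprs maps a category label (parent or sub-category) to its
    representation vector. Returns None when no sixth similarity exceeds
    the threshold.
    """
    best_label, best_score = None, threshold
    for label, vec in labeled_reprs.items():
        score = cosine(text_repr, vec)   # sixth similarity
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For instance, with hypothetical labels such as `"refund"` for a parent category and `"refund/standard"` for its standard sub-category, the function returns whichever of the two representations lies closer to the encoded text, so the result may be either a parent category or a sub-category.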
In step S206, when the text to be recognized is detected, determining a category corresponding to the text to be recognized from the category set by using a nearest neighbor algorithm, including: inputting the text to be recognized into a text encoder to obtain a second text representation corresponding to the text to be recognized; and determining the category corresponding to the text to be recognized from the category set according to the second text representation by using a nearest neighbor algorithm.
For example, using a nearest neighbor algorithm, the 50 sub-categories nearest to the second text representation are selected from each parent category in the category set; the arithmetic mean of the 50 selected sub-categories is then calculated for each parent category; and the parent category whose arithmetic mean is nearest to the second text representation is taken as the category corresponding to the text to be recognized.
In an optional embodiment, an early warning can be issued for similar questions in the sub-categories, which helps improve the quality of the data already labeled in the category set. Specifically, a similar question in a standard or non-standard sub-category whose similarity to the other similar questions is smaller than a certain threshold is labeled again or has its text representation computed again.
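The nearest-neighbor example above can be sketched in Python as follows. This is an illustrative sketch only: the disclosure does not fix the distance measure, so cosine similarity is assumed, and the mapping from parent category names to lists of sub-category representations is a hypothetical data layout.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise arithmetic mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_parent(vec: Vector, category_set: Dict[str, List[Vector]],
                   k: int = 50) -> str:
    """Pick the parent whose mean of its k nearest sub-categories is closest."""
    best_name, best_sim = None, -2.0  # cosine similarity is never below -1
    for name, sub_reps in category_set.items():
        # Keep the k sub-category representations most similar to the query.
        nearest = sorted(sub_reps, key=lambda r: cosine(vec, r), reverse=True)[:k]
        if not nearest:
            continue
        sim = cosine(vec, mean_vector(nearest))
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is None:
        raise ValueError("category set has no sub-category representations")
    return best_name
```

A parent category with fewer than `k` sub-categories simply contributes all of them to its mean.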
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 3 is a schematic diagram of an apparatus for recognizing a text category according to an embodiment of the present disclosure. As shown in fig. 3, the text category recognition device includes:
the obtaining module 301 is configured to obtain a first text to be annotated, where the first text to be annotated includes a plurality of categories of texts to be annotated, and each category of texts to be annotated includes a standard question and a plurality of similar questions;
the first determining module 302 is configured to determine, in the text to be labeled of each category, a standard sub-category corresponding to the text to be labeled of each category according to a first similarity between the standard question and each similar question;
the second determining module 303 is configured to determine, in the text to be annotated in each category, a non-standard sub-category corresponding to the text to be annotated in each category according to a second similarity between any one similar question and each of the other similar questions;
a third determining module 304, configured to determine a category set according to the standard sub-category and the multiple non-standard sub-categories corresponding to the text to be annotated of each category;
an updating module 305, configured to, when detecting a second text to be labeled, update the category set according to the second text to be labeled;
the identifying module 306 is configured to determine, when the text to be identified is detected, a category corresponding to the text to be identified from the category set by using a nearest neighbor algorithm.
It should be noted that determining the standard sub-category corresponding to the text to be labeled of each category, determining the non-standard sub-categories corresponding to the text to be labeled of each category, and determining the category set may together be understood as labeling the first text to be labeled, which is the first-time recognition of text categories. Because labeling the first text to be labeled is the first-time recognition of text categories, the standard sub-category, the non-standard sub-categories and the category set must be determined in sequence. Updating the category set according to the second text to be labeled means labeling the second text to be labeled and adding the labeling result to the category set, which is not the first-time recognition of text categories. Because labeling the second text to be labeled is not the first-time recognition, there is no need to determine the standard sub-category, the non-standard sub-categories and the category set in sequence; the second text to be labeled only needs to be compared with the category set. The similarity may be cosine similarity, text similarity, or the like.
According to the technical solution provided by the embodiment of the present disclosure, in the text to be labeled of each category, the standard sub-category corresponding to the text to be labeled of each category is determined according to the first similarity between the standard question and each similar question; in the text to be labeled of each category, the non-standard sub-categories corresponding to the text to be labeled of each category are determined according to the second similarity between any one similar question and each of the other similar questions; the category set is determined according to the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category; when a second text to be labeled is detected, the category set is updated according to the second text to be labeled; and when the text to be recognized is detected, the category corresponding to the text to be recognized is determined from the category set by using a nearest neighbor algorithm. These technical means address the problem in the prior art that the accuracy of text category recognition is low because the labeled text cannot be updated in real time, so that the labeled text can be updated in real time and the text category can then be recognized.
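As a brief illustration of the cosine similarity mentioned above, the following Python sketch computes it for two vectors. The function name and the convention of returning 0.0 for a zero-length vector are assumptions made here, not part of the disclosure.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)
```

The result lies between -1 and 1 (up to floating-point rounding), so a preset threshold such as 0.7 can be compared against it directly.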
Optionally, the first determining module 302 is further configured to, in the text to be labeled of each category: calculate a first similarity between the standard question and each similar question; and when the first similarity is greater than a preset threshold, add the similar question corresponding to that first similarity to the standard sub-category, and delete from the text to be labeled the similar questions that have been added to the standard sub-category.
The standard question of the text to be labeled of a category can be understood as the average value of the text representation of the text to be labeled, or the most central part or data of that text representation, and the similar questions of the text to be labeled of a category can be understood as the parts or data of the text representation other than the standard question. To obtain the standard question and the similar questions of a text, the text can first be passed through a text encoder to obtain its text representation; the most central part or data of the text representation is then used as the standard question of the text, and the other parts or data are used as its similar questions.
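The construction of the standard sub-category can be sketched in Python as follows. This is a minimal sketch under stated assumptions: questions are already encoded as vectors, similar questions are keyed by hypothetical string identifiers, cosine similarity is the chosen measure, and 0.7 is the example threshold mentioned in the disclosure.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_standard_subcategory(
    standard: Vector, similar: Dict[str, Vector], threshold: float = 0.7
) -> Tuple[List[str], Dict[str, Vector]]:
    """Split similar questions into standard sub-category members and the rest.

    Returns (ids added to the standard sub-category, remaining questions).
    The remaining questions are the input to the non-standard grouping step.
    """
    members: List[str] = []
    remaining: Dict[str, Vector] = {}
    for qid, vec in similar.items():
        if cosine(standard, vec) > threshold:  # the "first similarity"
            members.append(qid)
        else:
            remaining[qid] = vec
    return members, remaining
```

Returning the remaining questions separately mirrors the deletion of already-assigned similar questions from the text to be labeled.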
The category set in the embodiment of the present disclosure includes one standard sub-category and multiple non-standard sub-categories of text to be labeled of each category, and actually represents the text to be labeled of each category as multiple vectors, and each sub-category corresponds to one vector. Because the embodiment of the present disclosure represents the text to be labeled of each category as a plurality of vectors, the text recognition according to the category set is more accurate.
Optionally, the second determining module 303 is further configured to, in the text to be labeled of each category: calculate a second similarity between any one of the similar questions and each of the other similar questions; and when the second similarity is greater than a preset threshold, add the similar question corresponding to that second similarity to the non-standard sub-category corresponding to said any one of the similar questions, and delete from the text to be labeled the similar questions that have been added to the non-standard sub-category.
Because the standard sub-category is determined first and the non-standard sub-categories are determined afterwards, the text to be labeled here is the text to be labeled from which the similar questions already added to the standard sub-category have been deleted. For example, suppose that in one category, after the standard sub-category is determined, 10 similar questions remain in the text to be labeled. The second similarity between the 1st similar question and each of the other 9 similar questions is calculated; the second similarities between the 1st similar question and the 5th and 7th similar questions are greater than the preset threshold, so the 5th and 7th similar questions are added to the non-standard sub-category corresponding to the 1st similar question. Then the second similarity between the 2nd similar question and each of the other 9 similar questions is calculated to determine the non-standard sub-category corresponding to the 2nd similar question, and so on, so that each similar question corresponds to a non-standard sub-category. A non-standard sub-category may be understood as a queue that is initially empty.
Optionally, the third determining module 304 is further configured to, in the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category: calculate the arithmetic mean of all similar questions in each non-standard sub-category to obtain the non-standard sub-category representation corresponding to each non-standard sub-category; calculate the arithmetic mean of the standard question and all similar questions in the standard sub-category to obtain the standard sub-category representation corresponding to the standard sub-category; calculate the arithmetic mean of all non-standard sub-category representations and the standard sub-category representation to obtain the parent category representation corresponding to the text to be labeled of each category; and determine the category set according to the plurality of non-standard sub-category representations, the standard sub-category representation and the parent category representation corresponding to the text to be labeled of each category.
The parent category representation corresponding to the text to be labeled of each category is the category representation of that category. Each category is a parent category, hence the name parent category representation; several sub-categories are distinguished under each category, and the text representation of each sub-category is called a sub-category representation. In the embodiment of the present disclosure, calculating the arithmetic mean of all similar questions in each non-standard sub-category and using it as the non-standard sub-category representation may proceed as follows: first obtain the vectors corresponding to all similar questions in each non-standard sub-category according to the relevant knowledge of text representation, then calculate the arithmetic mean of those vectors and use it as the non-standard sub-category representation corresponding to that non-standard sub-category. Alternatively, the weighted sum of all similar questions in each non-standard sub-category may be calculated and averaged, and the averaged value used as the non-standard sub-category representation. The standard sub-category representation and the parent category representation are computed in a similar way.
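The grouping of the remaining similar questions into non-standard sub-categories can be sketched in Python as below. The sketch makes several assumptions that the disclosure leaves open: the question that starts a group is itself counted as a member, questions already assigned to a group are removed from the pool before the next pass, cosine similarity is used, and question identifiers such as `q1` are hypothetical.

```python
import math
from collections import deque
from typing import List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_nonstandard_subcategories(
    questions: List[Tuple[str, Vector]], threshold: float = 0.7
) -> List[List[str]]:
    """Greedily group questions; each group is one non-standard sub-category."""
    pool = deque(questions)
    groups: List[List[str]] = []
    while pool:
        seed_id, seed_vec = pool.popleft()
        group = [seed_id]  # assumption: the seed question belongs to its group
        rest = deque()
        for qid, vec in pool:
            if cosine(seed_vec, vec) > threshold:  # the "second similarity"
                group.append(qid)
            else:
                rest.append((qid, vec))
        pool = rest  # questions already grouped are no longer considered
        groups.append(group)
    return groups
```

Because every question is either a seed or absorbed by an earlier seed, each question ends up in exactly one group, and the result depends on the order in which the questions are supplied.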
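The arithmetic-mean computation of the sub-category and parent category representations can be sketched in Python as follows. The function names and the dictionary returned are hypothetical; the sketch assumes every question has already been encoded as a vector of the same length, and that the standard sub-category list includes the vector of the standard question itself.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise arithmetic mean of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot average an empty list of vectors")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def category_representations(
    standard_sub: List[Vector], nonstandard_subs: List[List[Vector]]
) -> Dict[str, object]:
    """Compute the three kinds of representation for one parent category."""
    standard_rep = mean_vector(standard_sub)
    nonstandard_reps = [mean_vector(group) for group in nonstandard_subs]
    # The parent representation averages the sub-category representations,
    # not the individual questions, so every sub-category has equal weight.
    parent_rep = mean_vector(nonstandard_reps + [standard_rep])
    return {
        "parent": parent_rep,
        "standard": standard_rep,
        "nonstandard": nonstandard_reps,
    }
```

Averaging the sub-category representations means that a large sub-category does not dominate the parent representation.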
Optionally, the updating module 305 is further configured to input the second text to be labeled into the text encoder to obtain a first text representation corresponding to the second text to be labeled; calculate a third similarity between the first text representation and each parent category representation in the category set, and when the third similarity is greater than a preset threshold: calculate a fourth similarity between the first text representation and the standard sub-category representation corresponding to each parent category representation in the category set, and when the fourth similarity is greater than a preset threshold, add the second text to be labeled to the standard sub-category and update the standard sub-category representation; when the fourth similarity is smaller than a preset threshold, calculate a fifth similarity between the first text representation and each non-standard sub-category representation corresponding to each parent category representation in the category set, and when the fifth similarity is greater than the preset threshold, add the second text to be labeled to the non-standard sub-category corresponding to that fifth similarity and update the corresponding non-standard sub-category representation; and when the third similarity is smaller than a preset threshold, add the first text representation to the category set as a new parent category representation.
The text encoder, which may be a BERT model, has been trained, learns and stores correspondences between the text to be annotated and the text representation.
The third similarity is greater than the preset threshold, which indicates that the second text to be labeled must belong to the parent category corresponding to the third similarity greater than the preset threshold, and therefore the second text to be labeled either belongs to the standard subcategory of the parent category or belongs to a non-standard subcategory of the parent category. Therefore, when the fourth similarity is smaller than the preset threshold, then the fifth similarity must be larger than the preset threshold. The text to be annotated, the parent category and the subcategory can be understood as text information, and the text representation, the subcategory representation and the parent category representation can be understood as vector information. And the third similarity is smaller than a preset threshold value, which indicates that the second text to be labeled does not belong to any parent category in the category set, so that the second text to be labeled is added to the category set as a new parent category. The set of categories comprises, or may be understood to comprise, respective parent categories, respective sub-categories, respective parent category representations and respective sub-category representations, each parent category representation being inclusive of the corresponding parent category and each sub-category representation being inclusive of the corresponding sub-category. And adding the second text to be annotated to the standard subcategory and updating the standard subcategory representation, wherein updating the standard subcategory representation means adding the first text representation to the standard subcategory representation. Further, the second text to be labeled can be added to the category set as a new sub-category of a new parent category, and when other texts to be labeled are detected subsequently, the new parent category is perfected or updated. 
If the similarity between a third text to be labeled and the second text to be labeled is greater than the preset threshold but smaller than a second preset threshold, the third text to be labeled can be used as another new sub-category of the new parent category, in addition to the sub-category formed by the second text to be labeled.
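The update procedure performed by the updating module 305 can be sketched in Python as below. This is a sketch under stated assumptions rather than the disclosed implementation: the category set is a list of dictionaries with hypothetical keys, cosine similarity is used, only the best-matching parent category is examined, a sub-category representation is updated by recomputing the mean of its members, a text that opens a new parent category is stored as that parent's standard sub-category, and a text that matches a parent but none of its sub-categories opens a new non-standard sub-category.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def new_subcategory(vec: Vector) -> Dict:
    return {"members": [vec], "rep": list(vec)}


def add_member(sub: Dict, vec: Vector) -> None:
    sub["members"].append(vec)
    sub["rep"] = mean_vector(sub["members"])  # assumed update rule


def update_category_set(category_set: List[Dict], vec: Vector,
                        threshold: float = 0.7) -> str:
    """Insert one encoded text into the category set; report where it went."""
    # Third similarity: compare with every parent category representation.
    best = max(category_set, key=lambda c: cosine(vec, c["parent_rep"]),
               default=None)
    if best is None or cosine(vec, best["parent_rep"]) <= threshold:
        category_set.append({"parent_rep": list(vec),
                             "standard": new_subcategory(vec),
                             "nonstandard": []})
        return "new_parent"
    # Fourth similarity: compare with the parent's standard sub-category.
    if cosine(vec, best["standard"]["rep"]) > threshold:
        add_member(best["standard"], vec)
        return "standard"
    # Fifth similarity: compare with each non-standard sub-category.
    matches = [s for s in best["nonstandard"]
               if cosine(vec, s["rep"]) > threshold]
    if matches:
        add_member(max(matches, key=lambda s: cosine(vec, s["rep"])), vec)
        return "nonstandard"
    best["nonstandard"].append(new_subcategory(vec))  # assumed fallback
    return "new_nonstandard"
```

The sketch leaves the parent category representation unchanged after an insertion; whether and when to recompute it is a separate design choice.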
Optionally, the recognition module 306 is further configured to, when receiving a text recognition instruction, obtain the text to be recognized and the category set; inputting the text to be recognized into a text encoder to obtain a second text representation corresponding to the text to be recognized; calculating a sixth similarity of the second textual representation to any one of the following category representations: each parent category representation in the category set, the standard sub-category representation corresponding to each parent category representation, and any one non-standard sub-category representation corresponding to each parent category representation; and when the sixth similarity is greater than the preset threshold, identifying the text to be recognized as the category corresponding to the sixth similarity greater than the preset threshold.
The text recognition method provided by the embodiment of the present disclosure can recognize not only the parent category of the text to be recognized but also its sub-category, where the sub-category is either a standard sub-category or a non-standard sub-category. The preset threshold that appears multiple times in the present disclosure may be the same threshold throughout, for example 0.7, or the preset thresholds in different embodiments may differ. The category identified when the sixth similarity is greater than the preset threshold may be any parent category in the category set or any sub-category under any parent category.
Optionally, the recognition module 306 is further configured to input the text to be recognized into a text encoder, so as to obtain a second text representation corresponding to the text to be recognized; and determining the category corresponding to the text to be recognized from the category set according to the second text representation by using a nearest neighbor algorithm.
For example, using a nearest neighbor algorithm, the 50 sub-categories nearest to the second text representation are selected from each parent category in the category set; the arithmetic mean of the 50 selected sub-categories is then calculated for each parent category; and the parent category whose arithmetic mean is nearest to the second text representation is taken as the category corresponding to the text to be recognized.
Optionally, the updating module 305 is further configured to issue an early warning for similar questions in the sub-categories, which helps improve the quality of the data already labeled in the category set. Specifically, a similar question in a standard or non-standard sub-category whose similarity to the other similar questions is smaller than a certain threshold is labeled again or has its text representation computed again.
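One possible reading of this early-warning step is sketched below in Python. The disclosure does not say how the similarity to the other similar questions is aggregated, so the sketch assumes the average cosine similarity of a question to the other members of its sub-category; the function name, the identifiers and the default threshold of 0.5 are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def flag_outliers(members: Dict[str, Vector],
                  threshold: float = 0.5) -> List[str]:
    """Return ids of questions that look out of place in their sub-category."""
    flagged: List[str] = []
    for qid, vec in members.items():
        others = [v for other_id, v in members.items() if other_id != qid]
        if not others:
            continue  # a lone question has nothing to be compared with
        avg = sum(cosine(vec, v) for v in others) / len(others)
        if avg < threshold:
            flagged.append(qid)  # candidate for re-labeling or re-encoding
    return flagged
```

The flagged identifiers can then be passed back to the labeling step or re-encoded with the text encoder.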
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 4 is a schematic diagram of an electronic device 4 provided by the embodiment of the present disclosure. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps in the various method embodiments described above are implemented when the processor 401 executes the computer program 403. Alternatively, the processor 401 implements the functions of the respective modules/units in the above-described respective apparatus embodiments when executing the computer program 403.
Illustratively, the computer program 403 may be partitioned into one or more modules/units, which are stored in the memory 402 and executed by the processor 401 to accomplish the present disclosure. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 403 in the electronic device 4.
The electronic device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or another electronic device. The electronic device 4 may include, but is not limited to, a processor 401 and a memory 402. Those skilled in the art will appreciate that fig. 4 is merely an example of the electronic device 4 and does not constitute a limitation of the electronic device 4, which may include more or fewer components than those shown, combine certain components, or use different components; for example, the electronic device may also include input-output devices, network access devices, buses, etc.
The Processor 401 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or a memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device 4. Further, the memory 402 may also include both internal storage units of the electronic device 4 and external storage devices. The memory 402 is used for storing computer programs and other programs and data required by the electronic device. The memory 402 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other ways. For example, the apparatus/electronic device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the methods in the above embodiments of the present disclosure may also be implemented by a computer program instructing related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, it may implement the steps of the above method embodiments. The computer program may comprise computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electrical carrier signals, telecommunications signals, a software distribution medium, and the like. It should be noted that the content contained in the computer readable medium may be appropriately added to or reduced as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer readable media may not include electrical carrier signals or telecommunications signals in accordance with legislation and patent practice.
The above examples are only intended to illustrate the technical solutions of the present disclosure, not to limit them; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present disclosure, and are intended to be included within the scope of the present disclosure.

Claims (10)

1. A method for recognizing text categories is characterized by comprising the following steps:
acquiring a first text to be annotated, wherein the first text to be annotated comprises a plurality of categories of texts to be annotated, and the text to be annotated of each category comprises a standard question and a plurality of similar questions;
in the text to be labeled of each category, determining a standard subcategory corresponding to the text to be labeled of each category according to the standard question and the first similarity of each similar question;
in the text to be labeled of each category, determining a nonstandard sub-category corresponding to the text to be labeled of each category according to a second similarity between any one of the similar questions and each of the other similar questions;
determining a category set according to the standard subcategory and the non-standard subcategories corresponding to the text to be labeled of each category;
when a second text to be labeled is detected, updating the category set according to the second text to be labeled;
and when the text to be recognized is detected, determining the category corresponding to the text to be recognized from the category set by using a nearest neighbor algorithm.
2. The method according to claim 1, wherein the determining, in the text to be labeled in each category, a standard sub-category corresponding to the text to be labeled in each category according to the standard question and a first similarity of each similar question comprises:
in the text to be labeled of each category:
calculating a first similarity between the standard question and each similar question;
and when the first similarity is greater than a preset threshold value, adding the similar questions corresponding to the first similarity greater than the preset threshold value into a standard subcategory, and deleting the similar questions which are already added into the standard subcategory in the text to be annotated.
3. The method according to claim 1, wherein the determining, in the text to be labeled in each category, a non-standard sub-category corresponding to the text to be labeled in each category according to a second similarity between any one of the similar questions and each of the other similar questions comprises:
in the text to be labeled of each category:
calculating a second similarity between any one of the similar questions and each of the other similar questions;
and when the second similarity is larger than a preset threshold value, adding the similar questions corresponding to the second similarity larger than the preset threshold value into the non-standard subcategory corresponding to any one of the similar questions, and deleting the similar questions which are already added into the non-standard subcategory in the text to be labeled.
4. The method according to claim 1, wherein the determining a category set according to the standard sub-category and the non-standard sub-categories corresponding to the text to be labeled of each category comprises:
in the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category: calculating the arithmetic mean value of all the similar questions in each non-standard sub-category to obtain the non-standard sub-category representation corresponding to each non-standard sub-category, calculating the arithmetic mean value of the standard question and all the similar questions in the standard sub-category to obtain the standard sub-category representation corresponding to the standard sub-category, and calculating the arithmetic mean value of all the non-standard sub-category representations and the standard sub-category representation to obtain the parent category representation corresponding to the text to be labeled of each category;
and determining the category set according to the plurality of non-standard sub-category representations, the standard sub-category representation and the parent category representation corresponding to the text to be labeled of each category.
5. The method according to claim 1, wherein the updating the category set according to the second text to be labeled when the second text to be labeled is detected comprises:
inputting the second text to be labeled into a text encoder to obtain a first text representation corresponding to the second text to be labeled;
calculating a third similarity between the first text representation and each parent category representation in the category set;
when the third similarity is greater than a preset threshold value: calculating a fourth similarity between the first text representation and the standard sub-category representation corresponding to each parent category representation in the category set, and when the fourth similarity is greater than a preset threshold value, adding the second text to be labeled to the standard sub-category and updating the standard sub-category representation;
when the fourth similarity is smaller than a preset threshold value, calculating a fifth similarity between the first text representation and each non-standard sub-category representation corresponding to each parent category representation in the category set, and when the fifth similarity is greater than a preset threshold value, adding the second text to be labeled to the non-standard sub-category corresponding to that fifth similarity and updating the corresponding non-standard sub-category representation;
and when the third similarity is smaller than a preset threshold value, adding the first text representation to the category set as a new parent category representation.
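The update flow of claim 5 might be sketched as follows. This is not the patented implementation: it assumes cosine similarity, one shared threshold, that only the best-matching parent category is examined, and that a sub-category representation is updated as a running arithmetic mean. The claim does not say what happens when the text matches a parent but none of its sub-categories, so the sketch simply reports "unassigned". All names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def absorb(sub, vec):
    """Fold vec into a sub-category's running arithmetic mean."""
    n = sub["count"]
    sub["repr"] = [(r * n + x) / (n + 1) for r, x in zip(sub["repr"], vec)]
    sub["count"] = n + 1

def update_category_set(category_set, vec, threshold=0.8):
    """Update category_set in place; return which branch was taken."""
    # Third similarity: compare against every parent representation.
    best = max(category_set, key=lambda c: cosine(vec, c["parent"]), default=None)
    if best is None or cosine(vec, best["parent"]) <= threshold:
        # No parent is close enough: the text representation becomes a new parent.
        category_set.append({"parent": list(vec), "standard": None, "non_standard": []})
        return "new_parent"
    # Fourth similarity: the standard sub-category of the matched parent.
    std = best["standard"]
    if std is not None and cosine(vec, std["repr"]) > threshold:
        absorb(std, vec)
        return "standard"
    # Fifth similarity: each non-standard sub-category of the matched parent.
    for sub in best["non_standard"]:
        if cosine(vec, sub["repr"]) > threshold:
            absorb(sub, vec)
            return "non_standard"
    return "unassigned"  # case not covered by the claim
```

Following the claim wording, the sketch updates only the matched sub-category representation and leaves the parent representation unchanged.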
6. The method according to claim 1, wherein when detecting a second text to be labeled, after updating the category set according to the second text to be labeled, the method further comprises:
when a text recognition instruction is received, acquiring the text to be recognized and the category set;
inputting the text to be recognized into a text encoder to obtain a second text representation corresponding to the text to be recognized;
calculating a sixth similarity between the second text representation and any one of the following category representations:
each parent category representation in the category set, the standard sub-category representation corresponding to each parent category representation, and any one of the non-standard sub-category representations corresponding to each parent category representation;
and when the sixth similarity is greater than a preset threshold value, identifying the text to be recognized as a category corresponding to the sixth similarity greater than the preset threshold value.
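A minimal sketch of the threshold-based recognition in claim 6 is given below, assuming cosine similarity and a flat list of labeled representations (parent, standard and non-standard alike). The label strings and the function name `recognize` are hypothetical, and returning every category above the threshold, best first, is a design choice of this sketch rather than something the claim requires.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recognize(text_vec, representations, threshold=0.8):
    """Return the labels whose representation exceeds the threshold, best first.

    representations: list of (label, vector) pairs.
    """
    hits = [(cosine(text_vec, vec), label) for label, vec in representations]
    hits = [pair for pair in hits if pair[0] > threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in hits]
```

An empty result means no representation in the category set is similar enough to the text to be recognized.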
7. The method according to claim 1, wherein when the text to be recognized is detected, determining the category corresponding to the text to be recognized from the category set by using a nearest neighbor algorithm comprises:
inputting the text to be recognized into a text encoder to obtain a second text representation corresponding to the text to be recognized;
and determining the category corresponding to the text to be recognized from the category set according to the second text representation by using a nearest neighbor algorithm.
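The nearest neighbor step of claim 7 can be illustrated with a 1-nearest-neighbor lookup. The sketch assumes cosine similarity as the closeness measure and that the text encoder has already produced `text_vec`; the claim names neither the metric nor the value of k, and the function name is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_category(text_vec, representations):
    """Return (label, similarity) of the representation closest to text_vec.

    representations: non-empty list of (label, vector) pairs from the category set.
    """
    if not representations:
        raise ValueError("category set is empty")
    label, vec = max(representations, key=lambda item: cosine(text_vec, item[1]))
    return label, cosine(text_vec, vec)
```

Unlike the threshold test of claim 6, this lookup always returns some category, so a caller may still want to check the returned similarity before trusting the result.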
8. An apparatus for recognizing a text category, comprising:
an acquisition module, configured to acquire a first text to be labeled, wherein the first text to be labeled comprises texts to be labeled of a plurality of categories, and the text to be labeled of each category comprises a standard question and a plurality of similar questions;
a first determining module, configured to determine a standard sub-category corresponding to the text to be labeled of each category according to a first similarity between the standard question and each similar question in the text to be labeled of each category;
a second determining module, configured to determine a non-standard sub-category corresponding to the text to be labeled of each category according to a second similarity between any one of the similar questions and each of the other similar questions in the text to be labeled of each category;
a third determining module, configured to determine a category set according to the standard sub-category and the plurality of non-standard sub-categories corresponding to the text to be labeled of each category;
an updating module, configured to update the category set according to a second text to be labeled when the second text to be labeled is detected;
and a recognition module, configured to determine a category corresponding to a text to be recognized from the category set by using a nearest neighbor algorithm when the text to be recognized is detected.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202111440947.0A 2021-11-30 2021-11-30 Text category identification method and device Active CN114138972B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111440947.0A CN114138972B (en) 2021-11-30 2021-11-30 Text category identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111440947.0A CN114138972B (en) 2021-11-30 2021-11-30 Text category identification method and device

Publications (2)

Publication Number Publication Date
CN114138972A (en) 2022-03-04
CN114138972B (en) 2024-07-16

Family

ID=80389709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111440947.0A Active CN114138972B (en) 2021-11-30 2021-11-30 Text category identification method and device

Country Status (1)

Country Link
CN (1) CN114138972B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020114100A1 (en) * 2018-12-06 2020-06-11 中兴通讯股份有限公司 Information processing method and apparatus, and computer storage medium
CN112182217A (en) * 2020-09-28 2021-01-05 云知声智能科技股份有限公司 Method, device, equipment and storage medium for identifying multi-label text categories
CN112541079A (en) * 2020-12-10 2021-03-23 杭州远传新业科技有限公司 Multi-intention recognition method, device, equipment and medium
CN113205814A (en) * 2021-04-28 2021-08-03 平安科技(深圳)有限公司 Voice data labeling method and device, electronic equipment and storage medium
CN113435499A (en) * 2021-06-25 2021-09-24 平安科技(深圳)有限公司 Label classification method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
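武文杰; 王红蕾: "Improvement and application of occluded face restoration and recognition", 软件 (Software), no. 05, 15 May 2020 (2020-05-15) *
段军红; 李晓宇; 慕德俊: "A text classification training method with incomplete labeling", 微处理机 (Microprocessors), no. 001, 31 December 2019 (2019-12-31) *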

Also Published As

Publication number Publication date
CN114138972B (en) 2024-07-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221220

Address after: 518054 cable information transmission building 25f2504, no.3369 Binhai Avenue, Haizhu community, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen Xumi yuntu Space Technology Co.,Ltd.

Address before: No.103, no.1003, Nanxin Road, Nanshan community, Nanshan street, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: Shenzhen Jizhi Digital Technology Co.,Ltd.

GR01 Patent grant