CN111708900A - Expansion method and expansion device for tag synonym, electronic device and storage medium - Google Patents


Info

Publication number
CN111708900A
Authority
CN
China
Prior art keywords
character string
text
length
label
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010553900.4A
Other languages
Chinese (zh)
Other versions
CN111708900B (en
Inventor
石慧江
于政
王道广
袁灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN202010553900.4A
Publication of CN111708900A
Application granted
Publication of CN111708900B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application provides an expansion method, an expansion device, an electronic device, and a storage medium for tag synonyms. The expansion method comprises: first, acquiring a segment of target text and a preset label for the target text; then, determining, from the target text, a plurality of character string lengths and a plurality of text labels for each character string length, and determining the similarity between each text label and the preset label; and finally, determining, from the plurality of text labels, the text label with the maximum similarity as a synonym of the preset label. Given a segment of target text and a preset label, the method extracts a span of the target text as a synonym of the preset label, so that tag synonyms of an existing label can be extracted from the target text effectively and quickly, which improves work efficiency and reduces time cost.

Description

Expansion method and expansion device for tag synonym, electronic device and storage medium
Technical Field
The present application relates to the field of tagging technologies, and in particular, to an expansion method and an expansion apparatus for tag synonyms, an electronic device, and a storage medium.
Background
When applying labels, the person doing the labeling often does not know every available label and knows only a relatively colloquial description. The corresponding label must then be found through that similar description, and such a description is a synonym of the label.
Take automobile maintenance cases as an example. Two kinds of labels are generally applied during automobile maintenance: the fault phenomenon and the maintenance result. The fault phenomenon label is applied when the case is created, and the maintenance result label is applied by the maintenance engineer when the case is closed. The labels are relatively fixed. For example, the fault label "weak acceleration" may be described synonymously as "slow start", "accelerator does not respond", and so on; the maintenance result label "replace engine" may be described synonymously as "engine replacement", "change engine", and so on. Mining more synonyms for existing labels therefore lets the person labeling find the corresponding label quickly.
In the prior art, tag synonyms are usually enumerated manually: an expert in the relevant field interprets and summarizes each label. Although a synonym library compiled by experts generally has high data quality, the approach is very costly. First, to ensure the quality of the thesaurus, the domain experts need long working experience and deep knowledge of the field. Second, to ensure the breadth of the thesaurus, several experts are often needed to compile it together.
Disclosure of Invention
In view of this, an object of the present application is to provide an expansion method, an expansion device, an electronic device, and a storage medium for tag synonyms, which can effectively and quickly extract tag synonyms of existing tags from a text, thereby improving work efficiency and reducing time cost.
In a first aspect, the present application provides an expansion method for tag synonyms, the expansion method including:
acquiring a segment of target text and a preset label for the target text;
determining, from the target text, a plurality of character string lengths and a plurality of text labels for each character string length, and determining the similarity between each text label and the preset label;
and determining, from the plurality of text labels, the text label with the maximum similarity as a synonym of the preset label.
Preferably, the plurality of character string lengths is determined by:
acquiring the total character string length value of the target text;
determining a character string length interval array of the target text, wherein a length of one character is the starting point of the array and the total character string length value is the end point of the array;
and, taking a single character length as the division step, enumerating the character string lengths in the character string length interval array to determine the plurality of character string lengths.
Preferably, the plurality of character string lengths is determined by:
acquiring the length value of the preset label and a preset variable value;
adding the preset variable value to, and subtracting it from, the length value of the preset label to obtain the maximum value and the minimum value of the character string length interval array;
and, taking a single character length as the division step, enumerating the character string lengths in the character string length interval array to determine the plurality of character string lengths.
Preferably, the plurality of text labels for each character string length is determined by:
determining at least one starting point in the target text;
and, taking each character string length as the sliding step, sliding over the target text from each starting point to extract values, thereby obtaining the plurality of text labels for each character string length.
Preferably, the similarity between each text label and the preset label is determined by:
calculating the semantic similarity between each text label and the preset label;
and determining the maximum semantic similarity among all the semantic similarities.
In a second aspect, the present application provides an expansion device for tag synonyms, the expansion device comprising:
an acquisition module, configured to acquire a segment of target text and a preset label for the target text;
a determining module, configured to determine, from the target text, a plurality of character string lengths and a plurality of text labels for each character string length, and to determine the similarity between each text label and the preset label;
and a synonym determining module, configured to determine, from the plurality of text labels, the text label with the maximum similarity as a synonym of the preset label.
Preferably, the determining module is configured to determine the plurality of character string lengths by:
acquiring the total character string length value of the target text;
determining a character string length interval array of the target text, wherein a length of one character is the starting point of the array and the total character string length value is the end point of the array;
and, taking a single character length as the division step, enumerating the character string lengths in the character string length interval array to determine the plurality of character string lengths.
Preferably, the determining module is configured to determine the plurality of character string lengths by:
acquiring the length value of the preset label and a preset variable value;
adding the preset variable value to, and subtracting it from, the length value of the preset label to obtain the maximum value and the minimum value of the character string length interval array;
and, taking a single character length as the division step, enumerating the character string lengths in the character string length interval array to determine the plurality of character string lengths.
Preferably, the determining module is configured to determine the plurality of text labels for each character string length by:
determining at least one starting point in the target text;
and, taking each character string length as the sliding step, sliding over the target text from each starting point to extract values, thereby obtaining the plurality of text labels for each character string length.
Preferably, the determining module is configured to determine the similarity between each text label and the preset label by:
calculating the semantic similarity between each text label and the preset label;
and determining the maximum semantic similarity among all the semantic similarities.
In a third aspect, an embodiment of the present application provides an electronic device including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device is operating, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the tag synonym expansion method described above.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, performs the steps of the tag synonym expansion method described above.
The embodiments of the application provide an expansion method, an expansion device, an electronic device, and a storage medium for tag synonyms. The expansion method comprises: first, acquiring a segment of target text and a preset label for the target text; then, determining, from the target text, a plurality of character string lengths and a plurality of text labels for each character string length, and determining the similarity between each text label and the preset label; and finally, determining, from the plurality of text labels, the text label with the maximum similarity as a synonym of the preset label. Given a segment of target text and a preset label, the method extracts a span of the target text as a synonym of the preset label, so that tag synonyms of an existing label can be extracted from the target text effectively and quickly, which improves work efficiency and reduces time cost.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flowchart of a method for expanding a tag synonym according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a first method for determining a plurality of character string lengths according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a second method for determining a plurality of character string lengths according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an expansion device for tag synonyms according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals: 400-an expansion device; 410-an obtaining module; 420-a determination module; 430-synonym determination module; 500-an electronic device; 510-a processor; 520-a memory; 530-bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. Every other embodiment that can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present application falls within the protection scope of the present application.
To enable those skilled in the art to use the present disclosure, the following embodiments are presented in conjunction with a specific application scenario, "labeling cases in automobile maintenance". It will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the application. Although the present application is described primarily in the context of labeling cases in automobile maintenance, it should be understood that this is only one exemplary embodiment.
In the prior art, tag synonyms are usually enumerated manually: an expert in the relevant field interprets and summarizes each label. Although a synonym library compiled by experts generally has high data quality, the approach is very costly. First, to ensure the quality of the thesaurus, the domain experts need long working experience and deep knowledge of the field. Second, to ensure the breadth of the thesaurus, several experts are often needed to compile it together. For this reason, the embodiments of the application provide an expansion method, an expansion device, an electronic device, and a storage medium for tag synonyms. Given a segment of target text and a preset label, the expansion method extracts a span of the target text as a synonym of the preset label. This replaces the prior-art practice of manually enumerating tag synonyms, so that tag synonyms of an existing label can be extracted from text effectively and quickly, which improves work efficiency and reduces time cost.
Referring to FIG. 1, which is a flowchart of an expansion method for tag synonyms according to an embodiment of the present disclosure, the expansion method includes:
S110, acquiring a segment of target text and a preset label for the target text.
In this embodiment, the target text and the preset label for the target text are both known. Take the maintenance text of an automobile maintenance case as an example: the preset label is the label information that has been applied to that maintenance text; for instance, the preset labels of a maintenance text may be "abnormal engine sound" and "engine replacement".
S120, determining, from the target text, a plurality of character string lengths and a plurality of text labels for each character string length, and determining the similarity between each text label and the preset label.
This step can be implemented in two ways. In the first, the similarity between a text label and the preset label is calculated each time a text label is determined from the target text. In the second, the target text is first divided into texts of different character string lengths, all of which are taken as text labels, and the similarity between each text label and the preset label is then calculated. Either way, the similarity between each text label and the preset label is obtained.
S130, determining, from the plurality of text labels, the text label with the maximum similarity as a synonym of the preset label.
In this embodiment, the text label most similar to the preset label is found among all the text labels: the greater the similarity, the closer the text label is to the preset label, so the text label with the maximum similarity is determined to be the synonym of the preset label.
Specifically, if the fault phenomenon label is "weak acceleration", its synonyms may be descriptions such as "slow start" and "accelerator does not respond"; if the maintenance result label is "replace engine", its synonyms may be descriptions such as "engine replacement" and "change engine".
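Steps S110 to S130 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `similarity` and `expand_tag_synonym` are hypothetical, and `difflib.SequenceMatcher` is only a character-level stand-in for the similarity measure, which the text does not specify at this point.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Surface-level stand-in for the similarity measure (an assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def expand_tag_synonym(target_text: str, preset_label: str):
    """Return (best_text_label, best_score) for one target text and label."""
    best_label, best_score = None, -1.0
    # S120: enumerate every character string length, then every window of it.
    for length in range(1, len(target_text) + 1):
        for start in range(len(target_text) - length + 1):
            text_label = target_text[start:start + length]
            score = similarity(text_label, preset_label)
            # S130: keep the text label with the maximum similarity so far.
            if score > best_score:
                best_label, best_score = text_label, score
    return best_label, best_score
```

With a character-level measure such as this one, the winning span tends to be text that literally overlaps the label. Finding paraphrases such as "slow start" would require a semantic similarity model in place of `similarity`.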
The embodiment of the application provides an expansion method for tag synonyms, which comprises: first, acquiring a segment of target text and a preset label for the target text; then, determining, from the target text, a plurality of character string lengths and a plurality of text labels for each character string length, and determining the similarity between each text label and the preset label; and finally, determining, from the plurality of text labels, the text label with the maximum similarity as a synonym of the preset label. Given a segment of target text and a preset label, the method extracts a span of the target text as a synonym of the preset label, so that tag synonyms of an existing label can be extracted from the target text effectively and quickly, which improves work efficiency and reduces time cost.
Referring to FIG. 2, which is a flowchart of a first method for determining a plurality of character string lengths according to an embodiment of the present disclosure, the plurality of character string lengths is determined by the following steps:
S210, acquiring the total character string length value of the target text.
In this embodiment, a segment of target text is selected and all the characters in it are counted, which gives the total character string length value.
S220, determining a character string length interval array of the target text, wherein a length of one character is the starting point of the array and the total character string length value is the end point of the array.
In this embodiment, the character string length interval array of the target text is a one-dimensional array whose starting point is a length of one character and whose end point is the total character string length value, for example $[a_1, a_2, a_3, a_4, \ldots, a_n]$, where $a_1$ denotes a length of one character, $a_2$ a length of two characters, $a_3$ a length of three characters, and so on, and $a_n$ denotes the total character string length value.
S230, taking a single character length as the division step, enumerating the character string lengths in the character string length interval array to determine the plurality of character string lengths.
In this embodiment, the division step is the length of a single character, and the one-dimensional array of character string lengths is divided with that step, for example $[a_1 \mid a_2 \mid a_3 \mid a_4 \mid \ldots \mid a_n]$. The character string lengths are thus determined to be one character, two characters, three characters, and so on, with the longest character string length being the total character string length value.
This method of determining the plurality of character string lengths enumerates every length that a character string taken from the target text can have.
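Steps S210 to S230 amount to enumerating the integers from one up to the length of the target text. A minimal sketch, in which the function name `all_string_lengths` is hypothetical:

```python
def all_string_lengths(target_text: str) -> list:
    """Enumerate every candidate character string length (S210 to S230)."""
    total_length = len(target_text)          # S210: total character count
    # S220/S230: interval from 1 to total_length, divided with step 1.
    return list(range(1, total_length + 1))
```

In Python 3, `len` counts Unicode code points, so each Chinese character in a maintenance text counts as one character.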
Referring to FIG. 3, which is a flowchart of a second method for determining a plurality of character string lengths according to an embodiment of the present disclosure, the plurality of character string lengths is determined by the following steps:
S310, acquiring the length value of the preset label and a preset variable value.
In this embodiment, the length value of the preset label is calculated, and the preset variable value is set in advance according to experience. Given the length value of the preset label, applying the preset variable value yields character string lengths close to the length of the preset label.
S320, adding the preset variable value to, and subtracting it from, the length value of the preset label to obtain the maximum value and the minimum value of the character string length interval array.
In this embodiment, the preset variable value is added to and subtracted from the length value of the preset label to obtain a character string length interval array close to the length of the preset label. The length value of the preset label plus the preset variable value is the maximum value of the array, and the length value of the preset label minus the preset variable value is the minimum value of the array, for example $[b_3, b_4, \ldots, b_n]$, where $b_3$ denotes the minimum value of the character string length interval array and $b_n$ denotes its maximum value.
S330, taking a single character length as the division step, enumerating the character string lengths in the character string length interval array to determine the plurality of character string lengths.
Step S330 is the same as step S230 described above and achieves the same technical effect, so it is not described again here.
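The second method restricts the candidate lengths to a band around the length of the preset label. The sketch below is illustrative only: the names `lengths_near_label` and `delta` (standing for the preset variable value) are hypothetical, and clamping the lower bound to one character is an assumption the text does not state.

```python
def lengths_near_label(preset_label: str, delta: int) -> list:
    """Enumerate string lengths within `delta` of the label length (S310 to S330)."""
    label_length = len(preset_label)          # S310: length of the preset label
    # S320: minimum and maximum of the interval array.
    # Assumption: a length below one character is meaningless, so clamp to 1.
    minimum = max(1, label_length - delta)
    maximum = label_length + delta
    # S330: divide the interval with a step of one character.
    return list(range(minimum, maximum + 1))
```

Compared with the first method, this produces far fewer candidate lengths for a long target text, at the cost of missing synonyms much longer or shorter than the label.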
In this embodiment, as a preferred embodiment, the plurality of text labels for each character string length is determined by the following steps:
Determining at least one starting point in the target text.
Specifically, the target text has as many starting points as it has characters; a starting point is the position immediately before a character.
Taking each character string length as the sliding step, sliding over the target text from each starting point to extract values, thereby obtaining the plurality of text labels for each character string length.
In this embodiment, the plurality of character string lengths is obtained by the division described above, and values are extracted from the target text by sliding according to each of those lengths. Concretely, starting from one starting point of the target text, values are extracted by sliding with one of the character string lengths until all characters of the target text have been covered. Every starting point is combined with every character string length, and the sliding extraction over the target text is performed again each time the starting point changes.
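One way to read the sliding extraction is as a fixed-width window of each character string length placed at every starting point. The following sketch takes that reading; the function name `text_labels_by_length` is hypothetical, and dropping windows that would run past the end of the text is an assumption.

```python
def text_labels_by_length(target_text: str, lengths) -> dict:
    """Map each character string length to the text labels of that length."""
    labels = {}
    for n in lengths:
        # One window per starting point; windows that would run past the
        # end of the text are skipped (an assumption about truncation).
        labels[n] = [
            target_text[start:start + n]
            for start in range(len(target_text) - n + 1)
        ]
    return labels
```

For a text of length L this yields L - n + 1 text labels for each length n, so enumerating all lengths produces on the order of L squared candidates.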
In this embodiment, as an optional embodiment, the similarity between each text label and the preset label is determined by the following steps:
calculating the semantic similarity between each text label and the preset label;
and determining the maximum semantic similarity among all the semantic similarities.
In this embodiment, the semantic similarity between every text label and the preset label is calculated, the maximum is selected from these semantic similarities, and the text label with the maximum semantic similarity is taken as the tag synonym.
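The scoring and selection step can be sketched separately from window extraction. The text does not say how semantic similarity is computed, so the cosine similarity over character bigram counts below is purely a stand-in, not a semantic model; all function names are hypothetical.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Count character bigrams; fall back to single characters for short text."""
    grams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return grams if grams else Counter(text)


def stand_in_similarity(a: str, b: str) -> float:
    """Cosine similarity of bigram counts (a stand-in, not a semantic model)."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    dot = sum(count * gb[gram] for gram, count in ga.items())
    norm_a = math.sqrt(sum(c * c for c in ga.values()))
    norm_b = math.sqrt(sum(c * c for c in gb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(text_labels, preset_label: str):
    """Return the (text_label, similarity) pair with the maximum similarity."""
    scored = [(t, stand_in_similarity(t, preset_label)) for t in text_labels]
    return max(scored, key=lambda pair: pair[1])
```

Because `most_similar` only needs a function of two strings that returns a number, an embedding-based measure could replace `stand_in_similarity` without changing the selection logic.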
Specifically, a segment of target text is first obtained, and the preset label corresponding to that target text is determined, the preset label being the label information applied to the target text. Starting from a starting position $i \ge 0$ in the target text, a span of $n$ characters is extracted, where $n$ is the division step, and its similarity to the preset label is calculated, giving a similarity result value $p$. The value $p$ is stored in the probability set for division step $n$, and the starting position is advanced by one character, $i = i + 1$. The similarity is recalculated each time the starting position advances, until the end of the character string of the target text is reached. Reaching the end means that division step $n$ is finished: the similarity results between all text labels of division step $n$ and the preset label have been recorded. The division step is then incremented, $n = n + 1$, and the process is repeated starting from position $i = 0$. In this way, given a known label, the embodiment mines tag synonyms from text, avoids obtaining tag synonyms by manual enumeration, and improves work efficiency.
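The loop over the starting position $i$ and the division step $n$, with one result set per $n$, can be written out directly. This is a sketch under assumptions: `score_windows` and `best_window` are hypothetical names, the stopping condition for $n$ (the text length) is inferred, and the default measure is again a character-level stand-in.

```python
from difflib import SequenceMatcher


def score_windows(target_text, preset_label, similarity=None):
    """Return {n: [(text_label, p), ...]} for every division step n."""
    if similarity is None:
        # Assumed stand-in; the source does not fix a similarity measure.
        similarity = lambda a, b: SequenceMatcher(None, a, b).ratio()
    scores_by_n = {}
    n = 1
    while n <= len(target_text):               # assumed upper bound for n
        scores_by_n[n] = []                    # the "probability set" for n
        i = 0
        while i + n <= len(target_text):       # stop at the end of the text
            window = target_text[i:i + n]
            p = similarity(window, preset_label)
            scores_by_n[n].append((window, p))
            i += 1                             # i = i + 1
        n += 1                                 # n = n + 1, restart at i = 0
    return scores_by_n


def best_window(scores_by_n):
    """Pick the (text_label, p) pair with the largest p across all n."""
    pairs = (pair for results in scores_by_n.values() for pair in results)
    return max(pairs, key=lambda pair: pair[1])
```

Keeping the results grouped by $n$ mirrors the per-step sets in the description; `best_window` raises `ValueError` on an empty target text, since there is then nothing to select.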
Referring to FIG. 4, which is a schematic structural diagram of an expansion device for tag synonyms according to an embodiment of the present application, the expansion device 400 includes:
an obtaining module 410, configured to obtain a target text and a preset tag for the target text;
a determining module 420, configured to determine, from the target text, a plurality of character string lengths and a plurality of text labels in each character string length, and determine a similarity between each text label and the preset label;
and a synonym determining module 430, configured to determine, from the determined multiple text labels, that the text label with the largest similarity is a preset label synonym.
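The three-module structure of device 400 can be illustrated with plain classes. This is only a sketch of one possible decomposition: the class and method names, the `"text"` and `"label"` record keys, and the default similarity measure are all assumptions, not details given in the text.

```python
from difflib import SequenceMatcher


class ObtainingModule:
    """Module 410: supplies the target text and its preset label."""

    def obtain(self, record):
        # The "text" and "label" keys are hypothetical field names.
        return record["text"], record["label"]


class DeterminingModule:
    """Module 420: enumerates text labels and scores each against the label."""

    def __init__(self, similarity):
        self.similarity = similarity

    def score(self, text, label):
        return [
            (text[i:i + n], self.similarity(text[i:i + n], label))
            for n in range(1, len(text) + 1)
            for i in range(len(text) - n + 1)
        ]


class SynonymDeterminingModule:
    """Module 430: picks the text label with the maximum similarity."""

    def select(self, scored):
        return max(scored, key=lambda pair: pair[1])[0]


class ExpansionDevice:
    """Device 400: wires the three modules together."""

    def __init__(self, similarity=None):
        if similarity is None:
            # Assumed stand-in measure; the source does not specify one.
            similarity = lambda a, b: SequenceMatcher(None, a, b).ratio()
        self.obtaining = ObtainingModule()
        self.determining = DeterminingModule(similarity)
        self.synonym = SynonymDeterminingModule()

    def run(self, record):
        text, label = self.obtaining.obtain(record)
        return self.synonym.select(self.determining.score(text, label))
```

Passing the similarity function into the device keeps module 420 independent of any particular measure, so a semantic model could be substituted without touching modules 410 and 430.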
In this embodiment of the present application, the determining module 420 is configured to determine the plurality of character string lengths by:
acquiring the total character string length value of the target text;
determining a character string length interval array of the target text, wherein a single character length value is used as the starting point of the character string length interval array, and the total character string length value is used as the end point of the character string length interval array;
and, taking the single character length value as the division step, taking values within the character string length interval array to determine the plurality of character string lengths.
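The steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming that the length of the target text is measured in characters with `len` and that the interval runs from 1 to the total length inclusive; the function name `lengths_from_text` is hypothetical.

```python
from typing import List


def lengths_from_text(target_text: str) -> List[int]:
    """Candidate lengths 1, 2, ..., len(target_text), in steps of one character."""
    total_length = len(target_text)  # total character string length value
    # Start point: a single character; end point: the total length; step: 1.
    return list(range(1, total_length + 1))
```

For a four-character text this yields the lengths 1, 2, 3 and 4, so the number of candidate lengths grows linearly with the length of the target text.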
As a preferred embodiment, the determining module 420 is configured to determine the plurality of character string lengths by:
acquiring the length value of the preset label and a preset variable value;
adding the preset variable value to, and subtracting it from, the length value of the preset label to obtain the maximum value and the minimum value, respectively, of the character string length value interval array;
and, taking a single character length value as the division step, taking values within the character string length value interval array to determine the plurality of character string lengths.
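The steps above can be sketched as follows. This is an illustration under stated assumptions: the function name `lengths_around_label` and the parameter name `delta` (standing for the preset variable value) are hypothetical, and clamping the minimum to one character is an added safeguard that the application does not mention.

```python
from typing import List


def lengths_around_label(preset_label: str, delta: int) -> List[int]:
    """Candidate lengths from len(label) - delta to len(label) + delta, step 1."""
    base = len(preset_label)          # length value of the preset label
    minimum = max(1, base - delta)    # assumption: never go below one character
    maximum = base + delta
    return list(range(minimum, maximum + 1))
```

Compared with enumerating every length up to the full text length, this variant keeps the number of candidate lengths at no more than `2 * delta + 1`, regardless of how long the target text is.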
In this embodiment of the present application, the determining module 420 is configured to determine the plurality of text labels at each character string length by:
determining at least one starting point in the target text;
and, taking each character string length as the sliding step, sliding over the target text from each starting point to take values, thereby obtaining the plurality of text labels at each character string length.
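One possible reading of the sliding step above is sketched below: from each starting point, the window advances by the character string length, so the windows taken from one starting point do not overlap. The function name `text_labels_for_length`, the representation of starting points as character offsets, and the choice to drop a trailing remainder shorter than the window are assumptions of this sketch.

```python
from typing import List


def text_labels_for_length(target_text: str, length: int, starts: List[int]) -> List[str]:
    """Slide a window of `length` characters over the text from each start offset."""
    if length <= 0:
        raise ValueError("length must be a positive number of characters")
    labels = []
    for start in starts:
        # The sliding step equals the window length, so windows do not overlap.
        for i in range(start, len(target_text) - length + 1, length):
            labels.append(target_text[i:i + length])
    return labels
```

With the text `"abcdef"`, a length of 2 and starting offsets 0 and 1, the sketch produces `ab`, `cd`, `ef` from the first offset and `bc`, `de` from the second. Using the offsets 0 to `length - 1` as starting points therefore covers every substring of that length.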
In this embodiment of the present application, the determining module 420 is configured to determine the similarity between each text label and the preset label by:
calculating the semantic similarity between each text label and the preset label;
and determining the maximum semantic similarity among all the calculated semantic similarities.
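The two steps above amount to scoring every text label and keeping the maximum. The sketch below shows that selection logic only. It uses `difflib.SequenceMatcher` from the Python standard library as a placeholder score; this is a surface string similarity, not a semantic one, and it is an assumption of this example rather than the measure used in the application. The function names are likewise hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(text_label: str, preset_label: str) -> float:
    """Placeholder score in [0, 1]; a real system would use a semantic measure."""
    return SequenceMatcher(None, text_label, preset_label).ratio()


def most_similar(text_labels: List[str], preset_label: str) -> Tuple[str, float]:
    """Return the text label with the maximum score together with that score."""
    scored = [(label, similarity(label, preset_label)) for label in text_labels]
    # max() raises ValueError on an empty list; callers should handle that case.
    return max(scored, key=lambda pair: pair[1])
```

Because the scoring function is isolated in `similarity`, it could be replaced by an embedding-based measure without changing `most_similar`. When several labels share the maximum score, `max` returns the first one encountered.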
The embodiment of the present application provides an expansion device for tag synonyms, which includes an obtaining module, a determining module and a synonym determining module. The obtaining module is used for acquiring a target text and a preset label for the target text; the determining module is used for determining, from the target text, a plurality of character string lengths and a plurality of text labels at each character string length, and for determining the similarity between each text label and the preset label; and the synonym determining module is used for determining, from the determined text labels, the text label with the maximum similarity as a synonym of the preset label. In this way, when a target text and a preset label are known, the expansion device extracts a segment of the target text as a synonym of the preset label, so that synonyms of an existing label can be extracted from the target text effectively and quickly, which improves working efficiency and reduces time cost.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 5, the electronic device 500 includes a processor 510, a memory 520, and a bus 530.
The memory 520 stores machine-readable instructions executable by the processor 510. When the electronic device 500 runs, the processor 510 and the memory 520 communicate through the bus 530, and when the machine-readable instructions are executed by the processor 510, the steps of the expansion method for tag synonyms in the method embodiment shown in fig. 1, the steps of the first method for determining the plurality of character string lengths in the method embodiment shown in fig. 2, and the steps of the second method for determining the plurality of character string lengths in the method embodiment shown in fig. 3 may be performed.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the expansion method for tag synonyms in the method embodiment shown in fig. 1, the steps of the first method for determining the plurality of character string lengths in the method embodiment shown in fig. 2, and the steps of the second method for determining the plurality of character string lengths in the method embodiment shown in fig. 3 may be performed.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a logical division, and other divisions are possible in an actual implementation; as another example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some communication interfaces, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present application and shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An expansion method for tag synonyms, characterized in that the expansion method comprises:
acquiring a target text and a preset label for the target text;
determining, from the target text, a plurality of character string lengths and a plurality of text labels at each character string length, and determining the similarity between each text label and the preset label;
and determining, from the determined text labels, the text label with the maximum similarity as a synonym of the preset label.
2. The expansion method of claim 1, wherein the plurality of character string lengths are determined by:
acquiring the total character string length value of the target text;
determining a character string length interval array of the target text, wherein a single character length value is used as the starting point of the character string length interval array, and the total character string length value is used as the end point of the character string length interval array;
and, taking the single character length value as the division step, taking values within the character string length interval array to determine the plurality of character string lengths.
3. The expansion method of claim 1, wherein the plurality of character string lengths are determined by:
acquiring the length value of the preset label and a preset variable value;
adding the preset variable value to, and subtracting it from, the length value of the preset label to obtain the maximum value and the minimum value, respectively, of the character string length value interval array;
and, taking a single character length value as the division step, taking values within the character string length value interval array to determine the plurality of character string lengths.
4. The expansion method of claim 1, wherein the plurality of text labels at each character string length are determined by:
determining at least one starting point in the target text;
and, taking each character string length as the sliding step, sliding over the target text from each starting point to take values, thereby obtaining the plurality of text labels at each character string length.
5. The expansion method of claim 1, wherein the similarity between each text label and the preset label is determined by:
calculating the semantic similarity between each text label and the preset label;
and determining the maximum semantic similarity among all the calculated semantic similarities.
6. An expansion device for tag synonyms, characterized in that the expansion device comprises:
an obtaining module, configured to obtain a target text and a preset label for the target text;
a determining module, configured to determine, from the target text, a plurality of character string lengths and a plurality of text labels at each character string length, and to determine the similarity between each text label and the preset label;
and a synonym determining module, configured to determine, from the determined text labels, the text label with the maximum similarity as a synonym of the preset label.
7. The expansion device of claim 6, wherein the determining module is configured to determine the plurality of character string lengths by:
acquiring the total character string length value of the target text;
determining a character string length interval array of the target text, wherein a single character length value is used as the starting point of the character string length interval array, and the total character string length value is used as the end point of the character string length interval array;
and, taking the single character length value as the division step, taking values within the character string length interval array to determine the plurality of character string lengths.
8. The expansion device of claim 6, wherein the determining module is configured to determine the plurality of text labels at each character string length by:
determining at least one starting point in the target text;
and, taking each character string length as the sliding step, sliding over the target text from each starting point to take values, thereby obtaining the plurality of text labels at each character string length.
9. An electronic device, comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate over the bus when the electronic device is operating, and the machine-readable instructions, when executed by the processor, perform the steps of the expansion method for tag synonyms of any one of claims 1-5.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, performs the steps of the expansion method for tag synonyms of any one of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010553900.4A CN111708900B (en) 2020-06-17 2020-06-17 Expansion method and expansion device for tag synonyms, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010553900.4A CN111708900B (en) 2020-06-17 2020-06-17 Expansion method and expansion device for tag synonyms, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111708900A (en) 2020-09-25
CN111708900B (en) 2023-08-25

Family

ID=72540929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010553900.4A Active CN111708900B (en) 2020-06-17 2020-06-17 Expansion method and expansion device for tag synonyms, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111708900B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239300A (en) * 2013-06-06 2014-12-24 富士通株式会社 Method and device for excavating semantic keywords from text
CN106156204A (en) * 2015-04-23 2016-11-23 深圳市腾讯计算机系统有限公司 The extracting method of text label and device
WO2017080090A1 (en) * 2015-11-14 2017-05-18 孙燕群 Extraction and comparison method for text of webpage
CN108334533A (en) * 2017-10-20 2018-07-27 腾讯科技(深圳)有限公司 keyword extracting method and device, storage medium and electronic device
US20190108242A1 (en) * 2017-10-10 2019-04-11 Alibaba Group Holding Limited Search method and processing device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360346A (en) * 2021-06-22 2021-09-07 北京百度网讯科技有限公司 Method and apparatus for training a model
CN113360346B (en) * 2021-06-22 2023-07-11 北京百度网讯科技有限公司 Method and device for training model

Also Published As

Publication number Publication date
CN111708900B (en) 2023-08-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant