CN111144709B - Method and device for determining novelty of machine-generated text - Google Patents

Method and device for determining novelty of machine-generated text

Info

Publication number
CN111144709B
Authority
CN
China
Prior art keywords
machine
text
generated text
length
generated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911244272.5A
Other languages
Chinese (zh)
Other versions
CN111144709A (en)
Inventor
张熙
靳凯夫
李小勇
方滨兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201911244272.5A
Publication of CN111144709A
Application granted
Publication of CN111144709B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention provides a method and a device for determining novelty of a machine-generated text, wherein the method comprises the following steps: acquiring a machine-generated text and a plurality of reference texts corresponding to the machine-generated text; determining an overlapping factor of the machine-generated text according to the words included in the machine-generated text and the words included in the plurality of reference texts; determining a repeated penalty factor of the machine-generated text according to the short sentence included in the machine-generated text; determining a length penalty factor of the machine-generated text according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts; and determining the novelty of the machine-generated text according to the overlapping factor, the repetition penalty factor and the length penalty factor of the machine-generated text. The overlapping degree of the machine-generated text and the reference text, the repetition degree of the machine-generated text and the length factors of the machine-generated text and the reference text are comprehensively considered, and the novelty of the machine-generated text is more effectively measured.

Description

Method and device for determining novelty of machine-generated text
Technical Field
The invention relates to the technical field of machine learning, in particular to a method and a device for determining novelty of a machine-generated text.
Background
With the development of artificial intelligence technology, the quality requirements of some natural language generation tasks on machine-generated texts are continuously improved. For example, in the fields of machine translation, human-computer conversation, and the like, higher quality requirements are placed on machine-generated texts.
The quality of machine-generated text is measured mainly in three respects: relevance, language quality and novelty. Relevance represents the degree to which the machine-generated text is related to the reference text, such as the degree to which a machine translation result is related to an expert translation result in a machine translation task; language quality represents the degree to which the machine-generated text conforms to the norms of sentence structure and grammar; novelty expresses how much the machine-generated text differs from the reference text or from other machine-generated text.
At present, well-performing methods exist for determining the relevance and language quality of a machine-generated text, but no method exists for determining its novelty. The novelty of the machine-generated text therefore cannot be accurately determined, and consequently the quality of the machine-generated text cannot be accurately measured.
Disclosure of Invention
The embodiment of the invention aims to provide a method and a device for determining the novelty of a machine-generated text, so as to more accurately determine the novelty of the machine-generated text and further accurately measure the quality of the machine-generated text. The specific technical scheme is as follows:
In order to achieve the above object, an embodiment of the present invention provides a method for determining novelty of a machine-generated text, where the method includes:
acquiring a machine-generated text and a plurality of reference texts corresponding to the machine-generated text;
determining an overlapping factor of the machine-generated text according to words included in the machine-generated text and words included in the multiple reference texts, wherein the words are words obtained by segmenting the text according to a preset segmentation length;
determining a repetition penalty factor of the machine-generated text according to a short sentence included in the machine-generated text, wherein the short sentence is a sentence obtained by segmenting the text according to a preset separator;
determining a length penalty factor of the machine-generated text according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts;
and determining the novelty of the machine-generated text according to the overlapping factor, the repetition penalty factor and the length penalty factor of the machine-generated text.
Optionally, the step of determining an overlap factor of the machine-generated text according to the words included in the machine-generated text and the words included in the multiple reference texts includes:
for each preset segmentation length, determining an overlap factor corresponding to the preset segmentation length according to words corresponding to the preset segmentation length included in the machine-generated text and words corresponding to the preset segmentation length included in the multiple reference texts, wherein the words corresponding to the preset segmentation length are words obtained by segmenting the text according to the preset segmentation length;
and carrying out weighted summation on the overlapping factors corresponding to each preset segmentation length based on the preset weight of each preset segmentation length to obtain the overlapping factors of the machine-generated text.
Optionally, there are multiple machine-generated texts;
the step of determining the overlap factor corresponding to each preset segmentation length according to the word corresponding to the preset segmentation length included in the machine-generated text and the word corresponding to the preset segmentation length included in the multiple reference texts includes:
for each preset segmentation length, counting a first number of words corresponding to the preset segmentation length included in each machine-generated text and a second number of words corresponding to the preset segmentation length included in the plurality of reference texts corresponding to the machine-generated text;
and determining an overlapping factor corresponding to each preset segmentation length based on the preset parameters and the first number and the second number of the words corresponding to each preset segmentation length.
Optionally, the step of determining the overlap factor corresponding to each preset segmentation length based on the preset parameter, the first number and the second number of the words corresponding to each preset segmentation length includes:
for each preset segmentation length, calculating the overlap factor corresponding to the preset segmentation length according to the following formula:
Figure BDA0002307083570000031
Figure BDA0002307083570000032
wherein n represents a preset segmentation length, candidates represents the multiple machine-generated texts, references represents the multiple reference texts of a machine-generated text c, r represents one reference text of the multiple reference texts, n-gram represents a word with a preset segmentation length of n, c represents the machine-generated text c, λ represents the preset parameter, Count_c(n-gram) represents the first number of words corresponding to the preset segmentation length n of the machine-generated text c, Count_{c-ref}(n-gram) represents the number of words corresponding to the preset segmentation length n of the reference text corresponding to the machine-generated text c, delta represents the second number of words corresponding to the preset segmentation length n of the plurality of reference texts corresponding to the machine-generated text c, and P_n represents the overlap factor corresponding to the preset segmentation length n.
Optionally, the step of performing weighted summation on the overlap factor corresponding to each preset segmentation length based on the preset weight of each preset segmentation length to obtain the overlap factor of the machine-generated text includes:
calculating an overlap factor for the machine-generated text according to the following formula:
Figure BDA0002307083570000033
wherein P_avg represents the overlap factor of the machine-generated text, P_n represents the overlap factor corresponding to the preset segmentation length n, w_n represents the preset weight of the preset segmentation length n, and N represents the total number of preset segmentation lengths.
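Read literally, the weighted summation described above can be sketched as follows. The original formula image is not reproduced in this text, so the plain sum of w_n times P_n is an assumption based on the wording "weighted summation"; the function name `weighted_overlap` and the dictionary layout are hypothetical.

```python
def weighted_overlap(overlap_by_n: dict[int, float],
                     weight_by_n: dict[int, float]) -> float:
    """Combine per-length overlap factors P_n into P_avg.

    Assumption: P_avg is the plain weighted sum of w_n * P_n over all
    preset segmentation lengths n. The patent's formula image is not
    available here, so this is an interpretation, not a transcription.
    """
    if overlap_by_n.keys() != weight_by_n.keys():
        raise ValueError("every segmentation length needs both P_n and w_n")
    return sum(weight_by_n[n] * overlap_by_n[n] for n in overlap_by_n)


# Example with two preset segmentation lengths and equal weights.
p_avg = weighted_overlap({1: 0.8, 2: 0.4}, {1: 0.5, 2: 0.5})
```

Keeping the weights in a mapping keyed by n makes it explicit which weight belongs to which segmentation length.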
Optionally, the step of determining a repetition penalty factor of the machine-generated text according to the short sentence included in the machine-generated text includes:
determining short sentences contained in the machine-generated text;
and calculating the similarity between short sentences contained in the machine-generated text, and determining a repetition penalty factor of the machine-generated text based on the similarity between the short sentences.
Optionally, the step of determining a length penalty factor of the machine-generated text according to the text length of the machine-generated text, the average text length of the multiple reference texts, and the minimum text length of the multiple reference texts includes:
acquiring the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts;
determining a length penalty factor for the machine-generated text according to the following formula:
Figure BDA0002307083570000041
where c represents the machine-generated text c, l_c represents the text length of the machine-generated text c, the symbol shown in Figure BDA0002307083570000042 represents the average text length of the plurality of reference texts of the machine-generated text c, the symbol shown in Figure BDA0002307083570000043 represents the minimum text length of the plurality of reference texts of the machine-generated text c, and φ(c) represents the length penalty factor of the machine-generated text c.
Optionally, the step of determining the novelty of the machine-generated text according to the overlap factor, the repetition penalty factor and the length penalty factor of the machine-generated text includes:
and multiplying the overlap factor, the repetition penalty factor and the length penalty factor together to obtain the novelty of the machine-generated text.
Optionally, the method further includes:
when the determined novelty of the machine-generated text is greater than a preset novelty threshold, determining the machine-generated text as recommendable machine-generated text.
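The final combination and the optional recommendation step can be sketched as follows. The function names `novelty_score` and `is_recommendable` are hypothetical; the product of the three factors and the strict "greater than" comparison follow the wording of the steps above.

```python
def novelty_score(overlap_factor: float,
                  repetition_penalty: float,
                  length_penalty: float) -> float:
    """Novelty is the product of the three factors, as stated in the text."""
    return overlap_factor * repetition_penalty * length_penalty


def is_recommendable(score: float, threshold: float) -> bool:
    """A text is recommendable only when its novelty exceeds the threshold."""
    return score > threshold


# The factor values and the threshold below are illustrative only.
score = novelty_score(0.5, 0.5, 1.0)
recommend = is_recommendable(score, threshold=0.2)
```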
To achieve the above object, an embodiment of the present invention provides a novelty determining apparatus for machine-generated text, including:
the acquisition module is used for acquiring a machine-generated text and a plurality of reference texts corresponding to the machine-generated text;
the first determining module is used for determining an overlapping factor of the machine-generated text according to words included in the machine-generated text and words included in the reference texts, wherein the words are words obtained by segmenting the text according to a preset segmentation length;
the second determining module is used for determining a repetition penalty factor of the machine-generated text according to a short sentence included in the machine-generated text, wherein the short sentence is a sentence obtained by segmenting the text according to a preset separator;
a third determining module, configured to determine a length penalty factor of the machine-generated text according to a text length of the machine-generated text, an average text length of the multiple reference texts, and a minimum text length of the multiple reference texts;
and the fourth determining module is used for determining the novelty of the machine-generated text according to the overlapping factor, the repetition penalty factor and the length penalty factor of the machine-generated text.
Optionally, the preset segmentation lengths are multiple, and the first determining module is specifically configured to:
for each preset segmentation length, determining an overlapping factor corresponding to the preset segmentation length according to words corresponding to the preset segmentation length included in the machine-generated text and words corresponding to the preset segmentation length included in the multiple reference texts, wherein the words corresponding to the preset segmentation length are words obtained by segmenting the text according to the preset segmentation length;
and carrying out weighted summation on the overlapping factors corresponding to each preset segmentation length based on the preset weight of each preset segmentation length to obtain the overlapping factors of the text generated by the machine.
Optionally, there are multiple machine-generated texts, and the first determining module is specifically configured to:
for each preset segmentation length, counting a first number of words corresponding to the preset segmentation length included in each machine-generated text and a second number of words corresponding to the preset segmentation length included in the plurality of reference texts corresponding to the machine-generated text;
and determining an overlapping factor corresponding to each preset segmentation length based on the preset parameters and the first number and the second number of the words corresponding to each preset segmentation length.
Optionally, the first determining module is specifically configured to:
for each preset segmentation length, calculating the overlap factor corresponding to the preset segmentation length according to the following formula:
Figure BDA0002307083570000051
Figure BDA0002307083570000061
wherein n represents a preset segmentation length, candidates represents the multiple machine-generated texts, references represents the multiple reference texts of a machine-generated text c, r represents one reference text of the multiple reference texts, n-gram represents a word with a preset segmentation length of n, c represents the machine-generated text c, λ represents the preset parameter, Count_c(n-gram) represents the first number of words corresponding to the preset segmentation length n of the machine-generated text c, Count_{c-ref}(n-gram) represents the number of words corresponding to the preset segmentation length n of the reference text corresponding to the machine-generated text c, delta represents the second number of words corresponding to the preset segmentation length n of the plurality of reference texts corresponding to the machine-generated text c, and P_n represents the overlap factor corresponding to the preset segmentation length n.
Optionally, the first determining module is specifically configured to:
calculating an overlap factor for the machine-generated text according to the following formula:
Figure BDA0002307083570000062
wherein P_avg represents the overlap factor of the machine-generated text, P_n represents the overlap factor corresponding to the preset segmentation length n, w_n represents the preset weight of the preset segmentation length n, and N represents the total number of preset segmentation lengths.
Optionally, the second determining module is specifically configured to:
determining short sentences contained in the machine-generated text;
and calculating the similarity between short sentences contained in the machine-generated text, and determining a repetition penalty factor of the machine-generated text based on the similarity between the short sentences.
Optionally, the third determining module is specifically configured to:
acquiring the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts;
determining a length penalty factor for the machine-generated text according to the following formula:
Figure BDA0002307083570000071
wherein c represents the machine-generated text c, l_c represents the text length of the machine-generated text c, the symbol shown in Figure BDA0002307083570000072 represents the average text length of the plurality of reference texts of the machine-generated text c, the symbol shown in Figure BDA0002307083570000073 represents the minimum text length of the plurality of reference texts of the machine-generated text c, and φ(c) represents the length penalty factor of the machine-generated text c.
Optionally, the fourth determining module is specifically configured to:
and multiplying the overlap factor, the repetition penalty factor and the length penalty factor together to obtain the novelty of the machine-generated text.
Optionally, the apparatus further comprises: a fifth determination module to:
when the determined novelty of the machine-generated text is greater than a preset novelty threshold, determining the machine-generated text as recommendable machine-generated text.
In order to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for implementing the steps of any one of the above methods for determining novelty of machine-generated text when executing the program stored in the memory.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above method steps.
By applying the method and the device for determining the novelty of the machine-generated text provided by the embodiments of the invention, the machine-generated text and a plurality of reference texts corresponding to the machine-generated text are obtained; an overlap factor of the machine-generated text is determined according to words included in the machine-generated text and words included in the reference texts, wherein the words are words obtained by segmenting the text according to a preset segmentation length; a repetition penalty factor of the machine-generated text is determined according to a short sentence included in the machine-generated text, wherein the short sentence is a sentence obtained by segmenting the text according to a preset separator; a length penalty factor of the machine-generated text is determined according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts; and the novelty of the machine-generated text is determined according to the overlap factor, the repetition penalty factor and the length penalty factor of the machine-generated text. Therefore, the degree of overlap between the machine-generated text and the reference texts, the degree of repetition within the machine-generated text, and the lengths of the machine-generated text and the reference texts are comprehensively considered, and the novelty of the machine-generated text can be measured more effectively.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are obviously only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for determining novelty of a machine-generated text according to an embodiment of the present invention;
FIG. 2 is a partial flow diagram of a method for determining novelty of machine-generated text in accordance with an embodiment of the present invention;
FIG. 3 is a schematic flow chart of another part of a method for determining novelty of machine-generated text according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a novelty determining apparatus for machine-generated text according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The quality of machine-generated text is measured mainly in three respects: relevance, language quality and novelty. Relevance represents the degree to which the machine-generated text is related to the reference text, such as the degree to which a machine translation result is related to an expert translation result in a machine translation task; language quality represents the degree to which the machine-generated text conforms to the norms of sentence structure and grammar; novelty expresses how much the machine-generated text differs from the reference text or from other machine-generated text.
In order to more accurately determine the novelty of a machine-generated text and further accurately measure the quality of the machine-generated text, the embodiment of the invention provides a method and a device for determining the novelty of the machine-generated text, an electronic device and a computer-readable storage medium.
Referring to fig. 1, fig. 1 is a schematic flowchart of a method for determining novelty of a machine-generated text according to an embodiment of the present invention, where the method includes the following steps:
S101: acquiring a machine-generated text and a plurality of reference texts corresponding to the machine-generated text.
In an embodiment of the present invention, the machine-generated text may be a text generated in a natural language generation task, for example, a machine-generated text in a machine translation or human-computer conversation scenario.
In order to measure the novelty of the machine-generated text, a corresponding reference text may be acquired, and for one machine-generated text, a plurality of reference texts may be acquired. The reference text can be obtained according to actual requirements, for example, in the field of machine translation, if the machine-generated text to be measured for novelty is the text generated by machine translation, the reference text can be an expert translation text.
S102: determining an overlapping factor of the machine-generated text according to words included in the machine-generated text and words included in the plurality of reference texts, wherein the words are words obtained by segmenting the text according to a preset segmentation length.
In the embodiment of the invention, the overlapping factor of the machine-generated text can be determined according to the words included in the machine-generated text and the words included in the multiple reference texts of the machine-generated text, wherein the words are words obtained by segmenting the text according to the preset segmentation length.
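In the embodiment of the invention, the text is segmented character by character according to the preset segmentation length. As an example, if the text is 'utility model patent' and the preset segmentation length is 3, the words obtained after segmentation are 'utility model', 'novel special' and 'type patent' respectively. (In the original Chinese the segmentation operates on characters, so these English renderings of the three-character segments are only approximate.)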
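A minimal sketch of this segmentation is given below. It assumes a sliding window that advances one character at a time, which is one reading of "segmented character by character"; the function name `char_ngrams` is hypothetical and does not come from the patent.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return every contiguous run of n characters in text.

    Assumption: the window moves one character at a time, so a text of
    length L yields L - n + 1 words (none if the text is shorter than n).
    """
    if n <= 0:
        raise ValueError("the preset segmentation length must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# A five-character text with a preset segmentation length of 3.
words = char_ngrams("abcde", 3)  # ['abc', 'bcd', 'cde']
```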
S103: determining a repetition penalty factor of the machine-generated text according to a short sentence included in the machine-generated text, wherein the short sentence is a sentence obtained by segmenting the text according to a preset separator.
In the embodiment of the invention, aiming at the machine-generated text, the repetition degree of the text can be calculated. The higher the degree of repetition, the lower the degree of novelty.
Specifically, the repetition penalty factor of the machine-generated text may be determined according to the short sentences included in the machine-generated text, where a short sentence is a sentence obtained by segmenting the text according to a preset separator. For example, commas and semicolons can be used as the preset separators.
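The splitting step, and one possible way to turn short-sentence similarity into a penalty, can be sketched as follows. Only the use of commas and semicolons as separators comes from the text. The character-level Jaccard similarity, the "one minus mean pairwise similarity" rule, and all function names are assumptions made for illustration; the text above does not specify how the similarity is computed or mapped to the factor.

```python
import re
from itertools import combinations


def split_short_sentences(text: str, separators: str = ",;，；") -> list[str]:
    """Split text into short sentences at any of the preset separators."""
    pattern = "[" + re.escape(separators) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def jaccard(a: str, b: str) -> float:
    """Character-set Jaccard similarity (an assumed similarity measure)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def repetition_penalty(text: str) -> float:
    """Assumed rule: 1 minus the mean pairwise similarity of short sentences.

    Returns 1.0 (no penalty) when there are fewer than two short sentences.
    """
    pairs = list(combinations(split_short_sentences(text), 2))
    if not pairs:
        return 1.0
    mean_similarity = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity
```

Under this assumed rule a text whose short sentences are identical scores 0, and a text whose short sentences share no characters scores 1, which matches the stated intent that higher repetition means lower novelty.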
S104: determining a length penalty factor of the machine-generated text according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts.
In the embodiment of the invention, the novelty of the machine-generated text is measured, and the text lengths of the machine-generated text and the reference text can be further considered.
Specifically, a length penalty factor of the machine-generated text is determined according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts.
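The three quantities that feed the length penalty factor can be gathered as sketched below. The penalty formula itself appears only as an image in the original and is not reconstructed here. Measuring text length in characters is an assumption, and the function name `length_inputs` is hypothetical.

```python
def length_inputs(candidate: str, references: list[str]) -> tuple[int, float, int]:
    """Return (candidate length, average reference length, minimum reference length).

    Assumption: text length is the number of characters. The formula that
    maps these three values to the length penalty factor is not shown here.
    """
    if not references:
        raise ValueError("at least one reference text is required")
    ref_lengths = [len(ref) for ref in references]
    average = sum(ref_lengths) / len(ref_lengths)
    return len(candidate), average, min(ref_lengths)


# One candidate of length 4 with references of lengths 2 and 6.
l_c, l_avg, l_min = length_inputs("abcd", ["ab", "abcdef"])
```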
S105: determining the novelty of the machine-generated text according to the overlapping factor, the repetition penalty factor and the length penalty factor of the machine-generated text.
In the embodiment of the invention, the novelty of the machine-generated text can be determined by integrating the overlapping factor, the repetition penalty factor and the length penalty factor of the machine-generated text.
By applying the method and the device for determining the novelty of the machine-generated text provided by the embodiments of the invention, the machine-generated text and a plurality of reference texts corresponding to the machine-generated text are obtained; an overlap factor of the machine-generated text is determined according to words included in the machine-generated text and words included in the plurality of reference texts, wherein the words are words obtained by segmenting the text according to a preset segmentation length; a repetition penalty factor of the machine-generated text is determined according to a short sentence included in the machine-generated text, wherein the short sentence is a sentence obtained by segmenting the text according to a preset separator; a length penalty factor of the machine-generated text is determined according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts; and the novelty of the machine-generated text is determined according to the overlap factor, the repetition penalty factor and the length penalty factor of the machine-generated text. Therefore, the degree of overlap between the machine-generated text and the reference texts, the degree of repetition within the machine-generated text, and the lengths of the machine-generated text and the reference texts are comprehensively considered, and the novelty of the machine-generated text can be measured more effectively.
In an embodiment of the present invention, there may be multiple preset segmentation lengths. In that case, referring to fig. 2, step S102 may specifically include the following steps:
S21: for each preset segmentation length, determining an overlap factor corresponding to the preset segmentation length according to the words corresponding to the preset segmentation length included in the machine-generated text and the words corresponding to the preset segmentation length included in the plurality of reference texts, wherein a word corresponding to the preset segmentation length is a word obtained by segmenting the text according to that preset segmentation length.
In an embodiment of the present invention, for example, if there are four preset segmentation lengths, namely 1, 2, 3 and 4, a corresponding overlap factor is calculated for each preset segmentation length.
Specifically, for a given preset segmentation length, the overlap factor corresponding to the preset segmentation length may be determined according to the number of words corresponding to the preset segmentation length included in the machine-generated text and the number of words corresponding to the preset segmentation length included in the plurality of reference texts.
Further, in an embodiment of the present invention, the number of the machine-generated texts to be measured may be multiple, and each machine-generated text corresponds to multiple reference texts.
As an example, the machine-generated texts may be represented in the form of a list. Assuming that there are n machine-generated texts to be measured, they may be represented as A = [a1, a2, ..., ai, ..., an] (1 ≤ i ≤ n), where ai represents the i-th machine-generated text. Assuming that each machine-generated text has m reference texts, the m reference texts corresponding to ai can be represented as Bi = [bi1, bi2, ..., bim].
The step S21 may specifically include the following steps S211 to S212, see fig. 3.
S211: for each preset segmentation length, counting a first number of words corresponding to the preset segmentation length included in each machine-generated text and a second number of words corresponding to the preset segmentation length included in the plurality of reference texts corresponding to that machine-generated text.
In conjunction with the above example, the machine-generated texts are a1, a2, ..., ai, ..., an, and the m reference texts corresponding to ai are bi1, bi2, ..., bim.
Then, for each preset segmentation length, a first number of words included in the machine-generated text a1 and corresponding to the preset segmentation length may be counted, and a second number of words included in the m reference texts corresponding to the machine-generated text a1 and corresponding to the preset segmentation length may be counted. In addition, a first number of words corresponding to the preset segmentation length included in the machine-generated text a2 and a second number of words corresponding to the preset segmentation length included in the m reference texts corresponding to the machine-generated text a2 are counted. And analogizing in sequence until a first number of words corresponding to the preset segmentation length included in the machine-generated text an and a second number of words corresponding to the preset segmentation length included in the m reference texts corresponding to the machine-generated text an are counted.
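As an illustration only, the counting in step S211 can be sketched in Python as follows. The sketch assumes that every text has already been tokenized into a list of tokens (for example, characters or words); the helper names ngrams and count_words are hypothetical and do not appear in the embodiment. Per-word counts are kept, from which totals can be obtained by summation, since the text does not state whether the first number and the second number are per-word or total counts.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous words of segmentation length n in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_words(candidates, references, lengths=(1, 2, 3, 4)):
    """Count words for every preset segmentation length.

    candidates: list A of n tokenized machine-generated texts.
    references: list whose i-th entry is the list B_i of m tokenized
                reference texts for candidates[i].
    Returns {n: [(first, second), ...]} where, for each candidate,
    `first` counts its n-grams and `second` counts the n-grams of
    all of its reference texts together.
    """
    counts = {}
    for n in lengths:
        per_text = []
        for cand, refs in zip(candidates, references):
            first = Counter(ngrams(cand, n))
            second = Counter()
            for ref in refs:
                second.update(ngrams(ref, n))
            per_text.append((first, second))
        counts[n] = per_text
    return counts
```

For a candidate a1 with two reference texts, counts[2][0] would hold the bigram counts of a1 and of its two references.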
S212: determining the overlap factor corresponding to each preset segmentation length based on a preset parameter and on the first number and the second number of words corresponding to that preset segmentation length.
In an embodiment of the present invention, after counting the first number and the second number of words corresponding to each preset segmentation length, the data may be combined to determine the overlap factor corresponding to each preset segmentation length.
In an embodiment of the present invention, for each preset segmentation length, an overlap factor corresponding to the preset segmentation length may be calculated according to the following formula:
Figure BDA0002307083570000121
Figure BDA0002307083570000122
wherein n represents a preset segmentation length; candidates represents the plurality of machine-generated texts; references represents the plurality of reference texts of a machine-generated text c; r represents one reference text of the plurality of reference texts; n-gram represents a word with a preset segmentation length of n; c represents the machine-generated text c; λ represents the preset parameter, which can be set according to actual conditions and whose value may be between 0 and 1; Count_c(n-gram) represents the first number of words corresponding to the preset segmentation length n in the machine-generated text c; Count_{c-ref}(n-gram) represents the number of words corresponding to the preset segmentation length n in a reference text corresponding to the machine-generated text c; Δ represents the second number of words corresponding to the preset segmentation length n in the plurality of reference texts corresponding to the machine-generated text c; and P_n represents the overlap factor corresponding to the preset segmentation length n.
S22: based on the preset weight of each preset segmentation length, performing a weighted summation of the overlap factors corresponding to the preset segmentation lengths to obtain the overlap factor of the machine-generated text.
After the overlap factor corresponding to each preset segmentation length is determined, the overlap factor corresponding to each preset segmentation length can be weighted and summed based on the preset weight of each preset segmentation length, so that the overlap factor of the whole machine-generated text is obtained.
In one embodiment of the invention, the overlap factor for machine-generated text may be calculated as follows:
Figure BDA0002307083570000131
wherein P_avg represents the overlap factor of the machine-generated text, P_n represents the overlap factor corresponding to the preset segmentation length n, w_n represents the preset weight of the preset segmentation length n, and N represents the total number of preset segmentation lengths.
Therefore, in the embodiment of the invention, the overlapping factors corresponding to the preset segmentation lengths are integrated, and the overlapping factor of the whole text generated by the machine is calculated.
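The weighted summation of step S22 can be sketched as follows. This is a minimal Python illustration based only on the textual description (a weighted sum of P_n with weights w_n); the exact formula appears only in the figure, so the function name overall_overlap and the dictionary-based interface are assumptions.

```python
def overall_overlap(per_length, weights):
    """Weighted sum of the per-length overlap factors.

    per_length: {n: P_n} overlap factor for each preset segmentation length.
    weights:    {n: w_n} preset weight for each preset segmentation length.
    """
    if per_length.keys() != weights.keys():
        raise ValueError("each segmentation length needs exactly one weight")
    return sum(weights[n] * per_length[n] for n in per_length)
```

With four segmentation lengths and uniform weights of 0.25, the result is simply the arithmetic mean of the four overlap factors.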
In one embodiment of the present invention, step S103, determining a repetition penalty factor of the machine-generated text according to the short sentences included in the machine-generated text, may specifically include the following steps:
determining short sentences contained in the machine-generated text;
and calculating the similarity between short sentences contained in the machine-generated text, and determining a repetition penalty factor of the machine-generated text based on the similarity between the short sentences.
Specifically, the machine-generated text may be divided by using delimiters such as commas and semicolons to obtain a plurality of short sentences; the similarity between every two short sentences is then calculated, and the average of these similarities is used as the repetition penalty factor of the machine-generated text.
As an example, if three short sentences a, b, and c are obtained after the machine-generated text is divided, the similarity of the short sentences a and b, the similarity of the short sentences a and c, and the similarity of the short sentences b and c may be calculated respectively, and then the three similarities are averaged to be used as the repetition penalty factor of the machine-generated text.
The process of calculating the similarity between short sentences can be found in the related art. For example, the calculation may be performed by using the existing BLEU (Bilingual Evaluation Understudy) algorithm, which is not described in detail herein.
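The splitting and pairwise averaging described above can be sketched in Python as follows. This is only an illustration: the embodiment suggests BLEU for the similarity, whereas this sketch defaults to the standard-library difflib.SequenceMatcher ratio as a stand-in so that it runs without extra dependencies. The function names, the delimiter set and the value returned when fewer than two short sentences exist are all assumptions.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def split_clauses(text, delimiters=",;，；"):
    """Split a text into short sentences on the preset delimiters."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [s.strip() for s in re.split(pattern, text) if s.strip()]


def repetition_penalty(text, similarity=None):
    """Average pairwise similarity between the short sentences of a text."""
    if similarity is None:
        # Stand-in similarity; the embodiment mentions BLEU instead.
        similarity = lambda a, b: SequenceMatcher(None, a, b).ratio()
    clauses = split_clauses(text)
    pairs = list(combinations(clauses, 2))
    if not pairs:
        # Assumption: the source does not define this case.
        return 0.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)
```

A BLEU-based scorer can be supplied through the similarity argument without changing the rest of the sketch.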
In one embodiment of the present invention, step S104, determining a length penalty factor of the machine-generated text according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts, may specifically include the following steps:
acquiring the text length of a machine-generated text, the average text length of a plurality of reference texts and the minimum text length of the plurality of reference texts;
determining a length penalty factor for the machine-generated text according to the following formula:
Figure BDA0002307083570000141
wherein c represents a machine-generated text, which is generic and may stand for any machine-generated text; l_c represents the text length of the machine-generated text c; the symbol shown in Figure BDA0002307083570000142 represents the average text length of the plurality of reference texts of the machine-generated text c; the symbol shown in Figure BDA0002307083570000143 represents the minimum text length of the plurality of reference texts of the machine-generated text c; and φ(c) represents the length penalty factor of the machine-generated text c.
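The three quantities that feed the length penalty factor can be gathered as in the following Python sketch. It deliberately stops short of computing φ(c), because the formula itself is available only as a figure. Measuring text length in characters via len() and the function name length_inputs are assumptions made for illustration.

```python
def length_inputs(candidate, references):
    """Return (l_c, average reference length, minimum reference length).

    candidate:  the machine-generated text c.
    references: the reference texts corresponding to c.
    """
    if not references:
        raise ValueError("at least one reference text is required")
    ref_lengths = [len(r) for r in references]
    average = sum(ref_lengths) / len(ref_lengths)
    return len(candidate), average, min(ref_lengths)
```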
In an embodiment of the present invention, step S105 may specifically include: and multiplying the overlapping factor, the repeated penalty factor and the length penalty factor in sequence to obtain the novelty of the machine-generated text.
Therefore, the novelty of the machine-generated text can be measured more effectively by comprehensively considering the overlapping degree of the machine-generated text and the reference text, the repetition degree of the machine-generated text, the lengths of the machine-generated text and the reference text and other factors.
In one embodiment of the invention, when the determined novelty of the machine-generated text is greater than a preset novelty threshold, the machine-generated text may be determined to be recommendable machine-generated text.
Specifically, if the novelty of the determined machine-generated text is greater than a preset novelty threshold, it indicates that the machine-generated text has a high use value, and may be determined as a recommendable machine-generated text, and the machine-generated text is recommended to the user in a specific scenario. For example, in a human-computer interaction scenario, if a certain machine-generated text has a high degree of novelty, it may be recorded so as to be recommended to the user in the corresponding scenario.
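The combination in step S105 and the threshold check can be sketched as follows. This is a minimal Python illustration, assuming the three factors have already been computed; the function names and the example values are hypothetical.

```python
def novelty(overlap, repetition_penalty, length_penalty):
    """Multiply the three factors in sequence to obtain the novelty."""
    return overlap * repetition_penalty * length_penalty


def is_recommendable(score, threshold):
    """A text is recommendable when its novelty exceeds the preset threshold."""
    return score > threshold


# Illustrative values only; they are not taken from the embodiment.
score = novelty(0.5, 0.8, 0.5)
recommend = is_recommendable(score, threshold=0.1)
```

Note that the comparison is strict: a text whose novelty equals the threshold is not marked as recommendable, matching the "greater than" wording above.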
Corresponding to the method for determining the novelty of the machine-generated text provided by the embodiment of the present invention, an embodiment of the present invention provides a device for determining the novelty of the machine-generated text. Referring to fig. 4, the device may include the following modules:
an obtaining module 401, configured to obtain a machine-generated text and a plurality of reference texts corresponding to the machine-generated text;
a first determining module 402, configured to determine an overlap factor of the machine-generated text according to a word included in the machine-generated text and a word included in the multiple reference texts, where the word is a word obtained by segmenting a text according to a preset segmentation length;
a second determining module 403, configured to determine a repetition penalty factor of the machine-generated text according to the short sentences included in the machine-generated text, where a short sentence is a sentence obtained by segmenting a text according to a preset delimiter;
a third determining module 404, configured to determine a length penalty factor of the machine-generated text according to the text length of the machine-generated text, the average text length of the multiple reference texts, and the minimum text length of the multiple reference texts;
a fourth determining module 405, configured to determine the novelty of the machine-generated text according to the overlap factor, the repetition penalty factor, and the length penalty factor of the machine-generated text.
In an embodiment of the present invention, the first determining module 402 may be specifically configured to:
for each preset segmentation length, determining an overlapping factor corresponding to the preset segmentation length according to words corresponding to the preset segmentation length included in the machine-generated text and words corresponding to the preset segmentation length included in the multiple reference texts, wherein the words corresponding to the preset segmentation length are words obtained by segmenting the text according to the preset segmentation length;
and carrying out weighted summation on the overlapping factors corresponding to each preset segmentation length based on the preset weight of each preset segmentation length to obtain the overlapping factors of the text generated by the machine.
In an embodiment of the present invention, the machine-generated text includes a plurality of texts, and the first determining module 402 may be specifically configured to:
for each preset segmentation length, counting a first number of words corresponding to the preset segmentation length included in each machine-generated text and a second number of words corresponding to the preset segmentation length included in the plurality of reference texts corresponding to that machine-generated text;
and determining an overlapping factor corresponding to each preset segmentation length based on the preset parameters and the first number and the second number of the words corresponding to each preset segmentation length.
In an embodiment of the present invention, the first determining module 402 may be specifically configured to:
aiming at each preset segmentation length, calculating an overlapping factor corresponding to the preset segmentation length according to the following formula:
Figure BDA0002307083570000161
Figure BDA0002307083570000162
wherein n represents a preset segmentation length; candidates represents the plurality of machine-generated texts; references represents the plurality of reference texts of a machine-generated text c; r represents one reference text of the plurality of reference texts; n-gram represents a word with a preset segmentation length of n; c represents the machine-generated text c; λ represents the preset parameter; Count_c(n-gram) represents the first number of words corresponding to the preset segmentation length n in the machine-generated text c; Count_{c-ref}(n-gram) represents the number of words corresponding to the preset segmentation length n in the reference text corresponding to the machine-generated text c; Δ represents the second number of words corresponding to the preset segmentation length n in the plurality of reference texts corresponding to the machine-generated text c; and P_n represents the overlap factor corresponding to the preset segmentation length n.
In an embodiment of the present invention, the first determining module 402 may be specifically configured to:
calculating an overlap factor for the machine-generated text according to the following formula:
Figure BDA0002307083570000163
wherein P_avg represents the overlap factor of the machine-generated text, P_n represents the overlap factor corresponding to the preset segmentation length n, w_n represents the preset weight of the preset segmentation length n, and N represents the total number of preset segmentation lengths.
In an embodiment of the present invention, the second determining module 403 may specifically be configured to:
determining short sentences contained in the machine-generated text;
and calculating the similarity between short sentences contained in the machine-generated text, and determining a repetition penalty factor of the machine-generated text based on the similarity between the short sentences.
In an embodiment of the present invention, the third determining module 404 may specifically be configured to:
acquiring the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts;
determining a length penalty factor for the machine-generated text according to the following formula:
Figure BDA0002307083570000171
wherein c represents the machine-generated text c; l_c represents the text length of the machine-generated text c; the symbol shown in Figure BDA0002307083570000172 represents the average text length of the plurality of reference texts of the machine-generated text c; the symbol shown in Figure BDA0002307083570000173 represents the minimum text length of the plurality of reference texts of the machine-generated text c; and φ(c) represents the length penalty factor of the machine-generated text c.
In an embodiment of the present invention, the fourth determining module 405 may specifically be configured to:
and multiplying the overlapping factor, the repeated penalty factor and the length penalty factor in sequence to obtain the novelty of the machine-generated text.
In an embodiment of the present invention, on the basis of the apparatus shown in fig. 4, a fifth determining module may further be included, where the fifth determining module is configured to:
when the determined novelty of the machine-generated text is greater than a preset novelty threshold, determining the machine-generated text as recommendable machine-generated text.
By applying the device for determining the novelty of the machine-generated text, a machine-generated text and a plurality of reference texts corresponding to the machine-generated text are obtained; an overlap factor of the machine-generated text is determined according to the words included in the machine-generated text and the words included in the plurality of reference texts, wherein the words are obtained by segmenting a text according to a preset segmentation length; a repetition penalty factor of the machine-generated text is determined according to the short sentences included in the machine-generated text, wherein a short sentence is obtained by segmenting a text according to a preset separator; a length penalty factor of the machine-generated text is determined according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts; and the novelty of the machine-generated text is determined according to the overlap factor, the repetition penalty factor and the length penalty factor of the machine-generated text. Because the degree of overlap between the machine-generated text and the reference texts, the degree of repetition within the machine-generated text, and the lengths of the machine-generated text and the reference texts are all taken into account, the novelty of the machine-generated text can be measured more effectively.
Corresponding to the embodiment of the method for determining novelty of machine-generated text, the embodiment of the present invention further provides an electronic device, as shown in fig. 5, including a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 are communicated with each other via the communication bus 504,
a memory 503 for storing a computer program;
the processor 501, when executing the program stored in the memory 503, implements the following steps:
acquiring a machine-generated text and a plurality of reference texts corresponding to the machine-generated text;
determining an overlapping factor of the machine-generated text according to words included in the machine-generated text and words included in the reference texts, wherein the words are words obtained by segmenting the text according to a preset segmentation length;
determining a repetition penalty factor of the machine-generated text according to the short sentences included in the machine-generated text, wherein a short sentence is a sentence obtained by segmenting the text according to a preset separator;
determining a length penalty factor of the machine-generated text according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts;
and determining the novelty of the machine-generated text according to the overlapping factor, the repetition penalty factor and the length penalty factor of the machine-generated text.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
By applying the electronic device provided by the embodiment of the invention, a machine-generated text and a plurality of reference texts corresponding to the machine-generated text are obtained; an overlap factor of the machine-generated text is determined according to the words included in the machine-generated text and the words included in the plurality of reference texts, wherein the words are obtained by segmenting a text according to a preset segmentation length; a repetition penalty factor of the machine-generated text is determined according to the short sentences included in the machine-generated text, wherein a short sentence is obtained by segmenting a text according to a preset separator; a length penalty factor of the machine-generated text is determined according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts; and the novelty of the machine-generated text is determined according to the overlap factor, the repetition penalty factor and the length penalty factor of the machine-generated text. Because the degree of overlap between the machine-generated text and the reference texts, the degree of repetition within the machine-generated text, and the lengths of the machine-generated text and the reference texts are all taken into account, the novelty of the machine-generated text can be measured more effectively.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements any of the above method steps.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, because the embodiments of the device for determining the novelty of the machine-generated text, the electronic device, and the computer-readable storage medium are substantially similar to the embodiments of the method for determining the novelty of the machine-generated text, they are described relatively briefly; for the relevant points, refer to the corresponding description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A method for novelty determination of machine-generated text, the method comprising:
acquiring a machine-generated text and a plurality of reference texts corresponding to the machine-generated text;
determining an overlapping factor of the machine-generated text according to words included in the machine-generated text and words included in the reference texts, wherein the words are words obtained by segmenting the text according to a preset segmentation length;
determining a repetition penalty factor of the machine-generated text according to the short sentences included in the machine-generated text, wherein a short sentence is a sentence obtained by segmenting the text according to a preset separator;
determining a length penalty factor of the machine-generated text according to the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts;
determining the novelty of the machine-generated text according to the overlapping factor, the repetition penalty factor and the length penalty factor of the machine-generated text;
the step of determining a repetition penalty factor of the machine-generated text according to the clause included in the machine-generated text includes:
determining short sentences contained in the machine-generated text;
calculating the similarity between short sentences contained in the machine-generated text, and determining a repeated penalty factor of the machine-generated text based on the similarity between the short sentences;
the step of determining a length penalty factor for the machine-generated text based on the text length of the machine-generated text, the average text length of the plurality of reference texts, and the minimum text length of the plurality of reference texts comprises:
acquiring the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts;
determining a length penalty factor for the machine-generated text according to the following formula:
Figure FDA0003948990220000011
wherein c represents the machine-generated text c; l_c represents the text length of the machine-generated text c; the symbol shown in Figure FDA0003948990220000021 represents the average text length of the plurality of reference texts of the machine-generated text c; the symbol shown in Figure FDA0003948990220000022 represents the minimum text length of the plurality of reference texts of the machine-generated text c; and φ(c) represents the length penalty factor of the machine-generated text c.
2. The method of claim 1, wherein the preset segmentation length is multiple, and the step of determining the overlap factor of the machine-generated text according to the words included in the machine-generated text and the words included in the multiple reference texts comprises:
for each preset segmentation length, determining an overlap factor corresponding to the preset segmentation length according to words corresponding to the preset segmentation length included in the machine-generated text and words corresponding to the preset segmentation length included in the multiple reference texts, wherein the words corresponding to the preset segmentation length are words obtained by segmenting the text according to the preset segmentation length;
and carrying out weighted summation on the overlapping factors corresponding to each preset segmentation length based on the preset weight of each preset segmentation length to obtain the overlapping factors of the text generated by the machine.
3. The method of claim 2, wherein the machine-generated text is plural;
the step of determining, for each preset segmentation length, the overlap factor corresponding to the preset segmentation length according to the words corresponding to the preset segmentation length included in the machine-generated text and the words corresponding to the preset segmentation length included in the plurality of reference texts includes:
for each preset segmentation length, counting a first number of words corresponding to the preset segmentation length included in each machine-generated text and a second number of words corresponding to the preset segmentation length included in the plurality of reference texts corresponding to that machine-generated text;
and determining an overlapping factor corresponding to each preset segmentation length based on the preset parameters and the first number and the second number of the words corresponding to each preset segmentation length.
4. The method according to claim 3, wherein the step of determining the overlap factor corresponding to each preset segmentation length based on the preset parameters, the first number and the second number of the words corresponding to each preset segmentation length comprises:
aiming at each preset segmentation length, calculating an overlapping factor corresponding to the preset segmentation length according to the following formula:
Figure FDA0003948990220000031
Figure FDA0003948990220000032
wherein n represents a preset segmentation length; candidates represents the plurality of machine-generated texts; references represents the plurality of reference texts of a machine-generated text c; r represents one reference text of the plurality of reference texts; n-gram represents a word with a preset segmentation length of n; c represents the machine-generated text c; λ represents the preset parameter; Count_c(n-gram) represents the first number of words corresponding to the preset segmentation length n in the machine-generated text c; Count_{c-ref}(n-gram) represents the number of words corresponding to the preset segmentation length n in the reference text ref corresponding to the machine-generated text c; Δ represents the second number of words corresponding to the preset segmentation length n in the plurality of reference texts corresponding to the machine-generated text c; and P_n represents the overlap factor corresponding to the preset segmentation length n.
5. The method according to claim 3, wherein the step of performing weighted summation on the overlap factor corresponding to each preset segmentation length based on the preset weight of each preset segmentation length to obtain the overlap factor of the machine-generated text comprises:
calculating an overlap factor for the machine-generated text according to the following formula:
Figure FDA0003948990220000033
wherein P_avg represents the overlap factor of the machine-generated text, P_n represents the overlap factor corresponding to a preset segmentation length n, w_n represents the preset weight of the preset segmentation length n, and N represents the total number of the preset segmentation lengths.
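As a rough illustration of claim 5, the sketch below combines per-length overlap factors into a single value. It assumes a plain weighted sum, which is what the wording "weighted summation" suggests. The exact formula is given only in the figure and may differ. The function name `weighted_overlap` and the example numbers are hypothetical.

```python
from typing import Mapping


def weighted_overlap(
    p_by_length: Mapping[int, float],
    weight_by_length: Mapping[int, float],
) -> float:
    """Assumed form: P_avg = sum over n of w_n * P_n."""
    if set(p_by_length) != set(weight_by_length):
        raise ValueError("each segmentation length needs exactly one weight")
    return sum(weight_by_length[n] * p_by_length[n] for n in p_by_length)


# Hypothetical overlap factors and weights for N = 2 segmentation lengths.
p_avg = weighted_overlap({1: 0.5, 2: 0.25}, {1: 0.5, 2: 0.5})
```

Keying both mappings by the segmentation length n keeps each weight w_n paired with its own P_n, and the explicit check rejects a missing or extra weight.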
6. The method of claim 1, wherein the step of determining the novelty of the machine-generated text in accordance with the overlap factor, repetition penalty factor, and length penalty factor of the machine-generated text comprises:
multiplying the overlap factor, the repetition penalty factor and the length penalty factor in sequence to obtain the novelty of the machine-generated text.
7. The method of claim 1, further comprising:
when the determined novelty of the machine-generated text is greater than a preset novelty threshold, determining the machine-generated text as recommendable machine-generated text.
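Claims 6 and 7 together describe a product followed by a threshold test, which can be sketched as below. The function names `novelty` and `is_recommendable`, and the numeric values, are hypothetical. The three factors are assumed to have been computed already.

```python
def novelty(overlap: float, repetition_penalty: float, length_penalty: float) -> float:
    """Novelty as the product of the three factors (claim 6)."""
    return overlap * repetition_penalty * length_penalty


def is_recommendable(score: float, threshold: float) -> bool:
    """A text is recommendable only when its novelty is strictly greater
    than the preset novelty threshold (claim 7)."""
    return score > threshold


# Hypothetical factor values and threshold.
score = novelty(0.5, 0.5, 1.0)
recommend = is_recommendable(score, threshold=0.2)
```

The strict comparison follows the claim's wording "greater than", so a score exactly equal to the threshold is not marked as recommendable.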
8. An apparatus for machine-generated text novelty determination, the apparatus comprising:
an acquisition module, configured to acquire a machine-generated text and a plurality of reference texts corresponding to the machine-generated text;
a first determining module, configured to determine an overlap factor of the machine-generated text according to words included in the machine-generated text and words included in the reference texts, wherein the words are obtained by segmenting the text according to a preset segmentation length;
a second determining module, configured to determine a repetition penalty factor of the machine-generated text according to short sentences included in the machine-generated text, wherein a short sentence is a sentence obtained by segmenting the text according to a preset separator;
a third determining module, configured to determine a length penalty factor of the machine-generated text according to a text length of the machine-generated text, an average text length of the multiple reference texts, and a minimum text length of the multiple reference texts;
a fourth determining module, configured to determine the novelty of the machine-generated text according to the overlap factor, the repetition penalty factor and the length penalty factor of the machine-generated text;
wherein the second determining module is specifically configured to:
determine the short sentences contained in the machine-generated text;
calculate the similarity between the short sentences contained in the machine-generated text, and determine the repetition penalty factor of the machine-generated text based on the similarity between the short sentences;
and the third determining module is specifically configured to:
acquire the text length of the machine-generated text, the average text length of the plurality of reference texts and the minimum text length of the plurality of reference texts;
determine the length penalty factor of the machine-generated text according to the following formula:
Figure FDA0003948990220000041
where c represents the machine-generated text c, l_c represents the text length of the machine-generated text c, the symbol shown in Figure FDA0003948990220000051 represents the average text length of the plurality of reference texts of the machine-generated text c, the symbol shown in Figure FDA0003948990220000052 represents the minimum text length of the plurality of reference texts of the machine-generated text c, and phi(c) represents the length penalty factor of the machine-generated text c.
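The first two operations of the second determining module, splitting the text into short sentences and comparing them, might look like the sketch below. Several details are assumptions rather than part of the claims: the separator set, the use of `difflib.SequenceMatcher` as the similarity measure, and the function names `split_short_sentences` and `pairwise_similarities`. This section does not state how the similarities are turned into the repetition penalty factor, so that step is left out.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from typing import List


def split_short_sentences(text: str, separators: str = ",.;!?，。；！？") -> List[str]:
    """Split text at any of the preset separators and drop empty pieces.
    The default separator set is an assumption."""
    pattern = "[" + re.escape(separators) + "]"
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


def pairwise_similarities(sentences: List[str]) -> List[float]:
    """Similarity in [0, 1] for every unordered pair of short sentences.
    SequenceMatcher stands in for the unspecified similarity measure."""
    return [SequenceMatcher(None, a, b).ratio() for a, b in combinations(sentences, 2)]


# Hypothetical example text with one repeated clause.
clauses = split_short_sentences("the sky is blue, the sky is blue. birds sing")
sims = pairwise_similarities(clauses)
```

A text that repeats a clause verbatim yields a pair with similarity 1.0, which is the kind of signal a repetition penalty could build on.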
CN201911244272.5A 2019-12-06 2019-12-06 Method and device for determining novelty of machine-generated text Active CN111144709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911244272.5A CN111144709B (en) 2019-12-06 2019-12-06 Method and device for determining novelty of machine-generated text

Publications (2)

Publication Number Publication Date
CN111144709A CN111144709A (en) 2020-05-12
CN111144709B true CN111144709B (en) 2023-04-18

Family

ID=70517799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911244272.5A Active CN111144709B (en) 2019-12-06 2019-12-06 Method and device for determining novelty of machine-generated text

Country Status (1)

Country Link
CN (1) CN111144709B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582933A (en) * 2018-11-13 2019-04-05 北京合享智慧科技有限公司 A kind of method and relevant apparatus of determining text novelty degree
CN109635089A (en) * 2018-12-14 2019-04-16 苏州阳澄湖数字文化创意园投资有限公司 A kind of literary works novelty degree evaluation system and method based on semantic network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7587307B2 (en) * 2003-12-18 2009-09-08 Xerox Corporation Method and apparatus for evaluating machine translation quality

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ROUGE: A Package for Automatic Evaluation of Summaries; Chin-Yew Lin; In Proceedings of Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004; 2004-12-31; pp. 1-10 *
Automatic text summarization based on comprehensive sentence features; Cheng Yuan et al.; Computer Science (计算机科学); 2015-04-30; Vol. 42, No. 04; pp. 226-229 *

Also Published As

Publication number Publication date
CN111144709A (en) 2020-05-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant