CN112199944A - Method and device for adding watermark in text, electronic equipment and storage medium - Google Patents

Method and device for adding watermark in text, electronic equipment and storage medium

Info

Publication number
CN112199944A
CN112199944A
Authority
CN
China
Prior art keywords
watermark
text
target
participle
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011079509.1A
Other languages
Chinese (zh)
Inventor
刘顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202011079509.1A
Publication of CN112199944A
Legal status: Pending

Classifications

    • G06F 40/216: Handling natural language data; Natural language analysis; Parsing using statistical methods
    • G06F 40/247: Handling natural language data; Lexical tools; Thesauruses; Synonyms
    • G06F 40/289: Handling natural language data; Recognition of textual entities; Phrasal analysis, e.g. finite state techniques or chunking
    • G06N 3/045: Computing arrangements based on biological models; Neural networks; Architecture; Combinations of networks
    • G06N 3/08: Computing arrangements based on biological models; Neural networks; Learning methods

Abstract

The invention relates to the technical field of artificial intelligence and provides a method and a device for adding a watermark in a text, an electronic device, and a storage medium. The method comprises: acquiring a text to be watermarked, performing word segmentation on it, calculating the frequency of each participle, selecting a plurality of target participles from the participles according to the frequency, and inputting the target participles into a pre-trained watermark word vector model to obtain a plurality of similarity lists; calculating the similarity between each target participle and each watermark participle in the corresponding similarity list, and taking the watermark participle with the highest similarity as the target watermark participle of the corresponding target participle; and replacing the target participles in the text to be watermarked with the corresponding target watermark participles to obtain the watermark text. The invention can also be applied to prescription circulation in medical management: by replacing target participles in a text to be watermarked, such as a prescription, with the corresponding target watermark participles, the security and portability of the prescription circulation process are improved.

Description

Method and device for adding watermark in text, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for adding a watermark in a text, electronic equipment and a storage medium.
Background
The text digital watermark is a very important technology in the watermark field and is widely applied in fields such as electronic commerce and digital copyright protection. However, text digital watermarks are easily detected by automatic recognition technologies such as optical character recognition and format conversion, and once detected the watermark can be destroyed, so the security of the text is not high.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, an electronic device, and a storage medium for adding a watermark to text, in which the security and portability of the watermark text are improved by replacing a plurality of target participles in the text to be watermarked with corresponding target watermark participles.
A first aspect of the invention provides a method of adding a watermark to text, the method comprising:
acquiring a text to be added with a watermark;
performing word segmentation processing on the text to be added with the watermark to obtain a plurality of words;
calculating the frequency of each participle, and selecting a plurality of target participles from the plurality of participles according to the frequency;
respectively inputting the target participles into a pre-trained watermark word vector model to obtain a plurality of similarity lists, wherein each similarity list comprises at least one watermark participle;
calculating the similarity between each target participle and each watermark participle in the corresponding similarity list, and taking the watermark participle with the highest similarity as the target watermark participle of the corresponding target participle;
and replacing the target word segmentation in the text to be added with the watermark with the corresponding target watermark word segmentation to obtain a watermark text.
Optionally, the training process of the watermark word vector model includes:
acquiring a plurality of text corpora;
performing word segmentation processing on the text corpora to obtain a plurality of segmented words;
establishing a watermark vocabulary table according to the plurality of participles, and carrying out self-coding processing on each participle in the watermark vocabulary table to obtain a self-coding vector of each participle;
calculating the product of the self-coding vector of each participle and a preset input layer weight matrix to obtain a word vector of each participle;
accumulating the sum of the word vectors of each participle in the watermark vocabulary and then averaging to obtain a hidden layer vector;
calculating the product of the hidden layer vector and a preset output layer weight matrix to obtain an output layer vector;
mapping the output layer vector by using an activation function to obtain the probability distribution of each participle;
calculating a loss value between the probability distribution of each participle and a preset label vector by adopting a loss measurement function;
and updating the preset input layer weight matrix and the preset output layer weight matrix by adopting a back propagation algorithm according to the loss value to obtain a watermark word vector model.
Optionally, the selecting a plurality of target participles from the plurality of participles according to the frequency includes:
sorting the frequencies in descending order;
and selecting a plurality of word segmentation ranked at the top from the descending ranking result as a plurality of target word segmentation.
Optionally, the similarity between each target participle and each watermark participle in the corresponding similarity list is calculated by using the following formula:
W = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}} \cdot \sqrt{\sum_{i=1}^{n} y_i^{2}}}
wherein (x_1, x_2, …, x_n) is the word vector of the target participle, (y_1, y_2, …, y_n) is the word vector of the watermark participle, n denotes the dimension of the word vectors, and W is the similarity between each target participle and each watermark participle in the corresponding similarity list.
Optionally, before replacing the target participles in the text to be watermarked with corresponding target watermark participles to obtain a watermarked text, the method further includes:
monitoring the number of target participles in each single sentence in the text to be added with the watermark;
comparing the number of the target word segments in each single sentence with a preset number threshold;
when the number of the target participles in any single sentence is larger than or equal to the preset number threshold, extracting all the target participles in any single sentence;
calculating the similarity between each target word in all the target words and each watermark word in the corresponding similarity list to obtain a plurality of target similarities;
and selecting the watermark participle with the highest target similarity from the target similarities as the final target participle of any single sentence.
Optionally, the performing word segmentation processing on the text to be added with the watermark to obtain a plurality of words comprises:
inputting the text to be added with the watermark into a word segmentation and part-of-speech tagging integrated model for word segmentation processing and part-of-speech tagging to obtain an initial word segmentation set;
and removing stop words in the initial word segmentation set to obtain a plurality of word segments.
Optionally, the performing word segmentation processing on the text to be watermarked to obtain a plurality of words further includes:
acquiring preset word segmentation configuration parameters;
configuring a word segmentation tool according to the word segmentation configuration parameters;
and calling a configured word segmentation tool to perform word segmentation processing and part-of-speech tagging on the text to be added with the watermark to obtain a plurality of words.
A second aspect of the present invention provides an apparatus for adding a watermark to text, the apparatus comprising:
the acquisition module is used for acquiring a text to be added with a watermark;
the word segmentation module is used for carrying out word segmentation processing on the text to be added with the watermark to obtain a plurality of words;
the first calculation module is used for calculating the frequency of each participle and selecting a plurality of target participles from the participles according to the frequency;
the input module is used for respectively inputting the target participles into a pre-trained watermark word vector model to obtain a plurality of similarity lists, wherein each similarity list comprises at least one watermark participle;
the second calculation module is used for calculating the similarity between each target participle and each watermark participle in the corresponding similarity list, and taking the watermark participle with the highest similarity as the target watermark participle of the corresponding target participle;
and the replacing module is used for replacing the target participles in the text to be added with the watermark with the corresponding target watermark participles to obtain the watermark text.
A third aspect of the invention provides an electronic device comprising a processor configured to implement any of the described methods of adding a watermark to text when executing a computer program stored in a memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the methods of watermarking text.
In summary, with the method, apparatus, electronic device, and storage medium for adding a watermark to text, on one hand, replacing a target participle with the watermark participle of highest similarity does not change the meaning of the original text, so the watermarked text remains consistent with the original text and the sentence structure is unchanged; the added watermark is imperceptible to the user, which improves the security of the watermark text.
In addition, because only participles in the text to be watermarked are replaced, if the watermark needs to be changed or migrated, the target participles only need to be added or deleted according to the user's requirements and the corresponding watermark participles updated accordingly, which improves the update efficiency and the portability of the watermark text.
Drawings
Fig. 1 is a flowchart of a method for adding a watermark to text according to an embodiment of the present invention.
Fig. 2 is a structural diagram of an apparatus for adding a watermark to a text according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Example one
Fig. 1 is a flowchart of a method for adding a watermark to text according to an embodiment of the present invention.
In this embodiment, the method for adding a watermark to text may be applied to an electronic device. For an electronic device that needs to add watermarks to text, the watermarking function provided by the method of the present invention may be integrated directly on the electronic device, or may run on the electronic device in the form of a Software Development Kit (SDK).
As shown in fig. 1, the method for adding a watermark in a text specifically includes the following steps, and the order of the steps in the flowchart may be changed and some may be omitted according to different requirements.
S11: and acquiring the text to be added with the watermark.
In this embodiment, the text to be watermarked refers to a text into which a watermark needs to be added. The text to be watermarked may be text input into the server directly from the outside, for example a text uploaded by a user and sent to the server; it may also be text that the server automatically crawls from a plurality of preset data sources with a crawler tool and takes as the text to be watermarked.
S12: and performing word segmentation processing on the text to be added with the watermark to obtain a plurality of words.
In this embodiment, after obtaining the text to be watermarked, a word segmentation tool or a pre-trained model is used to perform word segmentation processing on the text to be watermarked, and meanwhile, part-of-speech tagging is performed on each word segmentation.
Preferably, the performing word segmentation processing on the text to be added with the watermark to obtain a plurality of word segments includes:
inputting the text to be added with the watermark into a word segmentation and part-of-speech tagging integrated model for word segmentation processing and part-of-speech tagging to obtain an initial word segmentation set;
and removing stop words in the initial word segmentation set to obtain a plurality of word segments.
In this embodiment, each of the plurality of participles carries a part-of-speech tag. The stop words refer to function words such as the Chinese particles "的" (of) and "吧" that appear widely in every text but carry no substantive meaning, and these words need to be removed.
Alternatively, the performing word segmentation processing on the text to be watermarked to obtain a plurality of word segments includes:
acquiring preset word segmentation configuration parameters;
configuring a word segmentation tool according to the word segmentation configuration parameters;
and calling a configured word segmentation tool to perform word segmentation processing and part-of-speech tagging on the text to be added with the watermark to obtain a plurality of words.
In this embodiment, a word segmentation tool may be called during word segmentation. To meet the requirements of a specific scene or of the user, a word segmentation tool that supports custom configuration may also be selected. Preset word segmentation configuration parameters are obtained first, where the preset parameters include the character string to be segmented, a word segmentation mode parameter, and an HMM parameter, and the word segmentation modes include an accurate mode, a full mode, and a search engine mode. Taking the jieba word segmentation tool as an example, the user passes three parameters to the tool: the character string to be segmented; the cut_all parameter, used to control whether the full mode is adopted; and the HMM parameter, used to control whether the HMM model is used. The jieba tool is configured with these parameters, and the configured tool is then called to perform word segmentation and part-of-speech tagging on the text to be watermarked, obtaining a plurality of participles.
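For illustration, a minimal sketch of this configuration flow using the jieba toolkit is given below; the stop-word list and default parameter values are illustrative assumptions rather than part of this disclosure.

```python
# Minimal sketch of the configurable segmentation step, assuming the jieba toolkit.
import jieba
import jieba.posseg as pseg

STOP_WORDS = {"的", "了", "吧", "在"}  # assumed example stop-word list

def segment(text, cut_all=False, hmm=True):
    """Segment the text to be watermarked, tag parts of speech, then drop stop words."""
    if cut_all:
        # full mode produces no part-of-speech tags, so tag each word as unknown
        pairs = [(w, None) for w in jieba.lcut(text, cut_all=True, HMM=hmm)]
    else:
        # accurate mode with part-of-speech tagging
        pairs = [(p.word, p.flag) for p in pseg.cut(text, HMM=hmm)]
    return [(w, f) for w, f in pairs if w not in STOP_WORDS]
```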
S13: and calculating the frequency of each participle, and selecting a plurality of target participles from the plurality of participles according to the frequency.
In this embodiment, before the watermark is added to the text, a plurality of target participles need to be selected according to the frequency with which each participle appears in the text to be watermarked.
Preferably, the calculating the frequency of each participle comprises:
counting the number of all the participles in the text to be added with the watermark to obtain a first total number;
counting the occurrence times of each word segmentation in the text to be added with the watermark to obtain a second total number;
and calculating the quotient of the second total number divided by the first total number to obtain the frequency of each participle.
Preferably, the selecting a plurality of target participles from the plurality of participles according to the frequency includes:
sorting the frequencies in descending order;
and selecting a plurality of word segmentation ranked at the top from the descending ranking result as a plurality of target word segmentation.
In this embodiment, the frequency with which each participle appears in the text to be watermarked is calculated, and the participles with the highest frequencies are used as target participles. The number of target participles can be set according to the requirements of the user, which improves the usability of the watermark text.
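A short sketch of this selection step, under the frequency definition given above (occurrences divided by the total number of participles); the parameter top_n is assumed to be supplied by the user.

```python
from collections import Counter

def select_targets(participles, top_n):
    """Frequency = occurrences of a participle / total number of participles;
    the top_n most frequent participles become the target participles."""
    total = len(participles)                       # first total number
    counts = Counter(participles)                  # occurrences of each participle
    freqs = {w: c / total for w, c in counts.items()}
    ranked = sorted(freqs, key=freqs.get, reverse=True)   # descending order
    return ranked[:top_n]
```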
S14: and respectively inputting the target participles into a pre-trained watermark word vector model to obtain a plurality of similarity lists, wherein each similarity list comprises at least one watermark participle.
In this embodiment, each target word corresponds to one similarity list, and after a plurality of target words are obtained, the plurality of target words are input into a pre-trained watermark word vector model, so that the similarity list corresponding to each target word is obtained.
Specifically, the training process of the watermark word vector model includes:
41) acquiring a plurality of text corpora;
42) performing word segmentation processing on the text corpora to obtain a plurality of segmented words;
43) establishing a watermark vocabulary table according to the plurality of participles, and carrying out self-coding processing on each participle in the watermark vocabulary table to obtain a self-coding vector of each participle;
44) calculating the product of the self-coding vector of each participle and a preset input layer weight matrix to obtain a word vector of each participle;
45) accumulating the sum of the word vectors of each participle in the watermark vocabulary and then averaging to obtain a hidden layer vector;
46) calculating the product of the hidden layer vector and a preset output layer weight matrix to obtain an output layer vector;
47) mapping the output layer vector by using an activation function to obtain the probability distribution of each participle;
48) calculating a loss value between the probability distribution of each participle and a preset label vector by adopting a loss measurement function;
49) and updating the preset input layer weight matrix and the preset output layer weight matrix by adopting a back propagation algorithm according to the loss value to obtain a watermark word vector model.
In this embodiment, a web crawler technology is used to obtain a plurality of text corpora from a plurality of preset data sources, perform word segmentation processing to obtain a plurality of words, then perform self-coding processing on each word to obtain a self-coding vector of each word, may preset an input layer weight matrix, obtain a word vector of each word by calculating a product of the self-coding vector of each word and a preset input layer weight matrix, obtain a hidden layer vector according to the word vector of each word, further calculate an output layer vector, obtain a loss value of each word by calculating an activation function and a loss measurement function, and update the preset input layer weight matrix and the preset output layer weight matrix by using a back propagation algorithm according to the loss value to obtain a watermark word vector model.
In this embodiment, the text corpus is continuously added for training, and the watermark word vector model is continuously updated, so that the accuracy of the similarity list output by the watermark word vector model is improved.
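For illustration, the following is a simplified numpy sketch of one training step that follows steps 41)-49) above (a CBOW-style update); the vocabulary size, vector dimension, learning rate, and context definition are assumptions, not values from this disclosure.

```python
import numpy as np

def train_step(context_ids, target_id, W_in, W_out, lr=0.01):
    """One simplified update: one-hot (self-coded) participles -> input weight
    matrix -> averaged hidden vector -> output weight matrix -> softmax
    probability distribution -> cross-entropy loss -> back-propagated update."""
    h = W_in[context_ids].mean(axis=0)        # hidden layer vector (step 45)
    u = h @ W_out                             # output layer vector (step 46)
    p = np.exp(u - u.max()); p /= p.sum()     # activation / probability distribution (step 47)
    loss = -np.log(p[target_id])              # loss against the one-hot label vector (step 48)
    e = p.copy(); e[target_id] -= 1.0         # gradient of the loss w.r.t. the output scores
    grad_out = np.outer(h, e)                 # gradient for the output layer weight matrix
    grad_h = W_out @ e                        # gradient flowing back to the hidden vector
    W_out -= lr * grad_out                    # step 49: update both weight matrices
    W_in[context_ids] -= lr * grad_h / len(context_ids)
    return loss

# assumed sizes: vocabulary of 5000 participles, 100-dimensional word vectors
W_in = np.random.randn(5000, 100) * 0.01
W_out = np.random.randn(100, 5000) * 0.01
```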
S15: and calculating the similarity between each target word and each watermark word in the corresponding similarity list, and taking the watermark word with the highest similarity as the target watermark word of the corresponding target word.
In this embodiment, the similarity refers to a similarity between each target participle and each watermark participle in the similarity list, and a higher similarity indicates that the target participle is more similar to the watermark participle in part-of-speech, and if the target participle is replaced with the watermark participle, the substantial meaning of the original text is not changed.
Preferably, the similarity between each target participle and each watermark participle in the corresponding similarity list is calculated by adopting the following formula:
W = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}} \cdot \sqrt{\sum_{i=1}^{n} y_i^{2}}}
wherein (x_1, x_2, …, x_n) is the word vector of the target participle, (y_1, y_2, …, y_n) is the word vector of each watermark participle, n denotes the dimension of the word vectors, and W represents the similarity between each target participle and each watermark participle in the corresponding similarity list.
In this embodiment, the similarity between the word vector of each target participle and the word vector of each watermark participle in the corresponding similarity list is calculated, and the watermark participle with the highest similarity is selected as the target watermark participle corresponding to that target participle, which improves the accuracy of the obtained watermark participles.
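Read as a cosine similarity, the calculation and the selection of the highest-scoring watermark participle can be sketched as follows; the mapping from candidate watermark participles to word vectors is an assumed data structure for illustration.

```python
import numpy as np

def similarity(x, y):
    """Similarity W between the word vector x of a target participle and the
    word vector y of a watermark participle, following the formula above."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def pick_target_watermark(target_vec, candidates):
    """candidates: {watermark participle -> word vector}; returns the watermark
    participle with the highest similarity, or None if the list is empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda w: similarity(target_vec, candidates[w]))
```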
S16: and replacing the target word segmentation in the text to be added with the watermark with the corresponding target watermark word segmentation to obtain a watermark text.
Illustratively, the text to be watermarked is: "One person lies on the bed. He likes to go to the grassland to play, and the leaves of the big tree at the doorway are joined together and block the sunlight." Performing word segmentation on the text to be watermarked yields the initial segmentation set: {one person, on the bed, lies, he, likes, to go, grassland, play, doorway, big tree, leaves, joined, in, together, block, sunlight}. Removing the stop words from the initial segmentation set yields the participles: {one person, on the bed, lies, he, likes, to go, grassland, play, leaves, joined, together, block, sunlight}. Since each participle is calculated to appear with the same frequency, N participles are randomly selected as target participles, for example: on the bed, grassland, leaves, sunlight, where N is determined by the importance of the text to be watermarked.
The target participles on the bed, grassland, leaves, and sunlight are input into the pre-trained watermark word vector model to obtain a plurality of similarity lists. For "on the bed", the similar-word list obtained is empty. For "grassland", the similar-word list is: big grassland, and the similarity between "grassland" and "big grassland" is calculated as 0.94. For "leaves", the similar-word list is: green leaves, red leaves; the similarity between "leaves" and "green leaves" is 0.8, and the similarity between "leaves" and "red leaves" is 0.2. For "sunlight", the similar-word list is: sun light, and the similarity between "sunlight" and "sun light" is 0.91. It is therefore determined that "on the bed" has no corresponding watermark participle, the watermark participle corresponding to "grassland" is "big grassland", the watermark participle corresponding to "leaves" is "green leaves", and the watermark participle corresponding to "sunlight" is "sun light". The watermark text after replacement is: "One person lies on the bed. He likes to go to the big grassland to play, and the green leaves of the big tree at the doorway are joined together and block the sun light."
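The final replacement step in this example reduces to a lookup over the chosen target participles; the mapping below simply restates the pairs from the example above and is only a sketch.

```python
def apply_watermark(text, mapping):
    """Replace each target participle with its target watermark participle;
    targets with an empty similarity list (e.g. "on the bed") are skipped."""
    for target, watermark in mapping.items():
        if watermark:
            text = text.replace(target, watermark)
    return text

# pairs taken from the example above
mapping = {"grassland": "big grassland", "leaves": "green leaves", "sunlight": "sun light"}
```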
In this embodiment, replacing the target participle with the watermark participle of highest similarity does not change the meaning of the original text, so the watermarked text remains consistent with the original text and the sentence structure is unchanged; the added watermark is imperceptible to the user, which improves the security of the watermark text. Moreover, malicious actors generally will not realize that the text they are reading is a watermark text, and therefore will not deliberately remove or destroy the watermark, which reduces the probability of the watermark text being maliciously damaged.
In addition, because only participles in the text to be watermarked are replaced, if the watermark needs to be changed or migrated, the target participles only need to be added or deleted according to the user's requirements and the corresponding watermark participles updated accordingly, which improves the update efficiency and the portability of the watermark text.
Alternatively, before replacing the target participles in the text to be watermarked with the corresponding target watermark participles to obtain a watermarked text, the method further includes:
monitoring the number of target participles in each single sentence in the text to be added with the watermark;
comparing the number of the target word segments in each single sentence with a preset number threshold;
when the number of the target participles in any single sentence is larger than or equal to the preset number threshold, extracting all the target participles in any single sentence;
calculating the similarity between each target word in all the target words and each watermark word in the corresponding similarity list to obtain a plurality of target similarities;
and selecting the watermark participle with the highest target similarity from the target similarities as the final target participle of any single sentence.
Further, the method further comprises:
and when the number of the target participles in any single sentence in the text to be added with the watermark is monitored to be smaller than the preset number threshold, replacing the target participles in any single sentence with the corresponding target watermark participles.
In this embodiment, the allowed number of target participles appearing in each single sentence may be preset. When the number of target participles in any single sentence is greater than or equal to the preset number threshold, all target participles in that sentence are extracted, and among them the target participle with the highest target similarity is selected as the final target participle of that sentence, so that each single sentence replaces only one target participle. This reduces the amount of data to be processed and improves the efficiency of adding a watermark to the text.
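A compact sketch of this per-sentence limit; the sentence-splitting rule and the best_similarity mapping (the best watermark-participle score for each target) are assumptions for illustration.

```python
import re

def limit_targets_per_sentence(text, targets, best_similarity, threshold):
    """If a single sentence contains at least `threshold` target participles,
    keep only the one whose watermark participle has the highest similarity;
    otherwise keep every target participle found in that sentence."""
    kept = []
    for sentence in re.split(r"[。！？.!?]", text):
        present = [t for t in targets if t in sentence]
        if len(present) >= threshold:
            kept.append(max(present, key=lambda t: best_similarity[t]))
        elif present:
            kept.extend(present)
    return kept
```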
In other embodiments, each sentence may include a plurality of target phrases to be replaced at the same time, and the invention is not limited thereto.
In summary, in the method for adding a watermark to text described in this embodiment, on one hand, replacing the target participle with the watermark participle of highest similarity does not change the meaning of the original text, so the watermarked text remains consistent with the original text and the sentence structure is unchanged; the added watermark is imperceptible to the user, which improves the security of the watermark text, and since malicious actors generally will not realize that the text they are reading is a watermark text, they will not deliberately remove or destroy the watermark, which reduces the probability of the watermark text being maliciously damaged. On the other hand, because the watermark word vector model is trained with a continuously growing text corpus and is continuously updated, inputting the target participles into the pre-trained watermark word vector model to obtain the similarity lists improves the accuracy of the similarity lists output by the model. In addition, because only participles in the text to be watermarked are replaced, if the watermark needs to be changed or migrated, the target participles only need to be added or deleted according to the user's requirements and the corresponding watermark participles updated accordingly, which improves the update efficiency and the portability of the watermark text. It should be noted that the invention can also be applied to prescription circulation in medical management: by replacing target participles in a text to be watermarked, such as a prescription, with the corresponding target watermark participles, the security and portability of the prescription circulation process are improved.
Example two
Fig. 2 is a structural diagram of an apparatus for adding a watermark to a text according to a second embodiment of the present invention.
In some embodiments, the apparatus 20 for adding a watermark to text may comprise a plurality of functional modules made up of program code segments. The program code of the various segments of the apparatus 20 may be stored in a memory of an electronic device and executed by at least one processor to perform the function of adding a watermark to text (described in detail with reference to fig. 1).
In this embodiment, the apparatus 20 for adding watermark to text may be divided into a plurality of functional modules according to the functions performed by the apparatus. The functional module may include: the system comprises an acquisition module 201, a word segmentation module 202, a first calculation module 203, an input module 204, a second calculation module 205 and a replacement module 206. The module referred to herein is a series of computer program segments capable of being executed by at least one processor and capable of performing a fixed function and is stored in memory. In the present embodiment, the functions of the modules will be described in detail in the following embodiments.
The acquisition module 201: for obtaining the text to be watermarked.
In this embodiment, the text to be watermarked refers to a text into which a watermark needs to be added. The text to be watermarked may be text input into the server directly from the outside, for example a text uploaded by a user and sent to the server; it may also be text that the server automatically crawls from a plurality of preset data sources with a crawler tool and takes as the text to be watermarked.
The word segmentation module 202: and the method is used for performing word segmentation processing on the text to be added with the watermark to obtain a plurality of words.
In this embodiment, after obtaining the text to be watermarked, a word segmentation tool or a pre-trained model is used to perform word segmentation processing on the text to be watermarked, and meanwhile, part-of-speech tagging is performed on each word segmentation.
Preferably, the word segmentation module 202 performs word segmentation on the text to be watermarked, and obtaining a plurality of words includes:
inputting the text to be added with the watermark into a word segmentation and part-of-speech tagging integrated model for word segmentation processing and part-of-speech tagging to obtain an initial word segmentation set;
and removing stop words in the initial word segmentation set to obtain a plurality of word segments.
In this embodiment, each of the plurality of participles carries a part-of-speech tag. The stop words refer to function words such as the Chinese particles "的" (of) and "吧" that appear widely in every text but carry no substantive meaning, and these words need to be removed.
Alternatively, the word segmentation module 202 performs word segmentation on the text to be watermarked, and obtaining a plurality of words includes:
acquiring preset word segmentation configuration parameters;
configuring a word segmentation tool according to the word segmentation configuration parameters;
and calling a configured word segmentation tool to perform word segmentation processing and part-of-speech tagging on the text to be added with the watermark to obtain a plurality of words.
In this embodiment, a word segmentation tool may be called during word segmentation. To meet the requirements of a specific scene or of the user, a word segmentation tool that supports custom configuration may also be selected. Preset word segmentation configuration parameters are obtained first, where the preset parameters include the character string to be segmented, a word segmentation mode parameter, and an HMM parameter, and the word segmentation modes include an accurate mode, a full mode, and a search engine mode. Taking the jieba word segmentation tool as an example, the user passes three parameters to the tool: the character string to be segmented; the cut_all parameter, used to control whether the full mode is adopted; and the HMM parameter, used to control whether the HMM model is used. The jieba tool is configured with these parameters, and the configured tool is then called to perform word segmentation and part-of-speech tagging on the text to be watermarked, obtaining a plurality of participles.
The first calculation module 203: and the system is used for calculating the frequency of each participle and selecting a plurality of target participles from the participles according to the frequency.
In this embodiment, before the watermark is added to the text, a plurality of target participles need to be selected according to the frequency with which each participle appears in the text to be watermarked.
Preferably, the calculating the frequency of each participle by the first calculating module 203 comprises:
counting the number of all the participles in the text to be added with the watermark to obtain a first total number;
counting the occurrence times of each word segmentation in the text to be added with the watermark to obtain a second total number;
and calculating the quotient of the second total number divided by the first total number to obtain the frequency of each participle.
Preferably, the selecting, by the first calculating module 203, a plurality of target participles from the plurality of participles according to the frequency includes:
sorting the frequencies in descending order;
and selecting a plurality of word segmentation ranked at the top from the descending ranking result as a plurality of target word segmentation.
In this embodiment, the frequency with which each participle appears in the text to be watermarked is calculated, and the participles with the highest frequencies are used as target participles. The number of target participles can be set according to the requirements of the user, which improves the usability of the watermark text.
The input module 204: and the similarity degree lists are used for respectively inputting the target participles into a pre-trained watermark word vector model to obtain a plurality of similarity degree lists, wherein each similarity degree list comprises at least one watermark participle.
In this embodiment, each target word corresponds to one similarity list, and after a plurality of target words are obtained, the plurality of target words are input into a pre-trained watermark word vector model, so that the similarity list corresponding to each target word is obtained.
Specifically, the training process of the watermark word vector model includes:
41) acquiring a plurality of text corpora;
42) performing word segmentation processing on the text corpora to obtain a plurality of segmented words;
43) establishing a watermark vocabulary table according to the plurality of participles, and carrying out self-coding processing on each participle in the watermark vocabulary table to obtain a self-coding vector of each participle;
44) calculating the product of the self-coding vector of each participle and a preset input layer weight matrix to obtain a word vector of each participle;
45) accumulating the sum of the word vectors of each participle in the watermark vocabulary and then averaging to obtain a hidden layer vector;
46) calculating the product of the hidden layer vector and a preset output layer weight matrix to obtain an output layer vector;
47) mapping the output layer vector by using an activation function to obtain the probability distribution of each participle;
48) calculating a loss value between the probability distribution of each participle and a preset label vector by adopting a loss measurement function;
49) and updating the preset input layer weight matrix and the preset output layer weight matrix by adopting a back propagation algorithm according to the loss value to obtain a watermark word vector model.
In this embodiment, a web crawler technology is used to obtain a plurality of text corpora from a plurality of preset data sources, perform word segmentation processing to obtain a plurality of words, then perform self-coding processing on each word to obtain a self-coding vector of each word, may preset an input layer weight matrix, obtain a word vector of each word by calculating a product of the self-coding vector of each word and a preset input layer weight matrix, obtain a hidden layer vector according to the word vector of each word, further calculate an output layer vector, obtain a loss value of each word by calculating an activation function and a loss measurement function, and update the preset input layer weight matrix and the preset output layer weight matrix by using a back propagation algorithm according to the loss value to obtain a watermark word vector model.
In this embodiment, the text corpus is continuously added for training, and the watermark word vector model is continuously updated, so that the accuracy of the similarity list output by the watermark word vector model is improved.
The second calculation module 205: and the similarity calculation module is used for calculating the similarity between each target word and each watermark word in the corresponding similarity list, and taking the watermark word with the highest similarity as the target watermark word of the corresponding target word.
In this embodiment, the similarity refers to a similarity between each target participle and each watermark participle in the similarity list, and a higher similarity indicates that the target participle is more similar to the watermark participle in part-of-speech, and if the target participle is replaced with the watermark participle, the substantial meaning of the original text is not changed.
Preferably, the second calculating module 205 calculates the similarity between each target participle and each watermark participle in the corresponding similarity list by using the following formula:
W = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}} \cdot \sqrt{\sum_{i=1}^{n} y_i^{2}}}
wherein (x_1, x_2, …, x_n) is the word vector of the target participle, (y_1, y_2, …, y_n) is the word vector of each watermark participle, n denotes the dimension of the word vectors, and W represents the similarity between each target participle and each watermark participle in the corresponding similarity list.
In this embodiment, the similarity between the word vector of each target participle and the word vector of each watermark participle in the corresponding similarity list is calculated, and the watermark participle with the highest similarity is selected as the target watermark participle corresponding to that target participle, which improves the accuracy of the obtained watermark participles.
The replacement module 206: and the watermark text is obtained by replacing the target word segments in the text to be added with the watermark with the corresponding target watermark word segments.
Illustratively, the text to be watermarked is: "One person lies on the bed. He likes to go to the grassland to play, and the leaves of the big tree at the doorway are joined together and block the sunlight." Performing word segmentation on the text to be watermarked yields the initial segmentation set: {one person, on the bed, lies, he, likes, to go, grassland, play, doorway, big tree, leaves, joined, in, together, block, sunlight}. Removing the stop words from the initial segmentation set yields the participles: {one person, on the bed, lies, he, likes, to go, grassland, play, leaves, joined, together, block, sunlight}. Since each participle is calculated to appear with the same frequency, N participles are randomly selected as target participles, for example: on the bed, grassland, leaves, sunlight, where N is determined by the importance of the text to be watermarked.
The target participles on the bed, grassland, leaves, and sunlight are input into the pre-trained watermark word vector model to obtain a plurality of similarity lists. For "on the bed", the similar-word list obtained is empty. For "grassland", the similar-word list is: big grassland, and the similarity between "grassland" and "big grassland" is calculated as 0.94. For "leaves", the similar-word list is: green leaves, red leaves; the similarity between "leaves" and "green leaves" is 0.8, and the similarity between "leaves" and "red leaves" is 0.2. For "sunlight", the similar-word list is: sun light, and the similarity between "sunlight" and "sun light" is 0.91. It is therefore determined that "on the bed" has no corresponding watermark participle, the watermark participle corresponding to "grassland" is "big grassland", the watermark participle corresponding to "leaves" is "green leaves", and the watermark participle corresponding to "sunlight" is "sun light". The watermark text after replacement is: "One person lies on the bed. He likes to go to the big grassland to play, and the green leaves of the big tree at the doorway are joined together and block the sun light."
In this embodiment, replacing the target participle with the watermark participle of highest similarity does not change the meaning of the original text, so the watermarked text remains consistent with the original text and the sentence structure is unchanged; the added watermark is imperceptible to the user, which improves the security of the watermark text. Moreover, malicious actors generally will not realize that the text they are reading is a watermark text, and therefore will not deliberately remove or destroy the watermark, which reduces the probability of the watermark text being maliciously damaged.
In addition, because only participles in the text to be watermarked are replaced, if the watermark needs to be changed or migrated, the target participles only need to be added or deleted according to the user's requirements and the corresponding watermark participles updated accordingly, which improves the update efficiency and the portability of the watermark text.
Alternatively, before the replacing module 206 replaces the target participles in the text to be watermarked with the corresponding target watermark participles to obtain a watermarked text, the method further includes:
monitoring the number of target participles in each single sentence in the text to be added with the watermark;
comparing the number of the target word segments in each single sentence with a preset number threshold;
when the number of the target participles in any single sentence is larger than or equal to the preset number threshold, extracting all the target participles in any single sentence;
calculating the similarity between each target word in all the target words and each watermark word in the corresponding similarity list to obtain a plurality of target similarities;
and selecting the watermark participle with the highest target similarity from the target similarities as the final target participle of any single sentence.
Further, the replacement module 206: and when the number of the target participles in any single sentence in the text to be added with the watermark is monitored to be smaller than the preset number threshold, replacing the target participles in any single sentence with the corresponding target watermark participles.
In this embodiment, the allowed number of target participles appearing in each single sentence may be preset. When the number of target participles in any single sentence is greater than or equal to the preset number threshold, all target participles in that sentence are extracted, and among them the target participle with the highest target similarity is selected as the final target participle of that sentence, so that each single sentence replaces only one target participle. This reduces the amount of data to be processed and improves the efficiency of adding a watermark to the text.
In other embodiments, each sentence may include a plurality of target phrases to be replaced at the same time, and the invention is not limited thereto.
In summary, with the apparatus for adding a watermark to text described in this embodiment, on one hand, replacing the target participle with the watermark participle of highest similarity does not change the meaning of the original text, so the watermarked text remains consistent with the original text and the sentence structure is unchanged; the added watermark is imperceptible to the user, which improves the security of the watermark text, and since malicious actors generally will not realize that the text they are reading is a watermark text, they will not deliberately remove or destroy the watermark, which reduces the probability of the watermark text being maliciously damaged. On the other hand, because the watermark word vector model is trained with a continuously growing text corpus and is continuously updated, inputting the target participles into the pre-trained watermark word vector model to obtain the similarity lists improves the accuracy of the similarity lists output by the model.
In addition, because only participles in the text to be watermarked are replaced, if the watermark needs to be changed or migrated, the target participles only need to be added or deleted according to the user's requirements and the corresponding watermark participles updated accordingly, which improves the update efficiency and the portability of the watermark text.
EXAMPLE III
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the electronic device 3 comprises a memory 31, at least one processor 32, at least one communication bus 33 and a transceiver 34.
It will be appreciated by those skilled in the art that the structure of the electronic device shown in fig. 3 does not constitute a limitation of the embodiment of the present invention; it may be a bus-type or a star-type configuration, and the electronic device 3 may include more or fewer hardware or software components than shown, or a different arrangement of components.
In some embodiments, the electronic device 3 is an electronic device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, and the like.
It should be noted that the electronic device 3 is only an example, and other existing or future electronic products, such as those that can be adapted to the present invention, should also be included in the scope of the present invention, and are included herein by reference.
In some embodiments, the memory 31 is used for storing program code and various data, such as the apparatus 20 for adding a watermark to text installed in the electronic device 3, and enables high-speed, automatic access to programs or data during the operation of the electronic device 3. The memory 31 may include a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium capable of carrying or storing data.
In some embodiments, the at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The at least one processor 32 is a Control Unit (Control Unit) of the electronic device 3, connects various components of the electronic device 3 by using various interfaces and lines, and executes various functions and processes data of the electronic device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the electronic device 3 may further include a power supply (such as a battery) for supplying power to each component, and optionally, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, an electronic device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
In a further embodiment, in conjunction with fig. 2, the at least one processor 32 may execute operating means of the electronic device 3 and various installed applications (such as the apparatus for watermarking text 20), program code, and the like, such as the various modules described above.
The memory 31 has program code stored therein, and the at least one processor 32 can call the program code stored in the memory 31 to perform related functions. For example, the modules illustrated in fig. 2 are program code stored in the memory 31 and executed by the at least one processor 32, so as to implement the functions of the modules for the purpose of adding watermarks in text.
In one embodiment of the invention, the memory 31 stores a plurality of instructions that are executed by the at least one processor 32 to implement the functionality of watermarking text.
Specifically, for the manner in which the at least one processor 32 executes the above instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1; details are not repeated here.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by a single unit or means in software or hardware. The terms first, second, and the like are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of those technical solutions.

Claims (10)

1. A method for adding a watermark to text, the method for adding a watermark to text comprising:
acquiring a text to be added with a watermark;
performing word segmentation processing on the text to be added with the watermark to obtain a plurality of words;
calculating the frequency of each participle, and selecting a plurality of target participles from the plurality of participles according to the frequency;
respectively inputting the target participles into a pre-trained watermark word vector model to obtain a plurality of similarity lists, wherein each similarity list comprises at least one watermark participle;
calculating the similarity between each target participle and each watermark participle in the corresponding similarity list, and taking the watermark participle with the highest similarity as the target watermark participle of the corresponding target participle;
and replacing the target participles in the text to be added with the watermark with the corresponding target watermark participles to obtain a watermark text.
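For orientation only, the following minimal Python sketch (not part of the claimed subject matter) mirrors the flow recited above; the tokenize function, the watermark_model object and its similarity_list method, and the top_k cut-off are hypothetical placeholders rather than elements disclosed in the original.

from collections import Counter

def add_watermark(text, tokenize, watermark_model, top_k=5):
    # Segment the text to be watermarked into participles.
    participles = tokenize(text)

    # Count participle frequencies and keep the most frequent ones
    # as target participles.
    freq = Counter(participles)
    targets = [word for word, _ in freq.most_common(top_k)]

    # For each target participle, query the pre-trained watermark word
    # vector model for its similarity list and keep the watermark
    # participle with the highest similarity.
    replacements = {}
    for target in targets:
        candidates = watermark_model.similarity_list(target)  # [(word, score), ...]
        if candidates:
            replacements[target] = max(candidates, key=lambda c: c[1])[0]

    # Replace the target participles with their watermark participles;
    # results are concatenated without separators, as is usual for Chinese text.
    return "".join(replacements.get(word, word) for word in participles)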
2. The method for adding watermark to text according to claim 1, wherein the training process of the watermark word vector model comprises:
acquiring a plurality of text corpora;
performing word segmentation processing on the text corpora to obtain a plurality of segmented words;
establishing a watermark vocabulary table according to the plurality of participles, and carrying out self-coding processing on each participle in the watermark vocabulary table to obtain a self-coding vector of each participle;
calculating the product of the self-coding vector of each participle and a preset input layer weight matrix to obtain a word vector of each participle;
summing the word vectors of the participles in the watermark vocabulary and averaging the result to obtain a hidden layer vector;
calculating the product of the hidden layer vector and a preset output layer weight matrix to obtain an output layer vector;
mapping the output layer vector by using an activation function to obtain the probability distribution of each participle;
calculating a loss value between the probability distribution of each participle and a preset label vector by adopting a loss measurement function;
and updating the preset input layer weight matrix and the preset output layer weight matrix by adopting a back propagation algorithm according to the loss value to obtain a watermark word vector model.
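A minimal NumPy sketch of a CBOW-style training loop consistent with these steps is shown below, purely as an illustration: the one-hot ("self-coding") multiplication is realized as a row lookup, the averaging is interpreted as averaging the context word vectors, and the cross-entropy loss, learning rate, vector dimension, and epoch count are assumptions not taken from the original.

import numpy as np

def train_watermark_word_vectors(contexts, labels, vocab_size, dim=100,
                                 lr=0.01, epochs=5):
    # contexts: list of lists of vocabulary indices (context participles)
    # labels: list of vocabulary indices (the participle to be predicted)
    rng = np.random.default_rng(0)
    W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # preset input layer weight matrix
    W_out = rng.normal(scale=0.1, size=(dim, vocab_size))  # preset output layer weight matrix

    for _ in range(epochs):
        for ctx, label in zip(contexts, labels):
            # Multiplying a one-hot vector by W_in is a row lookup;
            # averaging the context rows gives the hidden layer vector.
            h = W_in[ctx].mean(axis=0)

            # Output layer followed by softmax yields a probability
            # distribution over the vocabulary.
            scores = h @ W_out
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()

            # Cross-entropy loss against the one-hot label vector; its
            # gradient with respect to the scores is probs - one_hot(label).
            grad_scores = probs.copy()
            grad_scores[label] -= 1.0

            # Back-propagate and update both weight matrices.
            grad_h = W_out @ grad_scores
            W_out -= lr * np.outer(h, grad_scores)
            W_in[ctx] -= lr * grad_h / len(ctx)

    return W_in, W_out  # each row of W_in is a trained word vector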
3. The method of watermarking text according to claim 1, wherein the selecting a plurality of target participles from the plurality of participles according to the frequency comprises:
sorting the frequencies in descending order;
and selecting a plurality of participles ranked at the top of the descending sorting result as the plurality of target participles.
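As a simple illustration of this frequency-based selection (the top_k cut-off is an assumed parameter, not a value from the original):

from collections import Counter

def select_target_participles(participles, top_k=5):
    # Sort participle frequencies in descending order and keep the
    # top-ranked participles as the target participles.
    freq = Counter(participles)
    ranked = sorted(freq.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:top_k]]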
4. The method for adding watermark to text according to claim 1, wherein the similarity between each target participle and each watermark participle in the corresponding similarity list is calculated by the following formula:
Figure FDA0002717701180000021
wherein (x1, x2, …, xn) is the word vector of the target participle, (y1, y2, …, yn) is the word vector of the watermark participle, n represents the dimension of the word vectors, and W is the similarity between each target participle and each watermark participle in the corresponding similarity list.
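Assuming the formula denotes the usual cosine similarity of the two word vectors (the formula image itself is not reproduced in this text, so this is an assumption consistent with the variables defined above), a small Python sketch is:

import math

def similarity(x, y):
    # Assumed cosine similarity between an n-dimensional target-participle
    # word vector x and a watermark-participle word vector y.
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)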
5. The method for adding watermark to text according to claim 1, wherein before replacing the target participles in the text to be added with corresponding target watermark participles to obtain watermarked text, the method further comprises:
monitoring the number of target participles in each single sentence in the text to be added with the watermark;
comparing the number of the target word segments in each single sentence with a preset number threshold;
when the number of the target participles in any single sentence is larger than or equal to the preset number threshold, extracting all the target participles in any single sentence;
calculating the similarity between each target word in all the target words and each watermark word in the corresponding similarity list to obtain a plurality of target similarities;
and selecting the watermark participle with the highest target similarity from the target similarities as the final target participle of any single sentence.
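One possible reading of this per-sentence check, sketched below with placeholder names and an assumed threshold, is that a sentence containing too many target participles keeps only the target whose best watermark candidate has the highest similarity:

def limit_targets_per_sentence(sentences, targets, similarity_lists, threshold=2):
    # sentences: list of single sentences, each already segmented into participles
    # targets: set of target participles
    # similarity_lists: maps a target participle to its
    #                   [(watermark_participle, similarity), ...] list
    final_targets = {}
    for index, sentence in enumerate(sentences):
        found = [word for word in sentence if word in targets]
        if len(found) >= threshold:
            # Keep only the target whose best candidate similarity is highest.
            final_targets[index] = max(
                found,
                key=lambda word: max(score for _, score in similarity_lists[word]),
            )
    return final_targets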
6. The method for adding watermark to text according to claim 1, wherein the word segmentation processing on the text to be added with watermark to obtain a plurality of word segments comprises:
inputting the text to be added with the watermark into a word segmentation and part-of-speech tagging integrated model for word segmentation processing and part-of-speech tagging to obtain an initial word segmentation set;
and removing stop words in the initial word segmentation set to obtain a plurality of word segments.
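The claim does not name a specific integrated word-segmentation and part-of-speech tagging model; purely as a stand-in, the jieba segmenter's part-of-speech module can illustrate the step:

import jieba.posseg as pseg  # third-party segmenter with joint POS tagging

def segment_and_filter(text, stopwords):
    # Segment the text and tag each participle with its part of speech,
    # then remove stop words to obtain the final participles.
    initial_set = [(p.word, p.flag) for p in pseg.cut(text)]
    return [(word, flag) for word, flag in initial_set if word not in stopwords]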
7. The method for adding watermark to text according to claim 1, wherein the word segmentation processing on the text to be added with watermark to obtain a plurality of word segments further comprises:
acquiring preset word segmentation configuration parameters;
configuring a word segmentation tool according to the word segmentation configuration parameters;
and calling a configured word segmentation tool to perform word segmentation processing and part-of-speech tagging on the text to be added with the watermark to obtain a plurality of words.
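A hedged sketch of this variant, again using jieba as a stand-in and an assumed configuration dictionary (the actual configuration parameters are not specified in the original), could look like:

import jieba
import jieba.posseg as pseg

def segment_with_config(text, config):
    # config is an assumed parameter dictionary, e.g.
    # {"user_dict": "domain_words.txt", "use_hmm": True}.
    if config.get("user_dict"):
        jieba.load_userdict(config["user_dict"])  # extend the segmenter's vocabulary
    use_hmm = config.get("use_hmm", True)
    # Call the configured tool to segment and POS-tag the text.
    return [(p.word, p.flag) for p in pseg.cut(text, HMM=use_hmm)]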
8. An apparatus for watermarking text, the apparatus comprising:
the acquisition module is used for acquiring a text to be added with a watermark;
the word segmentation module is used for carrying out word segmentation processing on the text to be added with the watermark to obtain a plurality of words;
the first calculation module is used for calculating the frequency of each participle and selecting a plurality of target participles from the participles according to the frequency;
the input module is used for respectively inputting the target participles into a pre-trained watermark word vector model to obtain a plurality of similarity lists, wherein each similarity list comprises at least one watermark participle;
the second calculation module is used for calculating the similarity between each target participle and each watermark participle in the corresponding similarity list, and taking the watermark participle with the highest similarity as the target watermark participle of the corresponding target participle;
and the replacing module is used for replacing the target participles in the text to be added with the watermark with the corresponding target watermark participles to obtain the watermark text.
9. An electronic device, characterized in that the electronic device comprises a processor for implementing the method of watermarking text as claimed in any one of claims 1 to 7 when executing a computer program stored in a memory.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of watermarking text according to any one of claims 1 to 7.
CN202011079509.1A 2020-10-10 2020-10-10 Method and device for adding watermark in text, electronic equipment and storage medium Pending CN112199944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011079509.1A CN112199944A (en) 2020-10-10 2020-10-10 Method and device for adding watermark in text, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011079509.1A CN112199944A (en) 2020-10-10 2020-10-10 Method and device for adding watermark in text, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112199944A true CN112199944A (en) 2021-01-08

Family

ID=74013330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011079509.1A Pending CN112199944A (en) 2020-10-10 2020-10-10 Method and device for adding watermark in text, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112199944A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468486A (en) * 2021-05-07 2021-10-01 北京东方通科技股份有限公司 Big data watermarking method based on artificial intelligence
CN113468486B (en) * 2021-05-07 2023-12-08 北京东方通科技股份有限公司 Big data watermarking method based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination