CN111178056A - Deep learning based file generation method and device and electronic equipment - Google Patents

Deep learning based file generation method and device and electronic equipment Download PDF

Info

Publication number
CN111178056A
Authority
CN
China
Prior art keywords
word segmentation
word
file
input
recommended
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010001994.4A
Other languages
Chinese (zh)
Inventor
赵晖
沈艺
齐康
侯干
张兵兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd filed Critical Suning Cloud Computing Co Ltd
Priority to CN202010001994.4A priority Critical patent/CN111178056A/en
Publication of CN111178056A publication Critical patent/CN111178056A/en
Priority to CA3166742A priority patent/CA3166742A1/en
Priority to PCT/CN2020/111951 priority patent/WO2021135319A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities

Abstract

The invention provides a deep learning-based document generation method and device and an electronic device. The method comprises the following steps: acquiring a title input by a user; performing word segmentation on the title by using the jieba Chinese word segmentation method; extracting keywords from the segmented title by using the TF-IDF algorithm to obtain a keyword set; and feeding the keyword set as input into a preset text generation algorithm model, with the resulting output taken as the document. By processing the document data, the method and device extract the keywords that a document mainly describes and use them as model input, constructing new training data in which input and output correspond strongly. This improves the relevance of the document generated by the model to the input and greatly improves the quality of the generated document.

Description

Deep learning based file generation method and device and electronic equipment
Technical Field
The invention belongs to the technical field of natural language processing with deep neural networks, and particularly relates to a deep learning-based document generation method and device and an electronic device.
Background
At present, new products in the e-commerce industry need promotional copy (a recommendation document) during marketing to give consumers a reason to buy, and a well-designed document is needed to highlight the product's selling points. In this field, the traditional approach trains a model with the title as input and the manually written recommendation document as output. The quality of the recommendation documents generated in this way differs greatly from that of manually written ones, which hinders large-scale application of automatic document generation. The specific shortcomings are that the documents generated by the traditional method cannot accurately highlight a product's selling points, and that the correspondence between the generated document and the input title is weak.
Disclosure of Invention
One objective of the present application, in view of the shortcomings of the prior art, is to provide a deep learning-based document generation method that improves the quality of the generated document. The method includes the steps of:
acquiring a user input title;
performing word segmentation operation on the title by adopting a jieba Chinese word segmentation method;
extracting keywords from the title subjected to the word segmentation operation by adopting a TF-IDF algorithm to obtain a keyword set;
and conveying the keyword set as input to a preset text generation algorithm model, and taking the obtained output as a file.
Preferably, the method for generating the preset text generation algorithm model includes the steps of:
acquiring a plurality of manually written recommended documents;
performing word segmentation operation on the recommended documents by adopting a jieba Chinese word segmentation method;
extracting keywords from the recommended documents subjected to the word segmentation operation by adopting a TF-IDF algorithm to obtain a keyword set;
and taking the keyword set as input, taking the recommended documents as output, and training to obtain the text generation algorithm model.
Preferably, the jieba Chinese word segmentation method includes the steps of:
acquiring an input sentence;
establishing a word segmentation DAG word graph based on the Trie tree word segmentation model;
calculating the globally most probable route based on the prefix dictionary to obtain the segmentation combination with the maximum word frequency;
judging whether each word in the maximum-word-frequency segmentation combination is a registered word, i.e. a word present in the dictionary; if it is a registered word, labeling it according to the dictionary identification and outputting it; if it is not a registered word, using Token recognition to process Chinese characters and non-Chinese characters separately;
if the characters are judged to be Chinese, loading the hidden Markov model (HMM) probability graph, obtaining the word segments and their labels by Viterbi dynamic programming, and then outputting them;
if the characters are judged to be non-Chinese, recognizing combinations in English, numeric and time formats, assigning the corresponding labels, and outputting them.
Preferably, before establishing the word segmentation DAG word graph based on the Trie tree word segmentation model, the method further comprises the following steps:
loading the registered-word dictionary;
and establishing the Trie tree word segmentation model.
Preferably, between the step of acquiring the input sentence and the step of establishing the word segmentation DAG word graph based on the Trie tree word segmentation model, the method further comprises the following steps:
cleaning the sentence and judging whether the sentence contains special characters or not;
and if special characters are found, separating them out, labeling them as an unknown part of speech, and outputting them.
A second objective of the present application, in view of the shortcomings of the prior art, is to provide a deep learning-based document generation device that improves the quality of the generated document. The device includes:
an acquisition unit configured to acquire a user input title;
the word segmentation unit is used for performing word segmentation operation on the title by adopting a jieba Chinese word segmentation method;
a keyword set obtaining unit, configured to perform keyword extraction on the title subjected to the word segmentation operation by using a TF-IDF algorithm to obtain a keyword set;
the file acquisition unit is used for conveying the keyword set as input to a preset text generation algorithm model, and taking the obtained output as a file;
and the storage unit is used for storing the jieba Chinese word segmentation method, the TF-IDF algorithm and the preset text generation algorithm model.
Preferably, the apparatus further comprises:
and the preset text generation algorithm model acquisition unit is used for acquiring the preset text generation algorithm model.
Preferably, the preset text generation algorithm model obtaining unit includes:
a recommended document acquisition unit, used for acquiring a plurality of manually written recommended documents;
a recommended document word segmentation unit, used for performing word segmentation operation on the recommended documents by adopting a jieba Chinese word segmentation method;
a recommended document keyword set acquisition unit, configured to perform keyword extraction on the recommended documents subjected to the word segmentation operation by using a TF-IDF algorithm to obtain a keyword set;
and a training unit, used for taking the keyword set as input and the recommended documents as output and training to obtain the text generation algorithm model.
A third objective of the present application, in view of the shortcomings of the prior art, is to provide an electronic device that improves the quality of the generated document, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the above-described document generation methods.
A fourth objective of the present application, in view of the shortcomings of the prior art, is to provide a non-transitory computer readable storage medium that improves the quality of the generated document, the non-transitory computer readable storage medium storing computer instructions for causing a computer to perform any of the above document generation methods.
According to the method and device of the present application, the document data is processed and the keywords that a document mainly describes are extracted and used as model input, so that new training data is constructed in which the input and the output correspond strongly. This improves the relevance of the document generated by the model to the input and greatly improves the quality of the generated document.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a deep learning-based document generation method provided by the present invention;
FIG. 2 is a flowchart of a method for generating the preset text generation algorithm model provided by the present invention;
FIG. 3 is a flow chart of a method of the jieba Chinese word segmentation method employed in the present invention;
FIG. 4 is a schematic structural diagram of an intelligent document generation apparatus based on deep learning technology according to the present invention;
FIG. 5 is a schematic structural diagram of a preset text generation algorithm model obtaining unit provided by the present invention;
fig. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in the specification. It is to be understood that the described embodiments are merely illustrative of some, and not restrictive, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within the description without departing from the spirit of the disclosure. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present disclosure, and the drawings only show the components related to the present disclosure rather than the number, shape and size of the components in actual implementation, and the type, amount and ratio of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the disclosure provides a document generation method based on deep learning. The deep learning-based document generation method provided by this embodiment can be executed by a computing device, which can be implemented as software or as a combination of software and hardware, and can be integrated in a server, a terminal device, or the like.
Referring to fig. 1, in an embodiment of the present application, the present application provides a deep learning-based document generation method, including:
S101: Acquiring a title input by the user.
In this step, the user enters a title in the document generation system, and the system acquires the entered title and holds it for the subsequent steps.
S102: and performing word segmentation operation on the title by adopting a jieba Chinese word segmentation method.
In this step, the system performs word segmentation on the title input by the user using the jieba Chinese word segmentation method, obtaining a number of word segments.
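For illustration, this segmentation step can be sketched with the open-source jieba library; the example title string below is purely an illustrative placeholder.

```python
# Sketch of step S102: segmenting a user-supplied title with jieba.
# The title string is an illustrative placeholder, not data from the patent.
import jieba

title = "黑色商务旗舰全网通手机 超大电池容量"
tokens = jieba.lcut(title)   # exact-mode segmentation, returns a list of word segments
print(tokens)
```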
S103: and extracting keywords of the title subjected to word segmentation operation by adopting a TF-IDF algorithm to obtain a keyword set.
In this step, the system uses the TF-IDF algorithm to extract keywords from the segmented title, obtaining a keyword set.
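A minimal sketch of this keyword-extraction step, using jieba's built-in TF-IDF extractor; the topK value and the sample segmented title are assumptions, since the embodiment only states that a certain proportion of keywords is kept.

```python
# Sketch of step S103: TF-IDF keyword extraction over the segmented title.
# jieba.analyse.extract_tags ships with a built-in IDF corpus; topK and the
# sample string are assumed for illustration.
import jieba.analyse

segmented_title = "黑色 商务 旗舰 全网通 手机 超大 电池容量"   # output of step S102 (placeholder)
keywords = jieba.analyse.extract_tags(segmented_title, topK=5)
keyword_set = " ".join(keywords)   # keyword set passed on to step S104
```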
S104: and conveying the keyword set as input to a preset text generation algorithm model, and taking the obtained output as a file.
In this step, the system feeds the keyword set obtained in step S103 as input into the preset text generation algorithm model, and the output produced by the model is the required document.
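Putting steps S101 to S104 together, the inference flow can be sketched as follows; the model object and its generate() method are hypothetical stand-ins for the preset text generation algorithm model.

```python
# Sketch of the inference flow S101-S104. `model` and its generate() method
# are hypothetical placeholders for the preset text generation algorithm model.
import jieba
import jieba.analyse

def generate_document(title: str, model, top_k: int = 5) -> str:
    tokens = jieba.lcut(title)                                           # S102: segmentation
    keywords = jieba.analyse.extract_tags(" ".join(tokens), topK=top_k)  # S103: TF-IDF keywords
    return model.generate(" ".join(keywords))                            # S104: model output = document
```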
As shown in fig. 2, in the embodiment of the present application, the method for generating the preset text generation algorithm model in step S104 includes the steps of:
s201: and acquiring a plurality of manually written recommended documents.
In this step, a plurality of manually written recommended documents, such as recommended documents for products, are first obtained. In theory, the more recommended documents are used, the more accurate the resulting preset text generation algorithm model will be, but the data processing load on the system also increases correspondingly. Therefore, in the embodiment of the present application, 1,000 recommended documents can be selected for processing.
S202: and performing word segmentation operation on the recommended case by adopting a jieba Chinese word segmentation method.
In this step, the system performs the word segmentation operation on all the recommended cases by adopting a jieba Chinese word segmentation method so as to perform the subsequent processing.
S203: and extracting keywords of the recommended case subjected to word segmentation operation by adopting a TF-IDF algorithm to obtain a keyword set.
In the step, the system adopts TF-IDF algorithm to extract the key words of all the recommended documents after word segmentation operation, so as to obtain the key word set.
S204: and taking the keyword set as input, taking the recommended file as output, and training to obtain the text generation algorithm model.
In this step, the system takes the keyword set as input, takes all the recommended documents as output, and then obtains the text generation algorithm model after training.
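A minimal sketch of the training-data construction in steps S201 to S204, assuming jieba for segmentation and TF-IDF extraction; the corpus contents and the topK value are placeholders.

```python
# Sketch of steps S201-S204: build (keyword set -> recommended document) pairs.
# The corpus and topK value are illustrative assumptions.
import jieba
import jieba.analyse

def build_training_pairs(recommended_docs, top_k=5):
    pairs = []
    for doc in recommended_docs:                       # S201: manually written documents
        tokens = jieba.lcut(doc)                       # S202: jieba word segmentation
        keywords = jieba.analyse.extract_tags(" ".join(tokens), topK=top_k)  # S203: TF-IDF
        pairs.append((" ".join(keywords), doc))        # S204: (model input, model output)
    return pairs
```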
As shown in fig. 3, in the embodiment of the present application, the jieba Chinese word segmentation method is an existing technique and specifically includes the following steps:
loading the registered-word dictionary;
establishing a Trie tree word segmentation model;
acquiring an input sentence;
establishing a word segmentation DAG word graph based on the Trie tree word segmentation model;
calculating the globally most probable route based on the prefix dictionary to obtain the segmentation combination with the maximum word frequency;
judging whether each word in the maximum-word-frequency segmentation combination is a registered word, i.e. a word present in the dictionary; if it is a registered word, labeling it according to the dictionary identification and outputting it; if it is not a registered word, using Token recognition to process Chinese characters and non-Chinese characters separately;
if the characters are judged to be Chinese, loading the hidden Markov model (HMM) probability graph, obtaining the word segments and their labels by Viterbi dynamic programming, and then outputting them;
if the characters are judged to be non-Chinese, recognizing combinations in English, numeric and time formats, assigning the corresponding labels, and outputting them.
In this embodiment of the present application, between the step of acquiring the input sentence and the step of establishing the word segmentation DAG word graph based on the Trie tree word segmentation model, the method further includes:
cleaning the sentence and judging whether the sentence contains special characters or not;
and if special characters are found, separating them out, labeling them as an unknown part of speech, and outputting them.
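For illustration, the DAG construction and maximum-probability route described above can be sketched with a toy prefix dictionary as follows; the dictionary and example sentence are assumptions, and the real jieba implementation additionally handles spans not covered by the dictionary with an HMM and Viterbi decoding.

```python
# Simplified sketch of jieba-style dictionary segmentation: build a DAG of
# candidate words, then pick the maximum log-probability route by dynamic
# programming. The tiny dictionary is a toy example.
from math import log

FREQ = {"电池": 50, "容量": 40, "电池容量": 30, "大": 20}   # toy prefix dictionary
TOTAL = sum(FREQ.values())

def get_dag(sentence):
    """For each position i, list end indices j of dictionary words sentence[i:j+1]."""
    dag = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        dag[i] = ends or [i]            # fall back to a single character
    return dag

def max_prob_route(sentence, dag):
    """Dynamic programming over the DAG for the highest log-probability segmentation."""
    n = len(sentence)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (log(FREQ.get(sentence[i:j + 1], 1)) - log(TOTAL) + route[j + 1][0], j)
            for j in dag[i])
    return route

def cut(sentence):
    dag = get_dag(sentence)
    route = max_prob_route(sentence, dag)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1] + 1
        words.append(sentence[i:j])
        i = j
    return words

print(cut("大电池容量"))   # e.g. ['大', '电池容量']
```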
The following describes in detail specific steps of a deep learning-based document generation method provided by the present application with specific embodiments.
(1) A plurality of manually written recommendation documents are obtained in advance, and two samples are as follows:
sample 1: the white color is a new color in the year, the color design is very in place, and the artistic sense is very sufficient. The plastic machine body is made of materials, and has good hand feeling, thinness, beauty and elegance.
Sample 2: the high-definition large screen is excellent, the picture is clear and fine, and comfortable visual experience is brought. The game player can play the game with a non-trivial running memory, and the game player can be used for more applications without being afraid of cards. By using the high-definition lens, the optical anti-shake photographing is simpler, the photographing is easy no matter in a close shot or a long shot, and the image quality is clearer. The battery capacity is large, and the battery is standby for a long time, so that stronger cruising experience is brought to the user.
(2) Word segmentation is performed on the documents by using the jieba Chinese word segmentation method; after the word segmentation operation, the two samples are as follows:
after word segmentation of sample 1:the white color is a new color in the year, the color design is very in place, and the artistic sense is very sufficient. The plastic machine body is made of materials, and has good hand feeling, thinness, beauty and elegance.
Sample 2 after word segmentation:the high-definition large screen is excellent, the picture is clear and fine, and comfortable visual experience is brought. Can not be matched What to want is a small amount of running memoryHow to play and more applications are not afraid of cards. By using the high-definition lens, the optical anti-shake photographing device The method is simple, and the picture quality is clearer no matter the short shot or the long shot is easy to shoot. The battery has large capacity and ultra-long standby, and brings more strength to people The cruising experience of.
Wherein, the words on the same underline are the same participle combination.
(3) A certain proportion of keywords are extracted from the segmented documents by using the TF-IDF algorithm to obtain keyword sets; the two samples are processed as follows:
sample 1 keyword: color design material plastic
Sample 2 keyword: lens battery capacity
(4) The keyword sets are taken as input and the pre-collected documents as output to construct a training data set, specifically as follows:
sample 1 input: color design material plastic
Sample 1 output: the white color is a new color in the year, the color design is very in place, and the artistic sense is very sufficient. The plastic machine body is made of materials, and has good hand feeling, thinness, beauty and elegance.
Sample 2 input: lens battery capacity
Sample 2 output: the high-definition large screen is excellent, the picture is clear and fine, and comfortable visual experience is brought. The game player can play the game with a non-trivial running memory, and the game player can be used for more applications without being afraid of cards. By using the high-definition lens, the optical anti-shake photographing is simpler, the photographing is easy no matter in a close shot or a long shot, and the image quality is clearer. The battery capacity is large, and the battery is standby for a long time, so that stronger cruising experience is brought to the user.
In the embodiment of the application, a Transformer-based text generation model is trained by using the training data set.
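A reduced PyTorch sketch of such a Transformer-based keyword-to-document model and a single teacher-forcing training step is given below; the architecture, hyper-parameters, vocabulary size and toy tensors are assumptions for illustration and do not reproduce the exact model of this embodiment.

```python
# Minimal sketch of a Transformer encoder-decoder mapping keyword-set token ids
# to document token ids. Requires a PyTorch version supporting batch_first
# (>= 1.9); all sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class KeywordToCopyModel(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=3, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))   # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids) + self.pos[:, :src_ids.size(1)]
        tgt = self.embed(tgt_ids) + self.pos[:, :tgt_ids.size(1)]
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(src_ids.device)
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)                   # (batch, tgt_len, vocab)

# One teacher-forcing training step on a toy (keyword set, document) batch.
model = KeywordToCopyModel(vocab_size=8000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
src = torch.randint(0, 8000, (2, 6))              # keyword-set token ids (toy batch)
tgt = torch.randint(0, 8000, (2, 20))             # recommended-document token ids
optimizer.zero_grad()
logits = model(src, tgt[:, :-1])                  # predict the next document token
loss = loss_fn(logits.reshape(-1, 8000), tgt[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```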
(5) After a new user input title is obtained, extracting keywords from the title input by the user by using the keyword set in the step (3):
and (3) user input: XXX color black business flagship full-network mobile phone calf material 8G memory super large battery capacity
Extracted keywords: color material battery capacity
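Step (5) can be read as filtering the segmented title against the keyword vocabulary collected in step (3); a minimal sketch under that reading, with an illustrative vocabulary, follows.

```python
# Sketch of step (5) under the reading that title tokens are kept only if they
# appear in the keyword vocabulary built in step (3). Vocabulary and usage are
# illustrative placeholders.
import jieba

training_keywords = {"颜色", "设计", "材质", "镜头", "电池容量"}   # collected in step (3)

def extract_title_keywords(title):
    return [tok for tok in jieba.lcut(title) if tok in training_keywords]
```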
(6) The keyword set obtained in step (5) is fed as input into the text generation model obtained in step (4), and the text generation model generates a recommended document, specifically as follows:
the orange color is a new color in the year, and a luxurious smell is revealed by matching with the material of the calfskin. High capacity batteries, and low power processing techniques to achieve ultra-long standby times.
As shown in fig. 4, in the embodiment of the present application, the present invention further provides a deep learning-based document generation apparatus, including:
an obtaining unit 401, configured to obtain a user input title;
a word segmentation unit 402, configured to perform word segmentation on the title by using the jieba Chinese word segmentation method;
a keyword set obtaining unit 403, configured to perform keyword extraction on the title subjected to the word segmentation operation by using a TF-IDF algorithm to obtain a keyword set;
a document acquisition unit 404, configured to convey the keyword set as input to a preset text generation algorithm model, and take the obtained output as a document;
the storage unit 405 is used for storing the jieba Chinese word segmentation method, the TF-IDF algorithm and the preset text generation algorithm model;
a preset text generation algorithm model obtaining unit 406, configured to obtain the preset text generation algorithm model.
The apparatus shown in fig. 4 can correspondingly execute the content in the above method embodiment, and details of the part not described in detail in this embodiment refer to the content described in the above method embodiment, which is not described again here.
As shown in fig. 5, in the embodiment of the present application, the preset text generation algorithm model obtaining unit 406 includes:
a recommended document acquiring unit 501, configured to acquire a plurality of manually written recommended documents;
a recommended document word segmentation unit 502, configured to perform word segmentation on the recommended documents by using the jieba Chinese word segmentation method;
a recommended document keyword set obtaining unit 503, configured to perform keyword extraction on the recommended documents subjected to the word segmentation operation by using a TF-IDF algorithm to obtain a keyword set;
a training unit 504, configured to take the keyword set as an input and the recommended documents as an output, and train to obtain the text generation algorithm model.
The apparatus shown in fig. 5 may correspondingly execute the content in the above method embodiment, and details of the part not described in detail in this embodiment refer to the content described in the above method embodiment, which is not described again here.
Referring to fig. 6, an embodiment of the present disclosure also provides an electronic device 60, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the deep learning-based document generation method of the method embodiments described above.
The disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the foregoing method embodiments.
The disclosed embodiments also provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the deep learning-based document generation method of the aforementioned method embodiments.
Referring now to FIG. 6, a schematic diagram of an electronic device 60 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 60 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 60 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 60 to communicate with other devices wirelessly or by wire to exchange data. While the figures illustrate an electronic device 60 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
According to the method and device of the present application, the document data is processed and the keywords that a document mainly describes are extracted and used as model input, so that new training data is constructed in which the input and the output correspond strongly. This improves the relevance of the document generated by the model to the input and greatly improves the quality of the generated document.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present disclosure should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method for generating a file based on deep learning is characterized by comprising the following steps:
acquiring a user input title;
performing word segmentation operation on the title by adopting a jieba Chinese word segmentation method;
extracting keywords from the title subjected to the word segmentation operation by adopting a TF-IDF algorithm to obtain a keyword set;
and conveying the keyword set as input to a preset text generation algorithm model, and taking the obtained output as a file.
2. The method of claim 1, wherein the method of generating the preset text generation algorithm model comprises the steps of:
acquiring a plurality of manually written recommended documents;
performing word segmentation operation on the recommended documents by adopting a jieba Chinese word segmentation method;
extracting keywords from the recommended documents subjected to the word segmentation operation by adopting a TF-IDF algorithm to obtain a keyword set;
and taking the keyword set as input, taking the recommended documents as output, and training to obtain the text generation algorithm model.
3. The document generation method of claim 1 or 2, wherein the jieba Chinese word segmentation method comprises the steps of:
acquiring an input sentence;
establishing a word segmentation DAG word graph based on the Trie tree word segmentation model;
calculating the globally most probable route based on the prefix dictionary to obtain the segmentation combination with the maximum word frequency;
judging whether each word in the maximum-word-frequency segmentation combination is a registered word, i.e. a word present in the dictionary; if it is a registered word, labeling it according to the dictionary identification and outputting it; if it is not a registered word, using Token recognition to process Chinese characters and non-Chinese characters separately;
if the characters are judged to be Chinese, loading the hidden Markov model (HMM) probability graph, obtaining the word segments and their labels by Viterbi dynamic programming, and then outputting them;
if the characters are judged to be non-Chinese, recognizing combinations in English, numeric and time formats, assigning the corresponding labels, and outputting them.
4. The method of claim 3, wherein before establishing the word segmentation DAG word graph based on the Trie tree word segmentation model, the method further comprises the steps of:
loading the registered-word dictionary;
and establishing the Trie tree word segmentation model.
5. The method of claim 3, wherein between acquiring the input sentence and establishing the word segmentation DAG word graph based on the Trie tree word segmentation model, the method further comprises the steps of:
cleaning the sentence and judging whether the sentence contains special characters or not;
and if special characters are found, separating them out, labeling them as an unknown part of speech, and outputting them.
6. A deep learning-based document generation apparatus, the apparatus comprising:
an acquisition unit configured to acquire a user input title;
the word segmentation unit is used for performing word segmentation operation on the title by adopting a jieba Chinese word segmentation method;
a keyword set obtaining unit, configured to perform keyword extraction on the title subjected to the word segmentation operation by using a TF-IDF algorithm to obtain a keyword set;
the file acquisition unit is used for conveying the keyword set as input to a preset text generation algorithm model, and taking the obtained output as a file;
and the storage unit is used for storing the jieba Chinese word segmentation method, the TF-IDF algorithm and the preset text generation algorithm model.
7. The document generation apparatus of claim 6, wherein the apparatus further comprises:
and the preset text generation algorithm model acquisition unit is used for acquiring the preset text generation algorithm model.
8. The document generation apparatus according to claim 7, wherein the preset text generation algorithm model acquisition unit includes:
a recommended document acquisition unit, configured to acquire a plurality of manually written recommended documents;
a recommended document word segmentation unit, configured to perform word segmentation operation on the recommended documents by adopting a jieba Chinese word segmentation method;
a recommended document keyword set acquisition unit, configured to perform keyword extraction on the recommended documents subjected to the word segmentation operation by using a TF-IDF algorithm to obtain a keyword set;
and a training unit, configured to take the keyword set as input and the recommended documents as output and train to obtain the text generation algorithm model.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 5.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the document generation method of any one of claims 1 to 5.
CN202010001994.4A 2020-01-02 2020-01-02 Deep learning based file generation method and device and electronic equipment Pending CN111178056A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010001994.4A CN111178056A (en) 2020-01-02 2020-01-02 Deep learning based file generation method and device and electronic equipment
CA3166742A CA3166742A1 (en) 2020-01-02 2020-08-28 Method of generating text plan based on deep learning, device and electronic equipment
PCT/CN2020/111951 WO2021135319A1 (en) 2020-01-02 2020-08-28 Deep learning based text generation method and apparatus and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010001994.4A CN111178056A (en) 2020-01-02 2020-01-02 Deep learning based file generation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111178056A true CN111178056A (en) 2020-05-19

Family

ID=70654435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010001994.4A Pending CN111178056A (en) 2020-01-02 2020-01-02 Deep learning based file generation method and device and electronic equipment

Country Status (3)

Country Link
CN (1) CN111178056A (en)
CA (1) CA3166742A1 (en)
WO (1) WO2021135319A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446214A (en) * 2020-12-09 2021-03-05 北京有竹居网络技术有限公司 Method, device and equipment for generating advertisement keywords and storage medium
WO2021135319A1 (en) * 2020-01-02 2021-07-08 苏宁云计算有限公司 Deep learning based text generation method and apparatus and electronic device
CN113553838A (en) * 2021-08-03 2021-10-26 稿定(厦门)科技有限公司 Commodity file generation method and device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048304A (en) * 2021-10-26 2022-02-15 盐城金堤科技有限公司 Effective keyword determination method and device, storage medium and electronic equipment
CN116151232B (en) * 2023-04-24 2023-08-29 北京龙智数科科技服务有限公司 Method and device for generating model by multi-stage training text title

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902545A (en) * 2012-12-25 2014-07-02 北京京东尚科信息技术有限公司 Category path recognition method and system
CN110309114A (en) * 2018-02-28 2019-10-08 腾讯科技(深圳)有限公司 Processing method, device, storage medium and the electronic device of media information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992764B (en) * 2017-12-29 2022-12-16 阿里巴巴集团控股有限公司 File generation method and device
CN111178056A (en) * 2020-01-02 2020-05-19 苏宁云计算有限公司 Deep learning based file generation method and device and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902545A (en) * 2012-12-25 2014-07-02 北京京东尚科信息技术有限公司 Category path recognition method and system
CN110309114A (en) * 2018-02-28 2019-10-08 腾讯科技(深圳)有限公司 Processing method, device, storage medium and the electronic device of media information

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021135319A1 (en) * 2020-01-02 2021-07-08 苏宁云计算有限公司 Deep learning based text generation method and apparatus and electronic device
CN112446214A (en) * 2020-12-09 2021-03-05 北京有竹居网络技术有限公司 Method, device and equipment for generating advertisement keywords and storage medium
CN112446214B (en) * 2020-12-09 2024-02-02 北京有竹居网络技术有限公司 Advertisement keyword generation method, device, equipment and storage medium
CN113553838A (en) * 2021-08-03 2021-10-26 稿定(厦门)科技有限公司 Commodity file generation method and device

Also Published As

Publication number Publication date
WO2021135319A1 (en) 2021-07-08
CA3166742A1 (en) 2021-07-08

Similar Documents

Publication Publication Date Title
CN110287278B (en) Comment generation method, comment generation device, server and storage medium
CN110969012B (en) Text error correction method and device, storage medium and electronic equipment
CN111178056A (en) Deep learning based file generation method and device and electronic equipment
CN110134931B (en) Medium title generation method, medium title generation device, electronic equipment and readable medium
CN111381909B (en) Page display method and device, terminal equipment and storage medium
US9613268B2 (en) Processing of images during assessment of suitability of books for conversion to audio format
CN109543058B (en) Method, electronic device, and computer-readable medium for detecting image
CN111027331A (en) Method and apparatus for evaluating translation quality
CN112633947B (en) Text generation model generation method, text generation method, device and equipment
CN111144952A (en) Advertisement recommendation method, device, server and storage medium based on user interests
CN110278447B (en) Video pushing method and device based on continuous features and electronic equipment
WO2022111347A1 (en) Information processing method and apparatus, electronic device, and storage medium
CN110737774A (en) Book knowledge graph construction method, book recommendation method, device, equipment and medium
CN111767740A (en) Sound effect adding method and device, storage medium and electronic equipment
CN111401044A (en) Title generation method and device, terminal equipment and storage medium
CN114943006A (en) Singing bill display information generation method and device, electronic equipment and storage medium
CN110826619A (en) File classification method and device of electronic files and electronic equipment
CN113868538A (en) Information processing method, device, equipment and medium
CN114298007A (en) Text similarity determination method, device, equipment and medium
CN111859970B (en) Method, apparatus, device and medium for processing information
CN113971400B (en) Text detection method and device, electronic equipment and storage medium
CN112446214A (en) Method, device and equipment for generating advertisement keywords and storage medium
CN109522141B (en) Information pushing method and device, server, equipment and storage medium
CN111259676A (en) Translation model training method and device, electronic equipment and storage medium
CN113221572A (en) Information processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200519

RJ01 Rejection of invention patent application after publication