CN115543925B

CN115543925B - File processing method, device, electronic equipment and computer readable medium

Info

Publication number: CN115543925B
Application number: CN202211533258.9A
Authority: CN
Inventors: 秦志宾; 闫松伟; 王瑞; 饶新宏
Original assignee: Beijing Defeng New Journey Technology Co ltd
Current assignee: Beijing Defeng New Journey Technology Co ltd
Priority date: 2022-12-02
Filing date: 2022-12-02
Publication date: 2023-09-19
Anticipated expiration: 2042-12-02
Also published as: CN115543925A

Abstract

Embodiments of the present disclosure disclose a file processing method, apparatus, electronic device, and computer readable medium. One embodiment of the method comprises the following steps: in response to detecting file processing information input in a file processing interface, performing semantic extraction on the file processing information to obtain semantic information; extracting a plurality of keywords from the semantic information; determining a keyword type corresponding to each of the plurality of keywords; acquiring the keyword types required for file processing to obtain a required keyword type set; in response to determining that the keyword type set corresponding to the plurality of keywords includes the required keyword type set, determining the keyword corresponding to each required keyword type in the required keyword type set to obtain a keyword set; determining a first keyword encoding set corresponding to the keyword set; and, according to the first keyword encoding set, performing file processing on a file to be processed in a target database by using a file hierarchy tree model. This embodiment can process the file to be processed quickly and efficiently.

Description

File processing method, device, electronic equipment and computer readable medium

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a file processing method, apparatus, electronic device, and computer readable medium.

Background

At present, databases are widely used in daily life. Files in a database are generally processed as follows: a relevant technician manually processes the files to be processed in the database.

However, the inventors have found that processing files in a database in this manner often has the following technical problems:

First, the operation is complicated and the database contains a large number of files, so file processing is inefficient: searching for a file takes a long time and occupies considerable search resources.

Second, the text idea information generated for the file content is not accurate enough, so subsequent processing of the file to be processed is not accurate enough either.

The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and, therefore, may contain information that does not form the prior art that is already known to those of ordinary skill in the art in this country.

Disclosure of Invention

The disclosure is in part intended to introduce concepts in a simplified form that are further described below in the detailed description. The disclosure is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure propose file processing methods, apparatuses, electronic devices, and computer readable media to solve one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide a file processing method, including: in response to detecting file processing information input by a target user on a file processing interface, performing semantic extraction on the file processing information to obtain semantic information; extracting a plurality of keywords from the semantic information; determining a keyword type corresponding to each of the plurality of keywords; acquiring the keyword types required for file processing to obtain a required keyword type set; in response to determining that the keyword type set corresponding to the plurality of keywords includes the required keyword type set, determining the keyword corresponding to each required keyword type in the required keyword type set to obtain a keyword set; determining a first keyword encoding set corresponding to the keyword set; and, according to the first keyword encoding set, performing file processing on a file to be processed in a target database by using a file hierarchy tree model, wherein the file hierarchy tree model is built based on a file directory in the target database, and each tree node of the file hierarchy tree model includes file information and at least one second keyword code corresponding to the file information, the second keyword codes being in one-to-one correspondence with keywords.

In a second aspect, some embodiments of the present disclosure provide a file processing apparatus, including: a semantic extraction unit configured to, in response to detecting file processing information input by a target user on a file processing interface, perform semantic extraction on the file processing information to obtain semantic information; an extraction unit configured to extract a plurality of keywords from the semantic information; a first determining unit configured to determine a keyword type corresponding to each of the plurality of keywords; an acquisition unit configured to acquire the keyword types required for file processing to obtain a required keyword type set; a second determining unit configured to, in response to determining that the keyword type set corresponding to the plurality of keywords includes the required keyword type set, determine the keyword corresponding to each required keyword type in the required keyword type set to obtain a keyword set; a third determining unit configured to determine a first keyword encoding set corresponding to the keyword set; and a file processing unit configured to perform, according to the first keyword encoding set, file processing on a file to be processed in a target database by using a file hierarchy tree model, wherein the file hierarchy tree model is built based on a file directory in the target database, and each tree node of the file hierarchy tree model includes file information and at least one second keyword code corresponding to the file information, the second keyword codes being in one-to-one correspondence with keywords.

In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.

In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as described in any of the implementations of the first aspect.

The above embodiments of the present disclosure have the following beneficial effects: the file processing method of some embodiments of the present disclosure can process the file to be processed quickly and efficiently. Specifically, related file processing is slow and inefficient because the operation is complicated and the database contains a large number of files, so searching for a file takes a long time and occupies considerable search resources. Based on this, in the file processing method of some embodiments of the present disclosure, first, in response to detecting file processing information input by a target user on a file processing interface, semantic extraction is performed on the file processing information to obtain semantic information. Inputting the file processing information through the file processing interface makes file processing in the database much more convenient: the target user does not need to know how to operate the database, but only needs to input the file processing information, and the file to be processed is then processed automatically in the target database according to that information. In addition, performing semantic extraction on the file processing information makes it easy to obtain a plurality of keywords related to the main point of the file processing information, so that the file to be processed can later be queried quickly by means of these keywords. Then, a plurality of keywords are extracted from the semantic information to facilitate the subsequent querying of the file to be processed and the determination of its processing mode.
Next, the keyword type corresponding to each of the plurality of keywords is determined, and the keyword types required for file processing are acquired to obtain a required keyword type set; together, these make it possible to determine whether the file processing information input by the target user lacks key file processing content. Further, in response to determining that the keyword type set corresponding to the plurality of keywords includes the required keyword type set, the keyword corresponding to each required keyword type in the required keyword type set is determined to obtain a keyword set, which is used to query the file to be processed and determine its processing mode. After that, a first keyword encoding set corresponding to the keyword set is determined for the same purpose. Finally, according to the first keyword encoding set, the file to be processed is processed efficiently and accurately in the target database by using the file hierarchy tree model. The file hierarchy tree model is built based on a file directory in the target database, and each tree node includes file information and at least one second keyword code corresponding to the file information, the second keyword codes being in one-to-one correspondence with keywords. In summary, by inputting file processing information on the file processing interface and processing that information through the above series of steps, the file to be processed in the target database can be processed quickly and efficiently by using the file hierarchy tree model.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

FIG. 1 is a flow chart of some embodiments of a file processing method according to the present disclosure;

FIG. 2 is a schematic structural diagram of some embodiments of a file processing apparatus according to the present disclosure;

FIG. 3 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that the modifiers "one" and "a plurality of" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art should understand them as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Referring to FIG. 1, a flow 100 of some embodiments of the file processing method according to the present disclosure is shown. The file processing method includes the following steps:

Step 101, in response to detecting file processing information input by a target user on a file processing interface, performing semantic extraction on the file processing information to obtain semantic information.

In some embodiments, in response to detecting file processing information input by the target user on the file processing interface, the execution body of the file processing method may perform semantic extraction on the file processing information to obtain semantic information. The target user may be the user who inputs the file processing information on the file processing interface, that is, the operating user of the file processing interface. The file processing interface may be an information input interface for file processing. The file processing information may describe an operation to be performed on a file, for example one of: query information, acquisition information, addition information, or adjustment information for a file. The semantic information may characterize the content of the file processing information.

As an example, the execution body may input the file processing information into a pre-trained semantic extraction network model to generate the semantic information. The semantic extraction network model may be a model that extracts the semantic content of information, for example a long short-term memory (LSTM) network model.

For example, if the file processing information is "extract File A from the target database", the semantic information may be "extract File A".

Step 102, extracting a plurality of keywords from the semantic information.

In some embodiments, the execution body may extract a plurality of keywords from the semantic information. The keywords may be key nouns, key verbs, or key adjectives, and are not limited herein.

As an example, the execution body may input the semantic information into a pre-trained keyword extraction network model to obtain the plurality of keywords. The keyword extraction network model may be a network model for extracting keywords, for example an LSTM network model.

For example, if the semantic information is "extract File A", the plurality of keywords include "extract" and "File A".

Step 103, determining a keyword type corresponding to each keyword in the plurality of keywords.

In some embodiments, the execution body may determine a keyword type corresponding to each of the plurality of keywords. Wherein the keyword type may include, but is not limited to, at least one of: verb part-of-speech type, noun part-of-speech type, adjective part-of-speech type.

As an example, the execution body may determine the keyword type corresponding to each of the plurality of keywords through a part-of-speech type table, which characterizes the association between words and part-of-speech types.

For example, if the plurality of keywords include "extract" and "File A", the keyword type of "extract" may be the verb part-of-speech type, and the keyword type of "File A" may be the noun part-of-speech type.
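The table lookup in step 103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the table contents, the type labels ("verb", "noun", "adjective") and the function name keyword_types are hypothetical, and keywords missing from the table are labelled "unknown" rather than guessed.

```python
# Hypothetical part-of-speech type table mapping words to keyword types.
# A real table would be far larger; these entries only cover the example.
POS_TYPE_TABLE = {
    "extract": "verb",
    "delete": "verb",
    "File A": "noun",
    "important": "adjective",
}


def keyword_types(keywords):
    """Map each keyword to its keyword (part-of-speech) type.

    Keywords that are not in the table are mapped to "unknown".
    """
    return {kw: POS_TYPE_TABLE.get(kw, "unknown") for kw in keywords}


print(keyword_types(["extract", "File A"]))  # {'extract': 'verb', 'File A': 'noun'}
```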

Step 104, acquiring the keyword types required for file processing to obtain a required keyword type set.

In some embodiments, the execution body may acquire the keyword types required for file processing through a wired or wireless connection to obtain the required keyword type set. The required keyword type set may consist of the keyword types necessary for processing a file in the database, for example: the verb part-of-speech type and the noun part-of-speech type.

Step 105, in response to determining that the keyword type set corresponding to the plurality of keywords includes the required keyword type set, determining the keyword corresponding to each required keyword type in the required keyword type set to obtain a keyword set.

In some embodiments, in response to determining that the keyword type set corresponding to the plurality of keywords includes the required keyword type set, the execution body may determine the keyword corresponding to each required keyword type in the required keyword type set to obtain the keyword set.

As an example, the executing entity may determine, by means of a keyword query, keywords corresponding to each required keyword type in the required keyword type set, and obtain a keyword set.
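Steps 104 and 105 amount to a subset check followed by a per-type selection. The sketch below assumes that keyword types are plain strings and, because the disclosure does not say how to choose among several keywords of the same type, keeps the first one; the function name select_keywords is hypothetical.

```python
def select_keywords(typed_keywords, required_types):
    """Return {required type: keyword}, or None if a required type is missing.

    typed_keywords: dict mapping keyword -> keyword type (output of step 103).
    required_types: set of required keyword types (output of step 104).
    """
    present_types = set(typed_keywords.values())
    if not required_types <= present_types:
        return None  # left to the optional completion steps
    keyword_set = {}
    for keyword, kw_type in typed_keywords.items():
        # Keep the first keyword seen for each required type (an assumption).
        if kw_type in required_types and kw_type not in keyword_set:
            keyword_set[kw_type] = keyword
    return keyword_set


typed = {"extract": "verb", "File A": "noun"}
print(select_keywords(typed, {"verb", "noun"}))  # {'verb': 'extract', 'noun': 'File A'}
```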

In some optional implementations of some embodiments, the method further includes the following steps for the case where the keyword types of the plurality of keywords do not cover the required keyword type set:

In a first step, in response to determining that the keyword type set corresponding to the plurality of keywords does not include the required keyword type set, a difference keyword type set is determined. The difference keyword type set is a subset of the required keyword type set and shares no keyword type with the keyword type set corresponding to the plurality of keywords.

In a second step, an information query text is generated for each difference keyword type in the difference keyword type set.

In a third step, an information filling pop-up window is displayed on the file processing interface, so that the target user can fill in the corresponding keywords in response to the information query texts.

In a fourth step, the keyword set corresponding to the required keyword type set is determined according to the filled-in keywords and the plurality of keywords.
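The optional completion flow can be sketched as a set difference plus a merge of user input. The prompt wording, the function names and the representation of the user's pop-up input as a type-to-keyword dictionary are all assumptions made for illustration.

```python
def difference_types(present_types, required_types):
    """First step: required types that none of the extracted keywords has."""
    return set(required_types) - set(present_types)


def query_texts(diff_types):
    """Second step: one prompt per missing type (the wording is hypothetical)."""
    return {t: f"Please enter a keyword of type '{t}':" for t in sorted(diff_types)}


def complete_keyword_set(typed_keywords, required_types, filled):
    """Fourth step: merge extracted keywords with keywords filled in by the user.

    filled: dict mapping a missing type -> keyword typed into the pop-up window.
    """
    keyword_set = {}
    for keyword, kw_type in typed_keywords.items():
        if kw_type in required_types and kw_type not in keyword_set:
            keyword_set[kw_type] = keyword
    for kw_type, keyword in filled.items():
        if kw_type in required_types:
            keyword_set.setdefault(kw_type, keyword)
    return keyword_set


typed = {"File A": "noun"}  # the verb is missing from the user's input
required = {"verb", "noun"}
missing = difference_types(typed.values(), required)
print(query_texts(missing))
print(complete_keyword_set(typed, required, {"verb": "extract"}))
```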

Step 106, determining a first keyword encoding set corresponding to the keyword set.

In some embodiments, the executing entity may determine a first keyword encoding set corresponding to the keyword set.

As an example, the execution body may determine the first keyword encoding set through a keyword encoding table, which characterizes the association between keywords and keyword codes. The first keyword codes in the first keyword encoding set are in one-to-one correspondence with the keywords in the keyword set, and each first keyword code characterizes the identity (i.e., identification information) of the corresponding keyword.

It should be noted that storing a first keyword code consumes less memory than storing the corresponding keyword.

Optionally, the keyword encoding table is generated by:

In a first step, a common keyword set is obtained.

In a second step, each common keyword in the common keyword set is input into a coding model to generate a keyword code, yielding a keyword code set. The coding model may be a model for encoding keywords, for example the first coding model of a coding and decoding (encoder-decoder) network model, such as a multi-layer recurrent neural network (RNN) model.

In a third step, the keyword encoding table is generated from the keyword code set and the common keyword set.

As an example, the execution body may generate the keyword encoding table by matching the keyword code set with the common keyword set.
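A minimal sketch of building and using a keyword encoding table follows. It deliberately replaces the coding model (e.g., a multi-layer RNN) with a stand-in that assigns a compact integer to each distinct common keyword; this keeps the one-to-one correspondence described above, but it is not the learned encoding the disclosure describes. The function names are hypothetical.

```python
def build_encoding_table(common_keywords):
    """Stand-in for the coding model: give each distinct keyword an int code.

    Codes are assigned in first-seen order, so the table is one-to-one.
    """
    table = {}
    for keyword in common_keywords:
        if keyword not in table:
            table[keyword] = len(table)
    return table


def encode_keywords(keyword_set, table):
    """Look up the first keyword code of each keyword (KeyError if unknown)."""
    return [table[keyword] for keyword in keyword_set]


table = build_encoding_table(["extract", "File A", "delete", "extract"])
print(table)                                          # {'extract': 0, 'File A': 1, 'delete': 2}
print(encode_keywords(["extract", "File A"], table))  # [0, 1]
```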

Step 107, according to the first keyword encoding set, performing file processing on the file to be processed in the target database by using a file hierarchy tree model.

In some embodiments, the execution body may, in various manners, perform file processing on the file to be processed in the target database according to the first keyword encoding set by using the file hierarchy tree model. The file hierarchy tree model is built based on a file directory in the target database, and each tree node of the file hierarchy tree model includes file information and at least one second keyword code corresponding to the file information, the second keyword codes being in one-to-one correspondence with keywords. The file information may include, but is not limited to, at least one of: a file name and a file path.
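The disclosure leaves the traversal strategy open ("in various manners"). One possible sketch, assuming integer codes and a rule that a node matches when its second keyword codes contain every queried code, is a depth-first search over nodes built from directory paths. The class and function names are hypothetical, and the matching rule is an assumption; in this sketch the code of the verb keyword would select the operation, while the remaining codes select the node.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """One node of a (hypothetical) file hierarchy tree built from a directory."""
    name: str                                      # file information: file name
    path: str                                      # file information: file path
    keyword_codes: set = field(default_factory=set)  # second keyword codes
    children: dict = field(default_factory=dict)


def insert(root, path, codes):
    """Create intermediate directory nodes as needed and attach codes to the leaf."""
    node = root
    current = ""
    for part in (p for p in path.split("/") if p):
        current += "/" + part
        node = node.children.setdefault(part, TreeNode(part, current))
    node.keyword_codes |= set(codes)
    return node


def find(root, query_codes):
    """Return paths of nodes whose second keyword codes contain every query code."""
    query = set(query_codes)
    matches, stack = [], [root]
    while stack:  # iterative depth-first traversal
        node = stack.pop()
        if query and query <= node.keyword_codes:
            matches.append(node.path)
        stack.extend(node.children.values())
    return sorted(matches)


root = TreeNode("/", "/")
insert(root, "/docs/File A", {1})
insert(root, "/docs/notes.txt", {1, 3})
print(find(root, {1}))     # ['/docs/File A', '/docs/notes.txt']
print(find(root, {1, 3}))  # ['/docs/notes.txt']
```

For the running example, the code of "File A" would be passed to find, and the code of "extract" would decide what to do with the matching node.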

In some optional implementations of some embodiments, after step 107, the method further includes:

In a first step, in response to determining that processing of the file to be processed is finished, file information corresponding to the file to be processed and at least one keyword corresponding to the file to be processed are determined. The file information corresponding to the file to be processed includes a file path and a file name.

At least one keyword corresponding to the file to be processed is generated through the following steps:

Step 1, determining the file content text and the text name corresponding to the file to be processed.

Step 2, performing word segmentation on the file content text and the text name to obtain a first word set.

Step 3, extracting semantic information of the file content text.

Step 4, extracting a plurality of words from the semantic information to obtain a second word set.

Step 5, screening out, from the first word set, the words whose word frequency is greater than a preset number, to obtain a screened word set.

Step 6, fusing the screened word set and the second word set to obtain a fused word set as the at least one keyword.

In a second step, the file hierarchy tree model is updated according to the file information corresponding to the file to be processed and the corresponding at least one keyword.

As an example, the execution body may first determine the tree node to be processed in the file hierarchy tree according to the file path included in the file information corresponding to the file to be processed. Then, the keyword of the verb part-of-speech type corresponding to the file to be processed is acquired. Finally, the tree node to be processed is adjusted using that verb keyword as the processing mode of the file to be processed, yielding an adjusted file hierarchy tree model. The adjustment may include deletion, addition, or modification.

In some optional implementations of some embodiments, the at least one second keyword encoding corresponding to the file information is generated by:

In a first step, in response to determining that the file corresponding to the file information is not an empty file, a sub-file set corresponding to the file is determined.

The sub-file set consists of the sub-files included in the file.

In a second step, the file format type of each sub-file in the sub-file set is determined.

The file format type may include, but is not limited to, at least one of: a text file type, a video file format type, and an audio file format type. Text file types may include, but are not limited to, at least one of: the TXT file format and the DOC file format. Video file format types may include, but are not limited to, at least one of: the mp4, mov, and avi formats. Audio file format types may include, but are not limited to, at least one of: the mp3 and wma formats.

In a third step, in response to determining that every sub-file in the sub-file set is of the first file format type, the file name of each sub-file is determined as an initial text, yielding an initial text set. The first file format type may be the text file type.

In a fourth step, text word segmentation is performed on each initial text in the initial text set to generate at least one first word.

In a fifth step, the at least one first word is determined as at least one first keyword.

In a sixth step, the at least one first keyword is encoded to obtain at least one second keyword code.

Optionally, the execution body may input the at least one first keyword into a second coding model of the coding and decoding network model to generate second keyword codes, obtaining a second keyword code set.
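For a directory whose sub-files are all of the text type, the steps above can be sketched as follows. The extension lists mirror the examples given (TXT, DOC, mp4, mov, avi, mp3, wma); classifying by extension, the regular-expression tokenizer and the integer-code stand-in for the second coding model are assumptions, and the function names are hypothetical.

```python
import os
import re

TEXT_EXTENSIONS = {".txt", ".doc"}           # first file format type (text)
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}
AUDIO_EXTENSIONS = {".mp3", ".wma"}


def format_type(filename):
    """Classify a sub-file by extension (a simplification of format detection)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in TEXT_EXTENSIONS:
        return "text"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    return "other"


def second_keyword_codes(sub_files, table):
    """Codes for a directory whose sub-files are all text files.

    sub_files: names of the sub-files; table: keyword -> code, extended in place.
    """
    if not sub_files:
        return set()  # empty file: nothing to encode
    if any(format_type(name) != "text" for name in sub_files):
        raise ValueError("mixed formats are handled by the optional steps")
    first_keywords = set()
    for name in sub_files:
        stem = os.path.splitext(name)[0]  # the file name serves as the initial text
        first_keywords.update(w.lower() for w in re.findall(r"[^\W_]+", stem))
    # Stand-in for the second coding model: reuse or assign integer codes.
    return {table.setdefault(w, len(table)) for w in sorted(first_keywords)}


table = {}
print(second_keyword_codes(["budget_report.txt", "Budget.doc"], table))  # {0, 1}
print(table)  # {'budget': 0, 'report': 1}
```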

Optionally, after the encoding of the at least one first keyword to obtain at least one second keyword code, the method further includes the following steps:

In a first step, in response to determining that the sub-file set includes sub-files of both the first file format type and a second file format type, the file name of each of the at least one sub-file of the second file format type is segmented to obtain a word set. The second file format type may be a video file format type or an audio file format type.

As an example, the execution body may segment the file names of the at least one sub-file of the second file format type by using the jieba word segmentation method to obtain the word set.

In a second step, the following text keyword extraction steps are performed for each of the at least one sub-file:

In a first sub-step, the file content corresponding to the sub-file is obtained.

For example, when the sub-file is of a video file format type, the execution body may obtain the file content corresponding to the sub-file through an audio-to-text conversion model, i.e., a model that converts audio into text, such as an automatic speech recognition (ASR) model.

As yet another example, when the sub-file is of an audio file format type, the execution body may likewise obtain the file content corresponding to the sub-file through the audio-to-text conversion model.

In a second sub-step, the file content is input into a text idea information extraction model to output text idea information. The text idea information extraction model may be a model for extracting the core idea of the file content, for example a Transformer model.

In a third sub-step, words whose word frequency satisfies a preset condition are extracted from the file content as keywords, yielding a content keyword set. The preset condition may be that the word frequency in the file content is greater than a preset target value, for example 10.

In a fourth sub-step, text word segmentation is performed on the text idea information to generate idea keywords, yielding an idea keyword set.

In a fifth sub-step, the content keyword set and the idea keyword set are fused to obtain a fusion word set.

In a third step, the at least one sub-file is removed from the sub-file set to obtain a remaining sub-file set.

In a fourth step, text word segmentation is performed on each sub-file in the remaining sub-file set to generate at least one second word.

In a fifth step, the at least one second word is determined as at least one second keyword.

In a sixth step, the at least one second keyword and the fusion word set are combined to obtain a summarized word set.

In a seventh step, the words in the summarized word set are encoded to obtain the at least one second keyword code.
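For directories that mix text and media sub-files, the keyword collection can be sketched as below. The transcript (from ASR) and the text idea information are taken as given inputs because they come from models; the tokenizer, the union-based fusion, segmenting only the file names of the remaining sub-files, the integer-code encoder and the function names are all illustrative assumptions. The target value defaults to 10, the example given above.

```python
import os
import re
from collections import Counter

MEDIA_EXTENSIONS = {".mp4", ".mov", ".avi", ".mp3", ".wma"}  # second file format type


def words(text):
    """Stand-in tokenizer (hypothetical); splits on non-word characters."""
    return [w.lower() for w in re.findall(r"[^\W_]+", text)]


def fusion_word_set(content, idea_text, target_value=10):
    """Sub-steps 3-5 for one media sub-file whose transcript and idea text are given."""
    counts = Counter(words(content))
    content_kws = {w for w, c in counts.items() if c > target_value}
    idea_kws = set(words(idea_text))
    return content_kws | idea_kws


def summarize_and_encode(sub_files, fusion_words, table):
    """Steps 3-7: drop media sub-files, segment remaining names, merge, encode."""
    remaining = [f for f in sub_files
                 if os.path.splitext(f)[1].lower() not in MEDIA_EXTENSIONS]
    second_keywords = set()
    for name in remaining:
        second_keywords.update(words(os.path.splitext(name)[0]))
    summarized = second_keywords | set(fusion_words)
    # Stand-in encoder: reuse or assign a compact integer code per word.
    return {table.setdefault(w, len(table)) for w in sorted(summarized)}


transcript = "model " * 11 + "data " * 3  # pretend ASR output for talk.mp4
fusion = fusion_word_set(transcript, "Model training")
print(sorted(fusion))  # ['model', 'training']
table = {}
print(summarize_and_encode(["notes.txt", "talk.mp4"], fusion, table))  # {0, 1, 2}
print(table)  # {'model': 0, 'notes': 1, 'training': 2}
```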

Optionally, inputting the file content into the text idea information extraction model to output the text idea information may include the following steps:

In a first step, the file content is input into a text domain type determination model to output the text domain type corresponding to the file content as a target text domain type. The text domain type determination model may be a model that determines the domain to which the file content relates, and may be composed of an LSTM model and a multi-layer convolutional neural network. For example, the text domain type may include, but is not limited to, at least one of: the computer domain, the chemistry domain, the physics domain, and the literature domain.

In a second step, word segmentation is performed on the file content to obtain a text word set.

In a third step, the text words in the text word set are screened to remove Chinese modal particles, yielding a screened text word set.

In a fourth step, the screened text word set is deduplicated to obtain a deduplicated text word set.

In a fifth step, word encoding is performed on each deduplicated text word in the deduplicated text word set to obtain text word vectors.

In a sixth step, word encoding is performed on the target text domain type to obtain a text domain type vector.

In a seventh step, the text word vectors and the text domain type vector are input into a first association relation determination model to obtain a first score for each text word vector; the text word vectors are in one-to-one correspondence with the first scores. The first association relation determination model may be a model that determines the association between each text word vector and the text domain type vector, for example a Transformer model.

And eighth step, inputting the file content into a text question and answer task determination model to output text question task information corresponding to the file content as target text question task information. The text question-answering task determination model may be a model for determining text question-answering task information. The text question-answering task information may include at least one of: the content of the characterization file is task information of a question-answering text, and the content of the characterization file is not task information of a question-answering task. The text question-answering task determination model may be a multi-layer LSTM model.

And ninth, carrying out word coding processing on the target text problem task information to obtain a text problem task information vector.

Tenth, the text question-answering task information vector and the text word vectors are input into a second association determination model to obtain a second score for each text word vector. The text word vectors correspond one-to-one with the second scores. The second association determination model may be a model that determines the association between each text word vector and the text question-answering task information vector, for example a Transformer model.

Eleventh, the file content is input into an emotion analysis model, which outputs the emotion analysis information corresponding to the file content as target emotion analysis information. The emotion analysis model may be a model for generating emotion analysis information, for example a multi-layer LSTM model. The emotion analysis information may be, but is not limited to, at least one of: positive emotion information, negative emotion information, and neutral emotion information.

Twelfth, the target emotion analysis information is encoded to obtain an emotion analysis vector.

Thirteenth, the emotion analysis vector and the text word vectors are input into a third association determination model to obtain a third score for each text word vector. The text word vectors correspond one-to-one with the third scores. The third association determination model may be a model that determines the association between each text word vector and the emotion analysis vector, for example a Transformer model.

Fourteenth, the file content is input into an intention recognition model, which outputs the intention recognition information corresponding to the file content as target intention recognition information. The intention recognition model may be a model that generates intention recognition information, for example a multi-layer LSTM model. The intention recognition information may be, but is not limited to, at least one of: weather query information, song discussion information, and opinion expression information.

Fifteenth, the target intention recognition information is encoded to obtain an intention recognition encoding vector.

Sixteenth, the intention recognition encoding vector and the text word vectors are input into a fourth association determination model to obtain a fourth score for each text word vector. The text word vectors correspond one-to-one with the fourth scores. The fourth association determination model may be a model that determines the association between each text word vector and the intention recognition encoding vector, for example a Transformer model.

Seventeenth, for each text word, the first score, second score, third score, and fourth score corresponding to that word are averaged to obtain an average score.

Eighteenth, the resulting average scores are sorted in descending order to obtain an average score sequence.

Nineteenth, the text words corresponding to the first preset number of average scores in the average score sequence are determined as keywords for the text idea information.

Twentieth, the text words corresponding to the first preset number of average scores are input into a text generation model to generate the text idea information. The text generation model may be a model that generates text, for example a multi-layer LSTM model.
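Once the four association models have produced their scores, the screening, deduplication, averaging, sorting, and top-k selection in the steps above are deterministic. The Python sketch below illustrates only those parts and assumes segmentation and scoring have already been run; the stop-word list, the function names `screen_and_deduplicate` and `select_idea_keywords`, and the example scores are hypothetical and are not taken from the disclosure.

```python
from statistics import mean

# Hypothetical stop-word list; the disclosure does not specify one.
STOP_WORDS = {"的", "了", "和"}


def screen_and_deduplicate(words):
    """Steps three and four: drop stop words, then remove duplicates in first-seen order."""
    screened = [w for w in words if w not in STOP_WORDS]
    return list(dict.fromkeys(screened))


def select_idea_keywords(words, first, second, third, fourth, preset_number):
    """Steps seventeen to nineteen.

    Each score list is assumed to be aligned one-to-one with `words`, mirroring the
    one-to-one correspondence between text word vectors and scores.
    """
    # Step seventeen: average the four scores of each word.
    averaged = [
        (word, mean(scores))
        for word, *scores in zip(words, first, second, third, fourth)
    ]
    # Step eighteen: sort by average score, largest first (stable for ties).
    averaged.sort(key=lambda pair: pair[1], reverse=True)
    # Step nineteen: keep the words behind the first `preset_number` scores.
    return [word for word, _ in averaged[:preset_number]]
```

For example, with the words "合同" (contract), "归档" (archive), and "报告" (report) averaging 0.75, 0.25, and 0.5 across the four scores, a preset number of 2 keeps "合同" and "报告" as input for the text generation model.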

The above technical solution and its related content constitute an inventive point of the embodiments of the present disclosure and solve the second technical problem mentioned in the background: "the generated text idea information for the file content is not accurate enough, so the subsequent processing of the file to be processed is inaccurate." By using the text field type determination model, the intention recognition model, the emotion analysis model, and the text question-answering task determination model, core keywords for subsequently generating the text idea information can be screened from the file content from multiple perspectives, which helps ensure that the text idea information accurately captures the gist of the file. Further, the text generation model can accurately generate text idea information from these keywords.

In some optional implementations of some embodiments, processing the file to be processed in the target database according to the first keyword encoding set by using the file hierarchy tree model may include the following steps:

First, each first keyword code in the first keyword code set is input into the decoding model of the encoding-decoding network model to generate a first decoded word, yielding a first decoded word set. The encoding-decoding network model includes a first encoding model, a second encoding model, and a decoding model. The first encoding model and the second encoding model each correspond one-to-one with the decoding model; that is, the encoding information output by the first encoding model, the encoding information output by the second encoding model, and the decoding information output by the decoding model correspond one-to-one. The decoding model may be a multi-layer LSTM model.

Second, each first decoded word in the first decoded word set is input into the second encoding model to generate a second keyword code, yielding a second keyword code set.

Third, according to the second keyword code set, the file to be processed in the target database is processed in various ways by using the file hierarchy tree model.
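The first two steps translate codes produced by the first encoding model into the code space of the second encoding model, which is the code space used by the tree nodes of the file hierarchy tree model. A toy Python sketch of that data flow, assuming the decoding model and the second encoding model can be stood in for by lookup tables (the trained models of the disclosure are not reproduced); `DECODER`, `SECOND_ENCODER`, `translate_codes`, and every code value are hypothetical.

```python
# Hypothetical stand-ins: the disclosure uses trained models (e.g. a multi-layer
# LSTM decoder); plain dictionaries are used here only to show the data flow.
DECODER = {101: "delete", 102: "contract", 103: "2022"}
SECOND_ENCODER = {"delete": 7, "contract": 42, "2022": 55}


def translate_codes(first_codes):
    """Decode each first keyword code to a word, then re-encode it with the second encoder.

    Returns the first decoded word set and the second keyword code set, both in input order.
    """
    first_decoded_words = [DECODER[code] for code in first_codes]
    second_codes = [SECOND_ENCODER[word] for word in first_decoded_words]
    return first_decoded_words, second_codes
```

Routing through words, rather than mapping codes directly, means the query side and the tree side can use independently trained encoders as long as both share the decoder's vocabulary.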

Optionally, processing the file to be processed in the target database according to the second keyword code set by using the file hierarchy tree model may include the following steps:

First, a plurality of second keyword codes for a target keyword part-of-speech type are acquired. The target keyword part-of-speech type may be the verb part-of-speech type.

Second, the keyword codes that appear both in the second keyword code set and among the plurality of second keyword codes are determined to obtain a repeated keyword code set.

Third, the repeated keyword code set is deduplicated to obtain a deduplicated keyword code set.

Fourth, the repeated keyword code set is removed from the second keyword code set to obtain a removed keyword code set.

Fifth, each deduplicated keyword code in the deduplicated keyword code set is input into the decoding model of the encoding-decoding network model to generate a second decoded word, yielding a second decoded word set.

Sixth, the file set corresponding to the removed keyword code set is determined, by using the file hierarchy tree model, as the files to be processed.

Seventh, file processing is performed on the files to be processed in the target database according to the second decoded word set.

As an example, the execution body may use each second decoded word in the second decoded word set as a processing verb for the files to be processed and process those files in the target database accordingly.
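The seven optional steps amount to set operations on keyword codes followed by a tree lookup: codes of the verb type become processing verbs, and the remaining codes select files. The Python sketch below is an illustration under assumptions not stated in the disclosure: keyword codes are integers, the decoding model is replaced by a lookup table, a tree node matches when its keyword codes contain every code in the removed keyword code set, and all names (`FileNode`, `find_files`, `plan_file_processing`, the example paths and codes) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileNode:
    """Hypothetical file hierarchy tree node: file information plus second keyword codes."""
    file_info: str
    keyword_codes: set = field(default_factory=set)
    children: list = field(default_factory=list)


def find_files(node, codes):
    """Depth-first search for nodes whose keyword codes cover all of `codes` (assumed rule).

    An empty `codes` set matches nothing, so an all-verb query selects no files.
    """
    matches = [node.file_info] if codes and codes <= node.keyword_codes else []
    for child in node.children:
        matches.extend(find_files(child, codes))
    return matches


def plan_file_processing(second_codes, verb_codes, decoder, tree_root):
    """Split query codes into processing verbs and file-selecting codes."""
    # Step two: codes present both in the query and among the verb-type codes.
    repeated = [code for code in second_codes if code in verb_codes]
    # Step three: deduplicate the repeated codes, keeping first-seen order.
    deduplicated = list(dict.fromkeys(repeated))
    # Step four: remove the repeated codes from the query codes.
    removed = {code for code in second_codes if code not in verb_codes}
    # Step five: decode the verb codes into processing verbs (second decoded words).
    verbs = [decoder[code] for code in deduplicated]
    # Step six: use the tree to select the files to be processed.
    files = find_files(tree_root, removed)
    # Step seven would apply each verb to each selected file; returned here instead.
    return verbs, files
```

For instance, a query whose codes decode to "delete", "contract", and "2022" would, under these assumptions, yield the verb "delete" and only the files whose nodes carry both the "contract" and "2022" codes.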

With further reference to fig. 2, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a file processing apparatus. These apparatus embodiments correspond to the method embodiments shown in fig. 1, and the apparatus can be applied to various electronic devices.

As shown in fig. 2, the file processing apparatus 200 includes: a semantic extraction unit 201, an extraction unit 202, a first determining unit 203, an acquisition unit 204, a second determining unit 205, a third determining unit 206, and a file processing unit 207. The semantic extraction unit 201 is configured to, in response to detecting file processing information input by a target user on the file processing interface, perform semantic extraction on the file processing information to obtain semantic information; the extraction unit 202 is configured to extract a plurality of keywords from the semantic information; the first determining unit 203 is configured to determine a keyword part-of-speech type corresponding to each of the plurality of keywords; the acquisition unit 204 is configured to acquire the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set; the second determining unit 205 is configured to, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, determine the keywords corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set; the third determining unit 206 is configured to determine a first keyword encoding set corresponding to the keyword set; and the file processing unit 207 is configured to perform file processing on the file to be processed in a target database according to the first keyword encoding set by using a file hierarchy tree model, wherein the file hierarchy tree model is built based on a file directory in the target database, and the tree nodes of the file hierarchy tree model include file information and at least one second keyword code corresponding to the file information, the second keyword codes corresponding one-to-one with keywords.
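The check performed by the second determining unit 205 (and by the corresponding method step) is essentially a subset test followed by picking one keyword per required type. In this Python sketch the part-of-speech tag strings, the function name `select_required_keywords`, and the choice to keep the first keyword seen for each type are hypothetical; the disclosure does not say how several keywords of the same type are resolved.

```python
def select_required_keywords(tagged_keywords, required_types):
    """Return one keyword per required part-of-speech type, or None if a type is missing.

    `tagged_keywords` is a list of (keyword, part_of_speech_type) pairs and
    `required_types` is the ordered list of required part-of-speech types.
    """
    available_types = {pos for _, pos in tagged_keywords}
    if not set(required_types) <= available_types:
        # Some required type is absent; claim 3 handles this case by asking the
        # target user to fill in keywords for the missing types.
        return None
    selected = {}
    for keyword, pos in tagged_keywords:
        if pos in required_types:
            selected.setdefault(pos, keyword)  # keep the first keyword of each type
    return [selected[pos] for pos in required_types]
```

Returning `None` rather than a partial list keeps the "all required types present" precondition explicit for the caller.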

It will be appreciated that the units described in the file processing apparatus 200 correspond to the respective steps of the method described with reference to fig. 1. Therefore, the operations, features, and beneficial effects described above for the method also apply to the file processing apparatus 200 and the units it contains, and are not repeated here.

Referring now to fig. 3, a schematic structural diagram of an electronic device 300 suitable for implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 3 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.

As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the electronic device 300. The processing means 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.

In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from ROM 302. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.

It should be noted that, in some embodiments of the present disclosure, the computer readable medium may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be included in the electronic device, or may exist separately without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: in response to detecting file processing information input by a target user on a file processing interface, perform semantic extraction on the file processing information to obtain semantic information; extract a plurality of keywords from the semantic information; determine a keyword part-of-speech type corresponding to each keyword in the plurality of keywords; acquire the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set; in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, determine the keywords corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set; determine a first keyword encoding set corresponding to the keyword set; and, according to the first keyword encoding set, perform file processing on a file to be processed in a target database by using a file hierarchy tree model, wherein the file hierarchy tree model is built based on a file directory in the target database, and the tree nodes of the file hierarchy tree model include file information and at least one second keyword code corresponding to the file information, the second keyword codes corresponding one-to-one with keywords.

Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in some embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a semantic extraction unit, an extraction unit, a first determining unit, an acquisition unit, a second determining unit, a third determining unit, and a file processing unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set".

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

The foregoing description covers only the preferred embodiments of the present disclosure and the technical principles employed. Those skilled in the art will appreciate that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims

1. A file processing method, comprising:

in response to detecting file processing information input by a target user on a file processing interface, performing semantic extraction on the file processing information to obtain semantic information;

extracting a plurality of keywords from the semantic information;

determining a keyword part-of-speech type corresponding to each keyword in the plurality of keywords;

acquiring the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set;

in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords comprises the required keyword part-of-speech type set, determining keywords corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set;

determining a first keyword encoding set corresponding to the keyword set;

inputting each first keyword code in the first keyword code set into a decoding model in an encoding-decoding network model to generate a first decoded word, so as to obtain a first decoded word set, wherein the encoding-decoding network model comprises: a first encoding model, a second encoding model, and a decoding model;

inputting each first decoded word in the first decoded word set into the second encoding model to generate a second keyword code, so as to obtain a second keyword code set;

acquiring a plurality of second keyword codes for a target keyword part-of-speech type;

determining keyword codes repeated between the second keyword code set and the plurality of second keyword codes to obtain a repeated keyword code set;

deduplicating the repeated keyword code set to obtain a deduplicated keyword code set;

removing the repeated keyword code set from the second keyword code set to obtain a removed keyword code set;

inputting each deduplicated keyword code in the deduplicated keyword code set into the decoding model in the encoding-decoding network model to generate a second decoded word, so as to obtain a second decoded word set;

determining, by using a file hierarchy tree model, a file set corresponding to the removed keyword code set as files to be processed, wherein the file hierarchy tree model is established based on a file directory in a target database, and tree nodes of the file hierarchy tree model comprise: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes correspond one-to-one with keywords;

and performing file processing on the files to be processed in the target database according to the second decoded word set.

2. The method of claim 1, wherein the method further comprises:

in response to determining that the file to be processed has been processed, determining file information corresponding to the file to be processed and at least one keyword corresponding to the file to be processed;

and updating the file hierarchy tree model according to the file information corresponding to the file to be processed and the corresponding at least one keyword.

3. The method of claim 1, wherein, after the determining, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords comprises the required keyword part-of-speech type set, keywords corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set, the method further comprises:

in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords does not comprise the required keyword part-of-speech type set, determining a differential keyword part-of-speech type set, wherein the differential keyword part-of-speech type set is a subset of the required keyword part-of-speech type set and shares no keyword part-of-speech type with the keyword part-of-speech type set;

generating an information query text corresponding to each differential keyword part-of-speech type in the differential keyword part-of-speech type set;

popping up an information filling window in the file processing interface for the target user to fill in a corresponding keyword set for the information query text;

and determining a keyword set corresponding to the required keyword part-of-speech type set according to the filled-in keyword set and the keywords.

4. The method of claim 1, wherein the at least one second keyword code corresponding to the file information is generated by:

in response to determining that the file corresponding to the file information is not an empty file, determining a sub-file set corresponding to the file;

determining the file format type of each sub-file in the sub-file set;

in response to determining that the file format type of each sub-file in the sub-file set is a first file format type, determining the file name of each sub-file as an initial text to obtain an initial text set;

performing text word segmentation on each initial text in the initial text set to generate at least one first word;

determining the at least one first word as at least one first keyword;

and encoding the at least one first keyword to obtain at least one second keyword code.

5. The method of claim 4, wherein after said encoding the at least one first keyword to obtain at least one second keyword code, the method further comprises:

in response to determining that the file format types of the sub-files in the sub-file set include a first file format type and a second file format type, performing word segmentation on the file name of at least one sub-file of the second file format type to obtain a word set;

for each sub-file of the at least one sub-file, performing the following text keyword extraction steps:

acquiring file content corresponding to the sub-file;

inputting the file content into a text idea information extraction model to output text idea information;

extracting, from the file content, words whose word frequency satisfies a preset condition as keywords to obtain a content keyword set;

performing text word segmentation on the text idea information to generate idea keywords, so as to obtain an idea keyword set;

performing word fusion on the content keyword set and the idea keyword set to obtain a fused word set;

removing the at least one subfile from the subfile set to obtain a removed subfile set;

performing text word segmentation on the sub-files in the removed sub-file set to generate at least one second word;

determining the at least one second word as at least one second keyword;

aggregating the at least one second keyword and the fused word set to obtain an aggregated word set;

and encoding the words in the aggregated word set to obtain the at least one second keyword code.

6. A file processing apparatus, comprising:

a semantic extraction unit configured to, in response to detecting file processing information input by a target user on a file processing interface, perform semantic extraction on the file processing information to obtain semantic information;

an extraction unit configured to extract a plurality of keywords from the semantic information;

a first determination unit configured to determine a keyword part-of-speech type corresponding to each of the plurality of keywords;

an acquisition unit configured to acquire the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set;

a second determining unit configured to, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords comprises the required keyword part-of-speech type set, determine keywords corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set;

a third determining unit configured to determine a first keyword encoding set corresponding to the keyword set;

a file processing unit configured to: input each first keyword code in the first keyword code set into a decoding model in an encoding-decoding network model to generate a first decoded word, so as to obtain a first decoded word set, wherein the encoding-decoding network model comprises: a first encoding model, a second encoding model, and a decoding model; input each first decoded word in the first decoded word set into the second encoding model to generate a second keyword code, so as to obtain a second keyword code set; acquire a plurality of second keyword codes for a target keyword part-of-speech type; determine keyword codes repeated between the second keyword code set and the plurality of second keyword codes to obtain a repeated keyword code set; deduplicate the repeated keyword code set to obtain a deduplicated keyword code set; remove the repeated keyword code set from the second keyword code set to obtain a removed keyword code set; input each deduplicated keyword code in the deduplicated keyword code set into the decoding model in the encoding-decoding network model to generate a second decoded word, so as to obtain a second decoded word set; determine, by using a file hierarchy tree model, a file set corresponding to the removed keyword code set as files to be processed, wherein the file hierarchy tree model is established based on a file directory in a target database, and tree nodes of the file hierarchy tree model comprise: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes correspond one-to-one with keywords; and perform file processing on the files to be processed in the target database according to the second decoded word set.

7. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

8. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-5.