CN115543925A

CN115543925A - File processing method and device, electronic equipment and computer readable medium

Info

Publication number: CN115543925A
Application number: CN202211533258.9A
Authority: CN
Inventors: 秦志宾; 闫松伟; 王瑞; 饶新宏
Original assignee: Beijing Defeng New Journey Technology Co ltd
Current assignee: Beijing Defeng Xinzheng Technology Co ltd
Priority date: 2022-12-02
Filing date: 2022-12-02
Publication date: 2022-12-30
Anticipated expiration: 2042-12-02
Also published as: CN115543925B

Abstract

Embodiments of the present disclosure disclose a file processing method, a file processing apparatus, an electronic device, and a computer readable medium. One embodiment of the method comprises: in response to detecting file processing information input on a file processing interface, performing semantic extraction on the file processing information to obtain semantic information; extracting a plurality of keywords from the semantic information; determining a keyword part-of-speech type corresponding to each of the plurality of keywords; acquiring the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set; in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, determining the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set; determining a first keyword code set corresponding to the keyword set; and performing file processing on a file to be processed in a target database. This embodiment can process files to be processed quickly and efficiently.

Description

File processing method and device, electronic equipment and computer readable medium

Technical Field

The embodiment of the disclosure relates to the technical field of computers, in particular to a file processing method, a file processing device, electronic equipment and a computer readable medium.

Background

At present, databases are widely used in daily life. Files in a database are generally processed as follows: relevant technicians manually process the files to be processed in the database.

However, the inventor finds that when the files in the database are processed in the above manner, the following technical problems often exist:

First, the operation is too complex and the database holds too many files, so file processing is cumbersome, file searches take a long time, and file searches occupy considerable search resources, resulting in low efficiency.

Second, the text thought information generated for the file content is not accurate enough, so subsequent processing of the file to be processed is not accurate enough.

The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and, therefore, it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure propose a file processing method, apparatus, electronic device and computer readable medium to solve one or more of the technical problems set forth in the background section above.

In a first aspect, some embodiments of the present disclosure provide a file processing method, including: in response to detecting file processing information input by a target user on a file processing interface, performing semantic extraction on the file processing information to obtain semantic information; extracting a plurality of keywords from the semantic information; determining a keyword part-of-speech type corresponding to each of the plurality of keywords; acquiring the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set; in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, determining the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set; determining a first keyword code set corresponding to the keyword set; and performing, according to the first keyword code set, file processing on a file to be processed in a target database by using a file hierarchical tree model, wherein the file hierarchical tree model is established based on a file directory in the target database, and tree nodes of the file hierarchical tree model include: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes and keywords have a one-to-one correspondence.

In a second aspect, some embodiments of the present disclosure provide a file processing apparatus, including: a semantic extraction unit configured to, in response to detecting file processing information input by a target user on a file processing interface, perform semantic extraction on the file processing information to obtain semantic information; an extraction unit configured to extract a plurality of keywords from the semantic information; a first determining unit configured to determine a keyword part-of-speech type corresponding to each of the plurality of keywords; an acquisition unit configured to acquire the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set; a second determining unit configured to, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, determine the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set; a third determining unit configured to determine a first keyword code set corresponding to the keyword set; and a file processing unit configured to perform, according to the first keyword code set, file processing on a file to be processed in a target database by using a file hierarchical tree model, wherein the file hierarchical tree model is established based on a file directory in the target database, and tree nodes of the file hierarchical tree model include: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes and keywords have a one-to-one correspondence.

In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.

In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, where the program when executed by a processor implements a method as described in any of the implementations of the first aspect.

The above embodiments of the present disclosure have the following advantages: the file processing method of some embodiments of the present disclosure can process files to be processed quickly and efficiently. Specifically, related approaches fail to process files to be processed quickly and efficiently because the operation is too complex and the database holds too many files, so file processing is cumbersome, file searches take a long time, and file searches occupy considerable search resources, resulting in low efficiency. Based on this, in the file processing method of some embodiments of the present disclosure, first, in response to detecting file processing information input by a target user on a file processing interface, semantic extraction is performed on the file processing information to obtain semantic information. Here, inputting the file processing information through the file processing interface makes file processing in the database considerably more convenient: the target user does not need to know how to operate the database and only needs to input the file processing information, after which the file to be processed in the target database is processed automatically according to that information. In addition, performing semantic extraction on the file processing information makes it easier to subsequently acquire a plurality of keywords related to the file processing information, so that the file to be processed can later be queried quickly through those keywords. Then, a plurality of keywords are extracted from the semantic information to facilitate the subsequent query of the file to be processed and the determination of how the file is to be processed.
Then, the keyword part-of-speech type corresponding to each of the plurality of keywords is determined, for use in determining whether the file processing information input by the target user lacks key file processing content. Next, the keyword part-of-speech types required for file processing are acquired to obtain a required keyword part-of-speech type set, which is likewise used to determine whether the file processing information input by the target user lacks key file processing content. Further, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set is determined to obtain a keyword set, which is used for subsequently querying the file to be processed and determining how it is to be processed. Then, a first keyword code set corresponding to the keyword set is determined for the same purposes. Finally, according to the first keyword code set, the file hierarchical tree model is used to process the file to be processed in the target database efficiently and accurately. The file hierarchical tree model is established based on a file directory in the target database. The tree nodes of the file hierarchical tree model include: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes and keywords have a one-to-one correspondence. In summary, file processing information is input on the file processing interface and passed through the series of processing steps above, so that the file hierarchical tree model can be used to process the file to be processed in the target database quickly and efficiently.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and components are not necessarily drawn to scale.

FIG. 1 is a flow diagram of some embodiments of a file processing method according to the present disclosure;

FIG. 2 is a schematic block diagram of some embodiments of a file processing apparatus according to the present disclosure;

FIG. 3 is a schematic block diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be noted that, for convenience of description, only the portions relevant to the invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules or units.

It should be noted that the modifiers "a" and "an" in this disclosure are illustrative rather than limiting, and those skilled in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Referring to FIG. 1, a flow 100 of some embodiments of a file processing method according to the present disclosure is shown. The file processing method comprises the following steps:

Step 101, in response to detecting file processing information input by a target user on a file processing interface, performing semantic extraction on the file processing information to obtain semantic information.

In some embodiments, in response to detecting file processing information input by a target user on a file processing interface, the execution subject of the file processing method may perform semantic extraction on the file processing information to obtain semantic information. The target user may be a user who enters file processing information on the file processing interface; that is, the target user may be an operating user of the file processing interface. The file processing interface may be an information input interface for file processing. The file processing information may be processing information for a file. For example, the file processing information may be one of: file query information, file acquisition information, file addition information, and file adjustment information. The semantic information may characterize the content of the file processing information.

As an example, the execution subject may input the file processing information into a pre-trained semantic extraction network model to generate the semantic information. The semantic extraction network model may be a model for extracting the semantic content of information. For example, the semantic extraction network model may be a long short-term memory (LSTM) network model.

For example, the file processing information may be "extract file A from target database". The semantic information may then be "extract file A".

Step 102, extracting a plurality of keywords from the semantic information.

In some embodiments, the execution subject may extract a plurality of keywords from the semantic information. The plurality of keywords may be, but are not limited to, key nouns, key verbs, or key adjectives.

As an example, the execution subject may input the semantic information into a pre-trained keyword extraction network model to obtain the plurality of keywords. The keyword extraction network model may be a network model for extracting keywords. For example, the keyword extraction network model may be an LSTM network model.

For example, the semantic information may be "extract file A". The plurality of keywords then includes: "extract" and "file A".

Step 103, determining a keyword part-of-speech type corresponding to each of the plurality of keywords.

In some embodiments, the execution subject may determine a keyword part-of-speech type corresponding to each of the plurality of keywords. The keyword part-of-speech type may include, but is not limited to, at least one of: verb part-of-speech type, noun part-of-speech type, adjective part-of-speech type.

As an example, the execution subject may determine, through a part-of-speech type table, the keyword part-of-speech type corresponding to each of the plurality of keywords. The part-of-speech type table represents the association between words and part-of-speech types.

For example, the plurality of keywords includes: "extract" and "file A". The keyword part-of-speech type corresponding to "extract" may be the verb part-of-speech type, and the keyword part-of-speech type corresponding to "file A" may be the noun part-of-speech type.
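
The table lookup in step 103 can be sketched as follows. This is a minimal illustration, assuming the part-of-speech type table is an in-memory mapping; the name `POS_TYPE_TABLE`, its entries, and the function `keyword_pos_types` are hypothetical and do not come from the disclosure.

```python
# Hypothetical part-of-speech type table: word -> part-of-speech type.
# The entries are illustrative only.
POS_TYPE_TABLE = {
    "extract": "verb",
    "delete": "verb",
    "file A": "noun",
    "latest": "adjective",
}


def keyword_pos_types(keywords):
    """Look up each keyword in the table; unknown words map to None."""
    return {kw: POS_TYPE_TABLE.get(kw) for kw in keywords}


types = keyword_pos_types(["extract", "file A"])
print(types)  # {'extract': 'verb', 'file A': 'noun'}
```

A keyword missing from the table yields `None` here, which is one possible way to flag words whose type cannot be determined.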

Step 104, acquiring the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set.

In some embodiments, the execution subject may acquire the keyword part-of-speech types required for file processing through a wired or wireless connection to obtain the required keyword part-of-speech type set. The required keyword part-of-speech type set may consist of the keyword part-of-speech types necessary for processing a file in the database. For example, the required keyword part-of-speech type set includes: verb part-of-speech type, noun part-of-speech type.

Step 105, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, determining the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set.

In some embodiments, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, the execution subject may determine the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set.

As an example, the execution subject may determine, by means of a keyword query, the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain the keyword set.
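
Steps 104 and 105 amount to a set-inclusion check followed by a filter. The sketch below assumes the part-of-speech types are plain strings and that the keywords arrive as a keyword-to-type mapping; `REQUIRED_TYPES` and `select_keywords` are hypothetical names used only for illustration.

```python
# Example required set taken from the description: verb and noun types.
REQUIRED_TYPES = frozenset({"verb", "noun"})


def select_keywords(keyword_types, required_types=REQUIRED_TYPES):
    """keyword_types maps keyword -> part-of-speech type.

    Returns the keywords whose type is required when every required type
    is present, otherwise None (the input lacks a required type).
    """
    present_types = set(keyword_types.values())
    if not required_types <= present_types:
        return None
    return {kw for kw, pos in keyword_types.items() if pos in required_types}


complete = select_keywords(
    {"extract": "verb", "file A": "noun", "latest": "adjective"}
)
incomplete = select_keywords({"file A": "noun"})
```

Here `complete` holds "extract" and "file A", while `incomplete` is `None` because no verb keyword was supplied.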

In some optional implementations of some embodiments, in addition to determining the keyword set when the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, the method further includes:

In a first step, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords does not include the required keyword part-of-speech type set, determining a differential keyword part-of-speech type set. The differential keyword part-of-speech type set is a subset of the required keyword part-of-speech type set and shares no keyword part-of-speech type with the keyword part-of-speech type set corresponding to the plurality of keywords.

In a second step, generating an information query text corresponding to each differential keyword part-of-speech type in the differential keyword part-of-speech type set.

In a third step, popping up an information filling popup window on the file processing interface so that the target user can fill in a corresponding keyword set for the information query texts.

In a fourth step, determining the keyword set corresponding to the required keyword part-of-speech type set according to the filled-in keyword set and the plurality of keywords.
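
The four steps above can be sketched as a set difference, a prompt per missing type, and a merge. This is only an illustration: the prompt wording, the function names, and the dictionary that stands in for the user's popup answers are all assumptions.

```python
def differential_types(required_types, present_types):
    """Required part-of-speech types that the extracted keywords do not cover."""
    return set(required_types) - set(present_types)


def build_query_texts(missing_types):
    """One information query text per missing type (wording is hypothetical)."""
    return {
        t: f"Please enter a {t} keyword for this file operation."
        for t in sorted(missing_types)
    }


def complete_keyword_set(extracted_keywords, filled_keywords):
    """Merge the extracted keywords with the keywords the user filled in."""
    return set(extracted_keywords) | set(filled_keywords)


missing = differential_types({"verb", "noun"}, {"noun"})
prompts = build_query_texts(missing)
filled = {"verb": "extract"}  # simulated answer from the popup window
keywords = complete_keyword_set(["file A"], filled.values())
```

In this run the user only typed "file A", so the verb type is missing, one prompt is generated, and the merged keyword set contains "file A" and "extract".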

Step 106, determining a first keyword code set corresponding to the keyword set.

In some embodiments, the execution subject may determine a first keyword code set corresponding to the keyword set.

As an example, the execution subject may determine the first keyword encoding set corresponding to the keyword set through a keyword encoding table. The keyword code table can represent the association relationship between the keywords and the keyword codes. The first keyword codes in the first keyword code set have a one-to-one correspondence with the keywords in the keyword set. The first keyword encoding may characterize the identity information (i.e., identification information) of the corresponding keyword.

It should be noted that the storage space consumed for storing the first keyword code is smaller than the storage space consumed for storing the corresponding keyword.

Optionally, the keyword code table is generated by the following steps:

In a first step, obtaining a common keyword set.

In a second step, inputting each common keyword in the common keyword set into a coding model to generate a keyword code, obtaining a keyword code set. The coding model may be a model for coding a keyword. The coding model may be a first coding model in an encoding and decoding network model. For example, the first coding model may be a multi-layer recurrent neural network (RNN) model.

In a third step, generating the keyword code table according to the keyword code set and the common keyword set.

As an example, the execution subject may generate the keyword code table by matching each keyword code in the keyword code set with the corresponding common keyword in the common keyword set.
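
The table generation and the lookup in step 106 can be sketched as follows. Note the substitution: the disclosure describes an RNN-based coding model, whereas this sketch swaps in a simple counter that hands out small integers, purely so the example is runnable. All function names are hypothetical.

```python
def make_counter_encoder():
    """Stand-in for the coding model: assigns 0, 1, 2, ... to new keywords."""
    codes = {}

    def encode(keyword):
        # setdefault evaluates len(codes) before inserting the new keyword.
        return codes.setdefault(keyword, len(codes))

    return encode


def build_keyword_code_table(common_keywords, encode):
    """Pair each common keyword with the code produced by the encoder."""
    return {kw: encode(kw) for kw in common_keywords}


table = build_keyword_code_table(
    ["extract", "delete", "file A"], make_counter_encoder()
)

# Step 106: map a keyword set to its first keyword code set via the table.
first_codes = {table[kw] for kw in {"extract", "file A"}}
print(table)        # {'extract': 0, 'delete': 1, 'file A': 2}
print(first_codes)  # {0, 2}
```

Small integer codes also illustrate the storage remark above: a code can occupy less space than the keyword string it identifies.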

Step 107, according to the first keyword code set, performing file processing on the file to be processed in the target database by using the file hierarchical tree model.

In some embodiments, the execution subject may, in various manners, perform file processing on the file to be processed in the target database by using the file hierarchical tree model according to the first keyword code set. The file hierarchical tree model is established based on a file directory in the target database, and tree nodes of the file hierarchical tree model include: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes and keywords have a one-to-one correspondence. The file information may include, but is not limited to, at least one of: file name, file path.
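
One way to picture the file hierarchical tree model is a tree whose nodes carry file information and a set of second keyword codes. The disclosure does not fix how the first keyword code set is matched against the nodes, so the rule below (a node matches when it carries every wanted code) is an assumption, as are the names `TreeNode` and `find_nodes` and the sample codes.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    file_name: str
    file_path: str
    keyword_codes: set = field(default_factory=set)  # second keyword codes
    children: list = field(default_factory=list)


def find_nodes(root, wanted_codes):
    """Depth-first search for nodes that carry every wanted code."""
    matches = []
    stack = [root]
    while stack:
        node = stack.pop()
        if wanted_codes and wanted_codes <= node.keyword_codes:
            matches.append(node.file_path)
        stack.extend(node.children)
    return sorted(matches)


root = TreeNode("db", "/db", children=[
    TreeNode("file A", "/db/file A", {2}),
    TreeNode("reports", "/db/reports", {5}, [
        TreeNode("file A copy", "/db/reports/file A copy", {2, 7}),
    ]),
])

hits = find_nodes(root, {2})
print(hits)  # ['/db/file A', '/db/reports/file A copy']
```

In this sketch the caller passes only the codes that locate the file (for example the code of a noun keyword); the code of the verb keyword would then select the operation to apply to the matched nodes.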

In some optional implementations of some embodiments, after step 107, the method further includes:

In a first step, in response to determining that processing of the file to be processed is finished, determining file information corresponding to the file to be processed and at least one keyword corresponding to the file to be processed. The file information corresponding to the file to be processed includes: a file path and a file name.

The at least one keyword corresponding to the file to be processed is generated through the following steps:

Step 1, determining a file content text and a text name corresponding to the file to be processed.

Step 2, performing word segmentation on the file content text and the text name to obtain a first word set.

Step 3, extracting semantic information from the file content text.

Step 4, extracting a plurality of words from the semantic information to obtain a second word set.

Step 5, screening, from the first word set, the words whose word frequency is greater than a preset number to obtain a screened word set.

Step 6, fusing the screened word set and the second word set to obtain a fused word set, which serves as the at least one keyword.
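
Steps 2, 5, and 6 can be sketched with a frequency count and a set union. The sketch assumes whitespace splitting as a stand-in for word segmentation and treats the second word set (steps 3 and 4, which rely on a semantic model) as a given input; `file_keywords` and the sample text are hypothetical.

```python
from collections import Counter


def file_keywords(first_words, second_words, preset_number):
    """Keep first-set words occurring more than preset_number times,
    then fuse them with the second word set."""
    counts = Counter(first_words)
    screened = {w for w, c in counts.items() if c > preset_number}
    return screened | set(second_words)


content_text = "budget report budget plan budget"
text_name = "budget summary"

# Whitespace splitting stands in for a real word segmenter.
first_word_set = (content_text + " " + text_name).split()
second_word_set = ["quarterly", "budget"]  # assumed output of steps 3 and 4

fused = file_keywords(first_word_set, second_word_set, preset_number=2)
print(sorted(fused))  # ['budget', 'quarterly']
```

Only "budget" passes the frequency screen (it occurs four times), and the union with the second word set removes the duplicate.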

In a second step, updating the file hierarchical tree model according to the file information corresponding to the file to be processed and the at least one keyword.

As an example, first, the execution subject may determine a tree node to be processed in the file hierarchical tree according to the file path included in the file information corresponding to the file to be processed. Then, the execution subject may acquire the keyword of the verb part-of-speech type corresponding to the file to be processed. Finally, the execution subject may take the keyword of the verb part-of-speech type as the processing mode of the file to be processed and adjust the tree node to be processed in the file hierarchical tree accordingly, obtaining an adjusted file hierarchical tree model. The adjustment may include: deletion, addition, and modification.
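
The update described above can be sketched on a nested-dictionary tree: the file path locates the node, and an operation name derived from the verb keyword decides whether the node is added, deleted, or modified. The dictionary layout, the operation names, and `apply_operation` are assumptions made for illustration.

```python
def apply_operation(tree, file_path, operation, keyword_codes=()):
    """tree maps name -> {"codes": set, "children": {...}}.

    file_path locates the node; operation is "add", "delete" or "modify".
    """
    *parents, name = [part for part in file_path.split("/") if part]
    children = tree
    for part in parents:
        children = children[part]["children"]  # walk down to the parent
    if operation == "delete":
        children.pop(name, None)
    elif operation == "add":
        children[name] = {"codes": set(keyword_codes), "children": {}}
    elif operation == "modify":
        children[name]["codes"] = set(keyword_codes)
    else:
        raise ValueError(f"unsupported operation: {operation}")
    return tree


tree = {"db": {"codes": set(), "children": {}}}
apply_operation(tree, "/db/file A", "add", {2})
apply_operation(tree, "/db/file A", "modify", {2, 3})
codes_after_modify = set(tree["db"]["children"]["file A"]["codes"])
apply_operation(tree, "/db/file A", "delete")
```

After the three calls, `codes_after_modify` holds the updated codes and the node has been removed from the tree again.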

In some optional implementations of some embodiments, the at least one second keyword code corresponding to the file information is generated by:

In a first step, in response to determining that the file corresponding to the file information is not an empty file, determining a subfile set corresponding to the file.

The subfile set consists of the subfiles included in the file.

In a second step, determining the file format type of each subfile in the subfile set.

The file format type may include, but is not limited to, at least one of: text file format type, video file format type, audio file format type. The text file format type may include, but is not limited to, at least one of: TXT file format, DOC file format. The video file format type may include, but is not limited to, at least one of: mp4 format, mov format, avi format. The audio file format type may include, but is not limited to, at least one of: mp3 format, wma format.

In a third step, in response to determining that the file format type of every subfile in the subfile set is a first file format type, determining the file name of each subfile as an initial text to obtain an initial text set. The first file format type may be the text file format type.

In a fourth step, performing text segmentation on each initial text in the initial text set to generate at least one first word.

In a fifth step, determining the at least one first word as at least one first keyword.

In a sixth step, coding the at least one first keyword to obtain the at least one second keyword code.

Optionally, the execution subject may input each of the at least one first keyword into a second coding model in the encoding and decoding network model to generate a second keyword code, obtaining a second keyword code set.
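
The six steps above can be sketched for the all-text case. Several assumptions are made here: the format type is inferred from the file extension, the extension is dropped before segmentation, underscores and whitespace stand in for a real word segmenter, and an index-based code replaces the second coding model. The names `FORMAT_TYPES`, `format_type`, and `name_keywords` are hypothetical.

```python
import os

# Extension -> file format type, using the formats listed in the description.
FORMAT_TYPES = {
    ".txt": "text", ".doc": "text",
    ".mp4": "video", ".mov": "video", ".avi": "video",
    ".mp3": "audio", ".wma": "audio",
}


def format_type(file_name):
    """Infer the format type from the extension (an assumption of this sketch)."""
    return FORMAT_TYPES.get(os.path.splitext(file_name)[1].lower())


def name_keywords(subfiles):
    """Return sorted first keywords from file names when every subfile is
    of the text type; otherwise None."""
    if not subfiles or any(format_type(f) != "text" for f in subfiles):
        return None
    words = []
    for f in subfiles:
        stem = os.path.splitext(f)[0]
        words.extend(stem.replace("_", " ").split())
    return sorted(set(words))


first_keywords = name_keywords(["budget_report.txt", "budget_plan.doc"])
# Index-based codes stand in for the output of the second coding model.
second_codes = {word: i for i, word in enumerate(first_keywords)}
print(second_codes)  # {'budget': 0, 'plan': 1, 'report': 2}
```

A folder that mixes text and media subfiles returns `None` from `name_keywords`, which corresponds to the case where the all-text branch does not apply.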

Optionally, after coding the at least one first keyword to obtain the at least one second keyword code, the method further includes the following steps:

In a first step, in response to determining that the file format types of the subfiles in the subfile set include both the first file format type and a second file format type, performing word segmentation on the file name of at least one subfile of the second file format type to obtain a word set. The second file format type may be the video file format type or the audio file format type.

As an example, the execution subject may perform word segmentation on the file name of the at least one subfile of the second file format type by using a word segmentation method to obtain the word set.

In a second step, for each subfile in the at least one subfile, executing the following text keyword extraction step:

In a first substep, obtaining the file content corresponding to the subfile.

As an example, for a subfile of the video file format type, the execution subject may obtain the file content corresponding to the subfile through an audio-to-text conversion model. The audio-to-text conversion model may be a model that converts audio to text. For example, the audio-to-text conversion model may be an automatic speech recognition (ASR) model.

As another example, for a subfile of the audio file format type, the execution subject may likewise obtain the file content corresponding to the subfile through the audio-to-text conversion model.

In a second substep, inputting the file content into a text thought information extraction model to output text thought information. The text thought information extraction model may be a model for extracting the core idea of file content. For example, the text thought information extraction model may be a Transformer model.

In a third substep, extracting the words in the file content whose word frequency meets a preset condition and using them as keywords to obtain a content keyword set. The preset condition may be that the word frequency of a word in the file content is greater than a target value. The target value may be set in advance. For example, the target value may be 10.

In a fourth substep, performing text word segmentation on the text thought information to generate thought keywords, obtaining a thought keyword set.

In a fifth substep, performing word fusion on the content keyword set and the thought keyword set to obtain a fused word set.

In a third step, removing the at least one subfile from the subfile set to obtain a post-removal subfile set.

In a fourth step, performing text segmentation on the subfiles remaining in the post-removal subfile set to generate at least one second word.

In a fifth step, determining the at least one second word as at least one second keyword.

In a sixth step, summarizing the at least one second keyword and the fused word set to obtain a summary word set.

In a seventh step, coding the words in the summary word set to obtain the at least one second keyword code.
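
The keyword side of the mixed-format flow can be sketched as below. The sketch covers only the third and fifth substeps and the sixth step: transcripts (the output of the audio-to-text conversion model) and thought keywords (derived from the text thought information) are assumed to be already available and are passed in as plain word lists. The function names and sample data are hypothetical.

```python
from collections import Counter


def content_keywords(words, target_value=10):
    """Third substep: words whose frequency is greater than the target value."""
    counts = Counter(words)
    return {w for w, c in counts.items() if c > target_value}


def summary_word_set(media_words, thought_words, second_keywords, target_value=10):
    """media_words: subfile name -> transcript words (assumed ASR output).
    thought_words: subfile name -> thought keywords (assumed model output).
    second_keywords: keywords from the remaining text subfiles.
    """
    fused = set()
    for name, words in media_words.items():
        # Fifth substep: fuse content keywords with thought keywords.
        fused |= content_keywords(words, target_value)
        fused |= set(thought_words.get(name, ()))
    # Sixth step: summarize with the second keywords.
    return fused | set(second_keywords)


media = {"talk.mp3": ["budget"] * 11 + ["plan"] * 3}
thoughts = {"talk.mp3": ["finance"]}
summary = summary_word_set(media, thoughts, ["report"])
print(sorted(summary))  # ['budget', 'finance', 'report']
```

With the example target value of 10, "budget" (11 occurrences) qualifies as a content keyword while "plan" (3 occurrences) does not.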

Optionally, the execution subject may input the file content into the text thought information extraction model to output the text thought information through the following steps:

In a first step, inputting the file content into a text field type determination model to output a text field type corresponding to the file content as a target text field type. The text field type determination model may be a model for determining the field type to which the file content relates. For example, the text field type may be, but is not limited to, at least one of: the computer field, the chemistry field, the physics field, and the literature field. The text field type determination model may be a model composed of an LSTM model and a multi-layer convolutional neural network.

In a second step, performing word segmentation on the file content to obtain a text word set.

In a third step, performing word screening on the text words in the text word set to remove the language words, obtaining a screened text word set.

In a fourth step, performing deduplication on the screened text word set to obtain a deduplicated text word set.

In a fifth step, performing word coding on each deduplicated text word in the deduplicated text word set to obtain text word vectors.

In a sixth step, performing word coding on the target text field type to obtain a text field type vector.

In a seventh step, inputting the text word vectors and the text field type vector into a first association relation determination model to obtain first scores for the text word vectors. The text word vectors and the first scores have a one-to-one correspondence. The first association relation determination model may be a model that determines the association between each text word vector and the text field type vector. For example, the first association relation determination model may be a Transformer model.

In an eighth step, inputting the file content into a text question-answering task determination model to output text question-answering task information corresponding to the file content as target text question-answering task information. The text question-answering task determination model may be a model for determining text question-answering task information. The text question-answering task information may include at least one of: task information indicating that the file content is question-and-answer text, and task information indicating that the file content is not question-and-answer text. The text question-answering task determination model may be a multi-layer LSTM model.

In a ninth step, performing word coding on the target text question-answering task information to obtain a text question-answering task information vector.

In a tenth step, inputting the text question-answering task information vector and the text word vectors into a second association relation determination model to obtain second scores for the text word vectors. The text word vectors and the second scores have a one-to-one correspondence. The second association relation determination model may be a model that determines the association between each text word vector and the text question-answering task information vector. For example, the second association relation determination model may be a Transformer model.

Eleventh, inputting the file content into an emotion analysis model to output emotion analysis information corresponding to the file content as target emotion analysis information. The emotion analysis model may be a model for generating emotion analysis information. For example, the emotion analysis model may be a multi-layer LSTM model. The emotion analysis information may be, but is not limited to, at least one of: positive emotion information, negative emotion information, and neutral emotion information.

Twelfth, encoding the target emotion analysis information to obtain an emotion analysis vector.

Thirteenth, inputting the emotion analysis vector and each text word vector into a third association relation determination model to obtain a third score for each text word vector. The text word vectors correspond one-to-one with the third scores. The third association relation determination model may be a model that determines the association relation between each text word vector and the emotion analysis vector. For example, the third association relation determination model may be a Transformer model.

Fourteenth, inputting the file content into an intention recognition model to output intention recognition information corresponding to the file content as target intention recognition information. The intention recognition model may be a model that generates intention recognition information. For example, the intention recognition information may be, but is not limited to, at least one of: weather inquiry information, song discussion information, and viewpoint publication information. The intention recognition model may be a multi-layer LSTM model.

Fifteenth, encoding the target intention recognition information to obtain an intention recognition encoding vector.

Sixteenth, inputting the intention recognition encoding vector and each text word vector into a fourth association relation determination model to obtain a fourth score for each text word vector. The text word vectors correspond one-to-one with the fourth scores. The fourth association relation determination model may be a model that determines the association relation between each text word vector and the intention recognition encoding vector. For example, the fourth association relation determination model may be a Transformer model.

Seventeenth, for each text word, averaging the first score, the second score, the third score and the fourth score corresponding to that text word to obtain an average score.

Eighteenth, sorting the obtained set of average scores in descending order to obtain an average score sequence.

Nineteenth, determining the text words corresponding to the first preset number of average scores in the average score sequence as the keywords of the text thought information.

Twentieth, inputting the text words corresponding to the first preset number of average scores into a text generation model to generate the text thought information. The text generation model may be a model that generates text. For example, the text generation model may be a multi-layer LSTM model.
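The screening, de-duplication, score averaging and ranking in the steps above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the four association relation determination models are replaced by hand-written score tables, and the names `MODAL_PARTICLES`, `preprocess` and `top_keywords` are hypothetical.

```python
from statistics import mean

# Hypothetical list of modal particles; the disclosure does not enumerate one.
MODAL_PARTICLES = {"oh", "ah", "um"}


def preprocess(words):
    """Remove modal particles, then de-duplicate while keeping first-seen order."""
    screened = [w for w in words if w.lower() not in MODAL_PARTICLES]
    return list(dict.fromkeys(screened))


def top_keywords(words, score_tables, preset_number):
    """Average the per-aspect scores of each word and keep the highest ones."""
    averages = {w: mean([table[w] for table in score_tables]) for w in words}
    ranked = sorted(words, key=lambda w: averages[w], reverse=True)
    return ranked[:preset_number], averages


# Toy scores standing in for the outputs of the four association models
# (field type, question-and-answer task, emotion analysis, intention).
words = preprocess(["um", "database", "file", "delete", "file", "report"])
first = {"database": 0.9, "file": 0.8, "delete": 0.4, "report": 0.5}
second = {"database": 0.7, "file": 0.6, "delete": 0.2, "report": 0.3}
third = {"database": 0.5, "file": 0.6, "delete": 0.2, "report": 0.1}
fourth = {"database": 0.7, "file": 0.6, "delete": 0.8, "report": 0.3}

keywords, averages = top_keywords(words, [first, second, third, fourth], 2)
print(keywords)
```

In this sketch the selected words would then be passed to the text generation model; that model is outside the scope of the example.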

The above technical scheme and its related content, as an inventive point of the embodiments of the present disclosure, solve the second technical problem mentioned in the background, namely that the text thought information for the file content is not accurate enough, which makes subsequent processing of the file to be processed inaccurate. Through the text field type determination model, the intention recognition model, the emotion analysis model and the text question-and-answer task determination model, core keywords for subsequently generating the text thought information can be screened from the file content from multiple aspects, thereby ensuring the accuracy of the gist of the text thought information. Further, text thought information for the plurality of keywords can be accurately generated by the text generation model.

In some optional implementation manners of some embodiments, the performing, by using the file hierarchy tree model according to the first keyword encoding set, file processing on the file to be processed in the target database may include the following steps:

First, inputting each first keyword code in the first keyword code set into the decoding model of a coding and decoding network model to generate a first decoded word, obtaining a first decoded word set. The coding and decoding network model comprises: a first coding model, a second coding model and a decoding model. The first coding model and the second coding model each have a one-to-one correspondence with the decoding model. That is, there is a one-to-one correspondence among the encoded information output by the first coding model, the encoded information output by the second coding model, and the decoded information output by the decoding model. The decoding model may be a multi-layer LSTM model.

Second, inputting each first decoded word in the first decoded word set into the second coding model to generate a second keyword code, obtaining a second keyword code set.

Third, according to the second keyword code set, performing file processing on the file to be processed in the target database in various manners by using the file hierarchical tree model.
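The decode-then-re-encode path in the three steps above can be sketched as follows. This is only an illustration: the disclosure describes the models as neural networks (for example a multi-layer LSTM decoder), whereas the sketch substitutes lookup tables, and the code values and the names `FIRST_CODES`, `SECOND_CODES` and `to_second_codes` are hypothetical.

```python
# Hypothetical vocabularies: the first and second coding models are assumed
# to assign different codes to the same word.
FIRST_CODES = {"delete": 101, "report": 102, "2022": 103}
SECOND_CODES = {"delete": "k-del", "report": "k-rep", "2022": "k-2022"}

# Inverse of the first coding model, standing in for the decoding model.
FIRST_DECODE = {code: word for word, code in FIRST_CODES.items()}


def decode(first_code):
    """First keyword code -> first decoded word."""
    return FIRST_DECODE[first_code]


def second_encode(word):
    """First decoded word -> second keyword code."""
    return SECOND_CODES[word]


def to_second_codes(first_code_set):
    """Map a first keyword code set to the corresponding second keyword code set."""
    decoded_words = {decode(code) for code in first_code_set}
    return {second_encode(word) for word in decoded_words}


second_code_set = to_second_codes({101, 102})
print(sorted(second_code_set))
```

Because both coding models correspond one-to-one with the decoding model, the mapping from first codes to second codes loses no keywords, which is what the sketch relies on.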

Optionally, the performing, according to the second keyword encoding set and by using the file hierarchy tree model, file processing on the file to be processed in the target database may include the following steps:

First, obtaining a plurality of second keyword codes for a target keyword part-of-speech type. The target keyword part-of-speech type may be the verb part-of-speech type.

Second, determining the keyword codes that are repeated between the second keyword code set and the plurality of second keyword codes to obtain a repeated keyword code set.

Third, de-duplicating the repeated keyword code set to obtain a de-duplicated keyword code set.

Fourth, removing the repeated keyword code set from the second keyword code set to obtain a removed keyword code set.

Fifth, inputting each de-duplicated keyword code in the de-duplicated keyword code set into the decoding model of the coding and decoding network model to generate a second decoded word, obtaining a second decoded word set.

Sixth, determining, by using the file hierarchical tree model, the file set corresponding to the removed keyword code set as the file to be processed.

Seventh, performing file processing on the file to be processed in the target database according to the second decoded word set.

As an example, the execution subject may perform file processing on the file to be processed in the target database by using each second decoded word in the second decoded word set as a verb to be processed of the file to be processed.
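A minimal sketch of how the verb codes might be separated from the file-describing codes is given below. The data, the flat `FILE_INDEX` used in place of the file hierarchical tree model, and the matching rule (a file matches when its codes cover every remaining code) are all assumptions made for illustration.

```python
# Hypothetical data. VERB_CODES plays the role of the plurality of second
# keyword codes for the target (verb) part-of-speech type.
VERB_CODES = ["k-del", "k-move", "k-copy"]
DECODE = {"k-del": "delete", "k-move": "move", "k-copy": "copy"}

# Flat stand-in for the file hierarchical tree model:
# each file maps to its second keyword codes.
FILE_INDEX = {
    "reports/2022_sales.txt": {"k-rep", "k-2022", "k-sales"},
    "reports/2021_sales.txt": {"k-rep", "k-2021", "k-sales"},
    "notes/todo.txt": {"k-todo"},
}


def plan_processing(second_codes):
    # Codes shared with the verb codes, then de-duplicated in order.
    repeated = [c for c in second_codes if c in VERB_CODES]
    deduplicated = list(dict.fromkeys(repeated))
    # What remains describes the files rather than the action.
    removed = [c for c in second_codes if c not in repeated]
    # Decode the verb codes back into words (second decoded word set).
    verbs = [DECODE[c] for c in deduplicated]
    # Assumed rule: a file matches if its codes cover every remaining code.
    files = sorted(p for p, codes in FILE_INDEX.items() if set(removed) <= codes)
    return verbs, files


verbs, files = plan_processing(["k-del", "k-rep", "k-2022", "k-del"])
print(verbs, files)
```

The sketch stops at producing the plan (which verbs apply to which files); actually executing the verbs against the target database is left out.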

The above embodiments of the present disclosure have the following advantages: the file processing method of some embodiments of the present disclosure can process files to be processed quickly and efficiently. Specifically, related processing of files to be processed is not fast and efficient because the operation is too complex and the files in the database are too numerous, so that file processing is cumbersome, file searching takes a long time and occupies many search resources, and efficiency is low. Based on this, the file processing method of some embodiments of the present disclosure first, in response to detecting file processing information input by a target user in the file processing interface, performs semantic extraction on the file processing information to obtain semantic information. Here, inputting the file processing information through the file processing interface makes file processing in the database much more convenient: the target user does not need to know how to operate the database and only needs to input the file processing information, after which the file to be processed in the target database can be processed automatically according to that information. In addition, performing semantic extraction on the file processing information makes it convenient to subsequently acquire a plurality of keywords related to the file processing information, so that the file to be processed can be quickly queried through those keywords. Then, a plurality of keywords are extracted from the semantic information to facilitate the subsequent query of the file to be processed and the determination of its processing mode. Then, the keyword part-of-speech type corresponding to each keyword in the plurality of keywords is determined, so that it can subsequently be determined whether the file processing information input by the target user lacks key file processing content. Next, the keyword part-of-speech types required for file processing are acquired to obtain a required keyword part-of-speech type set, which is likewise used to determine whether the file processing information input by the target user lacks key file processing content. Further, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set is determined to obtain a keyword set, which is used for the subsequent query of the file to be processed and the determination of its processing mode. Then, a first keyword code set corresponding to the keyword set is determined for the same purposes. Finally, according to the first keyword code set, the file hierarchical tree model is used to process the file to be processed in the target database efficiently and accurately. The file hierarchical tree model is established based on the file directory in the target database. The tree nodes of the file hierarchical tree model comprise: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes and the keywords have a one-to-one correspondence. In summary, through the input of the file processing information in the file processing interface and a series of processing steps on that information, the file hierarchical tree model is used to process the file to be processed in the target database quickly and efficiently.
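To make the tree structure more concrete, the following Python sketch builds a small file hierarchical tree from a list of paths and looks up nodes by keyword code. It is an illustration under stated assumptions: the `TreeNode` layout, the `encode` function (a readable prefix instead of a learned code), and the rule of deriving codes by splitting entry names on underscores are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    file_info: str                                   # here simply the entry's path
    keyword_codes: set = field(default_factory=set)  # second keyword codes
    children: dict = field(default_factory=dict)     # entry name -> TreeNode


def encode(word):
    """Hypothetical second coding model: a readable prefix instead of a vector."""
    return "k-" + word.lower()


def build_tree(paths):
    """Build the tree from directory paths, one node per directory or file."""
    root = TreeNode(file_info="")
    for path in paths:
        node = root
        parts = path.split("/")
        for depth, part in enumerate(parts):
            child = node.children.get(part)
            if child is None:
                child = TreeNode(file_info="/".join(parts[: depth + 1]))
                # Assumed segmentation: drop the extension, split on "_".
                stem = part.rsplit(".", 1)[0]
                child.keyword_codes = {encode(w) for w in stem.split("_") if w}
                node.children[part] = child
            node = child
    return root


def find(node, codes):
    """Return file_info of every node whose codes include all of `codes`."""
    hits = [node.file_info] if codes and codes <= node.keyword_codes else []
    for child in node.children.values():
        hits.extend(find(child, codes))
    return hits


tree = build_tree(
    ["reports/2022_sales.txt", "reports/2021_sales.txt", "notes/todo.txt"]
)
matches = find(tree, {"k-2022", "k-sales"})
print(matches)
```

Searching over keyword codes stored in the nodes avoids opening the files themselves, which is consistent with the efficiency argument made in the paragraph above.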

With further reference to fig. 2, as an implementation of the methods illustrated in the above figures, the present disclosure provides some embodiments of a file processing apparatus. These apparatus embodiments correspond to the method embodiments illustrated in fig. 1, and the apparatus may be applied in various electronic devices.

As shown in fig. 2, a file processing apparatus 200 includes: a semantic extraction unit 201, an extraction unit 202, a first determining unit 203, an acquiring unit 204, a second determining unit 205, a third determining unit 206, and a file processing unit 207. The semantic extraction unit 201 is configured to, in response to detecting file processing information input by a target user in a file processing interface, perform semantic extraction on the file processing information to obtain semantic information; the extraction unit 202 is configured to extract a plurality of keywords from the semantic information; the first determining unit 203 is configured to determine a keyword part-of-speech type corresponding to each of the plurality of keywords; the acquiring unit 204 is configured to acquire the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set; the second determining unit 205 is configured to, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, determine the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set; the third determining unit 206 is configured to determine a first keyword code set corresponding to the keyword set; and the file processing unit 207 is configured to perform file processing on a file to be processed in the target database according to the first keyword code set by using a file hierarchical tree model, wherein the file hierarchical tree model is established based on a file directory in the target database, and tree nodes of the file hierarchical tree model include: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes and the keywords have a one-to-one correspondence.

It is to be understood that the units described in the file processing apparatus 200 correspond to the respective steps in the method described with reference to fig. 1. Thus, the operations, features and advantages of the method described above are also applicable to the file processing apparatus 200 and the units included therein, and are not described herein again.
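As a rough illustration of what the first determining unit, the acquiring unit and the second determining unit do together, the following Python sketch checks whether the part-of-speech types of the extracted keywords cover the required types. The function name `select_keywords` and the tag names ("verb", "noun", "adverb") are hypothetical, and the part-of-speech tags are supplied by hand rather than produced by a tagger.

```python
def select_keywords(tagged_keywords, required_types):
    """tagged_keywords: list of (keyword, part_of_speech_type) pairs.

    Returns (keyword_set, missing_types). When a required type is missing,
    keyword_set is None and missing_types lists the types to ask the user for.
    """
    found_types = {pos for _, pos in tagged_keywords}
    missing = set(required_types) - found_types
    if missing:
        # Corresponds to the case where key file processing content is lacking.
        return None, missing
    keyword_set = [kw for kw, pos in tagged_keywords if pos in required_types]
    return keyword_set, set()


required = {"verb", "noun"}

complete, missing = select_keywords(
    [("delete", "verb"), ("report", "noun"), ("quickly", "adverb")], required
)
print(complete, missing)

incomplete, still_needed = select_keywords([("report", "noun")], required)
print(incomplete, still_needed)
```

In the second call the input lacks a verb, so no keyword set is produced and the missing type is reported instead, which is the situation in which further information would have to be requested from the target user.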

Referring now to fig. 3, a block diagram of an electronic device 300 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.

Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, or the like; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 3 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be alternatively implemented or provided. Each block shown in fig. 3 may represent one device or may represent multiple devices, as desired.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 309, or installed from the storage device 308, or installed from the ROM 302. The computer program, when executed by the processing apparatus 301, performs the above-described functions defined in the methods of some embodiments of the present disclosure.

It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: in response to detecting file processing information input by a target user in a file processing interface, perform semantic extraction on the file processing information to obtain semantic information; extract a plurality of keywords from the semantic information; determine a keyword part-of-speech type corresponding to each keyword in the plurality of keywords; acquire the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set; in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords includes the required keyword part-of-speech type set, determine the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set; determine a first keyword code set corresponding to the keyword set; and, according to the first keyword code set, perform file processing on a file to be processed in a target database by using a file hierarchical tree model, wherein the file hierarchical tree model is established based on a file directory in the target database, and tree nodes of the file hierarchical tree model comprise: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes and the keywords have a one-to-one correspondence.

Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in some embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a semantic extraction unit, an extraction unit, a first determination unit, an acquisition unit, a second determination unit, a third determination unit, and a file processing unit. The names of these units do not form a limitation on the units themselves in some cases, and for example, the acquiring unit may also be described as "a unit for acquiring a keyword type required for file processing, resulting in a set of required keyword types".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combinations of the above-mentioned features, and other embodiments in which the above-mentioned features or their equivalents are combined arbitrarily without departing from the spirit of the invention are also encompassed. For example, the above features and (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure are mutually replaced to form the technical solution.

Claims

1. A method of file processing, comprising:

in response to the detection of the file processing information input by the target user on the file processing interface, performing semantic extraction on the file processing information to obtain semantic information;

extracting a plurality of keywords from the semantic information;

determining a keyword part-of-speech type corresponding to each keyword in the plurality of keywords;

acquiring the keyword part-of-speech types required for file processing to obtain a required keyword part-of-speech type set;

in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords comprises the required keyword part-of-speech type set, determining the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set;

determining a first keyword code set corresponding to the keyword set;

according to the first keyword code set, performing file processing on the file to be processed in the target database by using a file hierarchical tree model, wherein the file hierarchical tree model is established based on a file directory in the target database, and tree nodes of the file hierarchical tree model comprise: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes and the keywords have a one-to-one correspondence.

2. The method of claim 1, wherein the method further comprises:

in response to the fact that the to-be-processed file is determined to be processed, determining file information corresponding to the to-be-processed file and at least one keyword corresponding to the to-be-processed file;

and updating the model of the file hierarchical tree model according to the file information corresponding to the file to be processed and at least one keyword.

3. The method of claim 1, wherein after the determining, in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords comprises the required keyword part-of-speech type set, the keyword corresponding to each required keyword part-of-speech type in the required keyword part-of-speech type set to obtain a keyword set, the method further comprises:

in response to determining that the keyword part-of-speech type set corresponding to the plurality of keywords does not comprise the required keyword part-of-speech type set, determining a differential keyword part-of-speech type set, wherein the differential keyword part-of-speech type set is a subset of the required keyword part-of-speech type set, and the differential keyword part-of-speech type set and the keyword part-of-speech type set have no part-of-speech type in common;

generating an information query text corresponding to each differential keyword part-of-speech type in the differential keyword part-of-speech type set;

popping up an information filling popup window on the file processing interface so that the target user can fill in a corresponding keyword set for the information query text;

and determining the keyword set corresponding to the required keyword part-of-speech type set according to the filled-in keyword set and the plurality of keywords.

4. The method of claim 1, wherein the at least one second keyword code corresponding to the file information is generated by:

in response to determining that the file corresponding to the file information is not an empty file, determining a subfile set corresponding to the file;

determining a file format type of each subfile in the subfile set;

determining the file name of each subfile as an initial text to obtain an initial text set in response to the determination that the file format type of each subfile in the subfile set is a first file format type;

performing text segmentation on each initial text in the initial text set to generate at least one first word;

determining the at least one first word as at least one first keyword;

and coding the at least one first keyword to obtain at least one second keyword code.

5. The method of claim 4, wherein after said encoding said at least one first keyword resulting in at least one second keyword encoding, said method further comprises:

in response to determining that the file format types of the subfiles in the subfile set include both a first file format type and a second file format type, performing word segmentation on the file name of at least one subfile of the second file format type to obtain a word set;

for each subfile of the at least one subfile, performing a text keyword extraction step:

acquiring file contents corresponding to the subfiles;

inputting the file content into a text thought information extraction model to output text thought information;

extracting words of which the word frequency meets preset conditions in the file content, and taking the words as keywords to obtain a content keyword set;

performing text word segmentation on the text thought information to generate thought keywords to obtain a thought keyword set;

performing word fusion on the content keyword set and the thought keyword set to obtain a fused word set;

removing the at least one subfile from the subfile set to obtain a removed subfile set;

performing text word segmentation on the removed subfiles in the removed subfile set to generate at least one second word;

determining the at least one second word as at least one second keyword;

summarizing the at least one second keyword and the fused word set to obtain a summarized word set;

and coding the words in the summary word set to obtain at least one second keyword code.

6. The method of claim 1, wherein said performing file processing on the file to be processed in the target database using the file hierarchy tree model according to the first keyword code set comprises:

inputting each first keyword code in the first keyword code set into a decoding model in a coding and decoding network model to generate a first decoding word, so as to obtain a first decoding word set, wherein the coding and decoding network model comprises: a first coding model, a second coding model and a decoding model;

inputting each first decoding word in the first decoding word set into the second coding model to generate a second keyword code to obtain a second keyword code set;

and according to the second keyword coding set, performing file processing on the file to be processed in the target database by using the file hierarchical tree model.

7. The method according to claim 6, wherein said performing file processing on the file to be processed in the target database according to the second keyword encoding set by using the file hierarchy tree model comprises:

acquiring a plurality of second keyword codes for a target keyword part-of-speech type;

determining the keyword codes that are repeated between the second keyword code set and the plurality of second keyword codes to obtain a repeated keyword code set;

de-duplicating the repeated keyword code set to obtain a de-duplicated keyword code set;

removing the repeated keyword code set from the second keyword code set to obtain a removed keyword code set;

inputting each de-duplicated keyword code in the de-duplicated keyword code set into the decoding model in the coding and decoding network model to generate a second decoded word, obtaining a second decoded word set;

determining, by using the file hierarchical tree model, the file set corresponding to the removed keyword code set as the file to be processed;

and performing file processing on the file to be processed in the target database according to the second decoding word set.
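The set operations in claim 7 can be sketched in Python as below. This is an illustration under stated assumptions: a flat path-to-codes dictionary stands in for the file hierarchy tree model, a file is taken to match when its codes contain every remaining code, and the function names `split_codes` and `files_to_process` are hypothetical.

```python
def split_codes(second_codes, type_codes):
    """Split the second keyword codes against the codes of the target keyword type."""
    type_code_set = set(type_codes)
    # Codes shared with the target keyword type (may still contain duplicates).
    repeated = [c for c in second_codes if c in type_code_set]
    # Deduplicate while preserving first-seen order.
    deduplicated = list(dict.fromkeys(repeated))
    # Codes left after removing every repeated code.
    removed = [c for c in second_codes if c not in type_code_set]
    return deduplicated, removed

def files_to_process(index, removed_codes):
    """Select files whose code set contains all remaining codes (assumed criterion)."""
    wanted = set(removed_codes)
    return [path for path, codes in index.items() if wanted and wanted <= codes]

deduplicated, removed = split_codes([101, 102, 101, 205], type_codes=[101, 300])
index = {"/reports/2022.txt": {102, 205, 7}, "/reports/2021.txt": {102}}
targets = files_to_process(index, removed)
# Decoding the deduplicated codes yields the second decoding words,
# read here from a hypothetical lookup table.
operation_words = [{101: "delete"}[c] for c in deduplicated]
```

Under these assumptions the codes shared with the target keyword type say what to do (the second decoding words), while the remaining codes say which files to do it to.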

8. A file processing apparatus, comprising:

a semantic extraction unit configured to, in response to detecting file processing information input by a target user in a file processing interface, perform semantic extraction on the file processing information to obtain semantic information;

an extraction unit configured to extract a plurality of keywords from the semantic information;

a first determining unit configured to determine a keyword part-of-speech type corresponding to each of the plurality of keywords;

an acquisition unit configured to acquire keyword types required for file processing to obtain a required keyword type set;

a second determining unit configured to, in response to determining that the keyword type set corresponding to the plurality of keywords includes the required keyword type set, determine a keyword corresponding to each required keyword type in the required keyword type set to obtain a keyword set;

a third determining unit configured to determine a first keyword code set corresponding to the keyword set;

a file processing unit configured to perform file processing on a file to be processed in the target database by using a file hierarchy tree model according to the first keyword code set, wherein the file hierarchy tree model is established based on a file directory in the target database, and each tree node of the file hierarchy tree model includes: file information and at least one second keyword code corresponding to the file information, wherein the second keyword codes are in one-to-one correspondence with the keywords.
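The file hierarchy tree model described for the file processing unit can be pictured with a small Python sketch. The node layout, the `build_tree` and `find` helpers, and the "contains all codes" match rule are assumptions for illustration; the claim only states that the tree follows the file directory and that each node carries file information and at least one second keyword code.

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    file_info: str                                      # file or directory name
    keyword_codes: set = field(default_factory=set)     # second keyword codes
    children: dict = field(default_factory=dict)        # name -> TreeNode

def build_tree(paths_to_codes):
    """Build the tree from a mapping of file paths to their keyword codes."""
    root = TreeNode("/")
    for path, codes in paths_to_codes.items():
        node = root
        for part in path.strip("/").split("/"):
            node = node.children.setdefault(part, TreeNode(part))
        node.keyword_codes |= set(codes)
    return root

def find(node, codes, prefix=""):
    """Depth-first search for nodes whose keyword codes include all of `codes`."""
    hits = []
    for name, child in node.children.items():
        path = f"{prefix}/{name}"
        if child.keyword_codes and codes <= child.keyword_codes:
            hits.append(path)
        hits.extend(find(child, codes, path))
    return hits

tree = build_tree({
    "/reports/2022.txt": [102, 205],
    "/reports/2021.txt": [102],
    "/notes/todo.txt": [7],
})
matches = find(tree, {102, 205})
```

Mirroring the directory layout means a lookup walks the same hierarchy a user would browse, and comparing integer codes at each node avoids string matching on keywords.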

9. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.