CN116644228A - Multi-mode full text information retrieval method, system and storage medium - Google Patents
- Publication number
- CN116644228A (application number CN202310474800.6A)
- Authority
- CN
- China
- Prior art keywords
- file
- information retrieval
- text
- text information
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multi-mode full text information retrieval method, a system and a storage medium, wherein the method comprises the following steps: obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or video; judging the type of the file; identifying the file by adopting a corresponding identification strategy according to the type of the file; and outputting the identification result in a text form for subsequent data processing and analysis. The invention realizes quick and accurate retrieval of various types of files, realizes more efficient searching and management of various types of files, improves the efficiency of automatic management of files, and reduces the operation cost of enterprises.
Description
Technical Field
The present invention relates to the field of full text retrieval for multi-modal content, and in particular, to a method, a system, and a storage medium for retrieving multi-modal full text information.
Background
Full-text retrieval is a technique for finding a particular word or phrase in a collection of documents. It is a key technology of the digital information age for quickly searching large amounts of text content. Even when not all keywords are known, full-text retrieval can help quickly find the desired information.
It is currently used mainly in the following common scenarios:
the search function of an e-commerce platform, which helps users quickly find the commodities they need;
the article search function of a news media website, which lets a user find all relevant news by keyword;
and the search function of a social media platform, which lets a user find all relevant content, such as users, posts, and comments, by keyword.
The full-text retrieval systems currently on the market are built around text.
Full-text retrieval technology is mature, and the systems currently on the market can process all kinds of text data effectively. However, with the development of the information age, the amount of data people generate keeps increasing and the types of data are becoming more diverse. For non-text content such as video, audio, and images, retrieval is still difficult at the present stage and still requires manual processing and searching.
Disclosure of Invention
The invention mainly aims to provide a multi-mode full-text information retrieval method, a system and a storage medium, which aim to quickly and accurately retrieve various types of files, realize more efficient searching and management of the various types of files, improve the efficiency of automatic management of the files and reduce the operation cost of enterprises.
In order to achieve the above object, the present invention provides a multi-modal full text information retrieval method, the method comprising the steps of:
step S10, obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or videos;
step S20, judging the type of the file;
step S30, identifying the file by adopting a corresponding identification strategy according to the type of the file;
and step S40, outputting the identification result in a text form for subsequent data processing and analysis.
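Steps S20 and S30 above amount to a type check followed by a dispatch. The following is a minimal sketch of that idea, not the patented implementation: the function names, the strategy labels, and the set of document suffixes are all hypothetical, and the type check here relies only on file names via Python's standard `mimetypes` module.

```python
import mimetypes
from pathlib import Path

# Assumed set of "document files that are not text or picture"; the source does not list any.
DOCUMENT_SUFFIXES = {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}


def judge_type(path: str) -> str:
    """Step S20 (sketch): classify a file by name as text, picture, audio, video, or document."""
    suffix = Path(path).suffix.lower()
    if suffix in DOCUMENT_SUFFIXES:
        return "document"
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return "unknown"
    top = mime.split("/", 1)[0]
    if top == "image":
        return "picture"
    if top in ("text", "audio", "video"):
        return top
    return "unknown"


def choose_strategy(file_type: str) -> str:
    """Step S30 (sketch): map a file type to a hypothetical strategy label."""
    strategies = {
        "text": "read_text",
        "audio": "asr",
        "video": "asr",
        "picture": "ocr",
        "document": "convert_to_picture_then_ocr",
    }
    return strategies.get(file_type, "unsupported")
```

A production system would more likely inspect file contents than trust extensions; the name-based check is used here only to keep the sketch self-contained.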
According to a further technical scheme of the present invention, the step S30 of identifying the file by adopting a corresponding identification policy according to the type of the file includes:
in step S301, if the file type is an audio or video file, an ASR technique is used to perform speech content recognition on the file.
In a further technical scheme of the present invention, if the file type is an audio or video file, the step of performing speech content recognition on the file by using an ASR technique includes:
step S3011, preprocessing voice data: using the open-source tool ffmpeg, converting the audio or video file into a uniform mono audio data file with a 16 kHz sample rate;
step S3012, feature extraction: converting the preprocessed voice signal into feature vectors by MFCC (Mel-frequency cepstral coefficient) feature extraction;
step S3013, recognition: recognizing the feature vector sequence by using a deep neural network acoustic model and a neural network language model, and finding the most suitable text sequence, which is the recognition result.
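The preprocessing of step S3011 can be illustrated with a small wrapper around the ffmpeg command line. This is a sketch under assumptions: the helper names are hypothetical, the output is assumed to be a WAV file, and only the standard ffmpeg options `-i`, `-vn`, `-ar`, `-ac`, and `-y` are used.

```python
import subprocess
from typing import List


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg call that resamples the input to 16 kHz mono audio."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", src,       # input audio or video file
        "-vn",           # drop any video stream, keep audio only
        "-ar", "16000",  # audio sample rate: 16 kHz
        "-ac", "1",      # one audio channel (mono)
        dst,
    ]


def normalize_audio(src: str, dst: str) -> None:
    """Run the conversion; requires an ffmpeg binary on PATH."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

For example, `normalize_audio("meeting.mp4", "meeting.wav")` would produce a mono 16 kHz file ready for feature extraction, provided ffmpeg is installed.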
According to a further technical scheme of the present invention, the step S30 of identifying the file by adopting a corresponding identification policy according to the type of the file includes:
step S302, if the file type is a document file which is not text or picture, converting the document file into a picture file;
step S303, the character content of the picture and the position of the character in the picture are extracted by adopting an OCR technology.
The further technical scheme of the present invention is that, the step S303, the step of extracting the text content of the picture and the position of the text in the picture by using the OCR technology, includes:
step S3031, image preprocessing: converting the original image into a format suitable for feature extraction and character recognition by performing illumination correction, noise removal, convolution smoothing, and binarization on the image;
step S3032, character segmentation: in the preprocessed image, segmenting each character out of the continuous run of characters based on a histogram projection algorithm, so as to improve character recognition accuracy;
step S3033, feature extraction: extracting useful features from the preprocessed image of the character by using a Canny edge detection algorithm, wherein the features are used for representing the shape, outline and boundary information of the character;
step S3034, character recognition: the extracted features are converted into computer-processable feature vectors, which are used to identify characters using a convolutional neural network deep learning architecture.
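Two of the preprocessing operations named in step S3031, convolution smoothing and binarization, can be sketched in plain Python. The 3x3 mean filter and the fixed threshold of 128 are assumptions chosen for illustration; the source does not specify a kernel or a thresholding rule.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels, 0 (black) to 255 (white)


def box_smooth(img: Image) -> Image:
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out


def binarize(img: Image, threshold: float = 128.0) -> List[List[int]]:
    """Map dark pixels (ink) to 1 and light pixels (background) to 0."""
    return [[1 if p < threshold else 0 for p in row] for row in img]
```

Applied in sequence, `binarize(box_smooth(img))` suppresses an isolated dark pixel on a white background, which is the kind of noise removal the step describes.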
In a further technical scheme of the invention, in step S30 of identifying the file by adopting a corresponding identification policy according to the type of the file, if the type of the file is a picture file, step S303 is executed directly, and OCR technology is used to extract the text content of the picture and the position of the text in the picture.
According to a further technical scheme of the invention, the step of outputting the identification result in text form for subsequent data processing and analysis comprises: uploading the identification result in text form to Elasticsearch for storage, so that the system can retrieve it.
According to a further technical scheme of the invention, the related search word recommendation and risk search word recommendation functions on the platform are implemented with the search algorithm built into Elasticsearch, which is the BM25 algorithm.
To achieve the above object, the present invention also proposes a multimodal full text information retrieval system comprising a memory, a processor, and a multimodal full text information retrieval program stored in the memory, which program, when run by the processor, performs the steps of the method as described above.
To achieve the above object, the present invention also proposes a computer-readable storage medium storing a multi-modal full-text information retrieval program which, when executed by a processor, performs the steps of the method as described above.
The multi-mode full-text information retrieval method, system and storage medium have the beneficial effects that: the invention adopts the technical scheme that the method comprises the following steps: step S10, obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or videos; step S20, judging the type of the file; step S30, identifying the file by adopting a corresponding identification strategy according to the type of the file; and step S40, outputting the identification result in a text form for subsequent data processing and analysis, so that various types of files can be quickly and accurately searched, more efficient searching and management of various types of files can be realized, the efficiency of automatic management of the files can be improved, and the operation cost of enterprises can be reduced.
Drawings
FIG. 1 is a flowchart of a first embodiment of the multi-modal full-text information retrieval method according to the present invention;
FIG. 2 is a flowchart of a second embodiment of the multi-modal full-text information retrieval method according to the present invention;
FIG. 3 is a schematic diagram of a refinement flow of step S301 in FIG. 2;
FIG. 4 is a flowchart of a third embodiment of the multi-modal full-text information retrieval method according to the present invention;
FIG. 5 is a schematic diagram of a refinement flow of step S303 in FIG. 4.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, the present invention provides a multi-modal full-text information retrieval method, and a first embodiment of the multi-modal full-text information retrieval method of the present invention includes the following steps:
Step S10, obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or videos.
Step S20, judging the type of the file.
Step S30, identifying the file by adopting a corresponding identification strategy according to the type of the file.
In this embodiment, the strategy depends on the file type. For text files, all text content is read in a conventional manner known from the prior art. For audio or video files, ASR technology is used to recognize the speech content, and the result carries relative timestamps. For picture files, OCR technology is used to extract the text content and the position of the text in the picture. Document files that are neither text nor pictures are first converted into pictures and then recognized in the same way as pictures to obtain the result.
Step S40, outputting the identification result in a text form for subsequent data processing and analysis.
After the file content is successfully identified, it is uploaded to a corresponding data storage system for subsequent processing. In this embodiment, the identification result of the file is uploaded to Elasticsearch for storage, so that the system can retrieve it conveniently.
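Storing a recognition result and querying it later can be sketched as two REST requests against Elasticsearch's document index and search endpoints. The index name and the field names `file_type` and `content` are hypothetical; the sketch only builds the requests and does not contact a cluster.

```python
import json
from typing import Tuple

ES_INDEX = "multimodal_files"  # hypothetical index name


def build_index_request(file_id: str, file_type: str, text: str) -> Tuple[str, str, str]:
    """Return (HTTP method, path, JSON body) for storing one recognition result."""
    body = {"file_type": file_type, "content": text}
    return "PUT", f"/{ES_INDEX}/_doc/{file_id}", json.dumps(body, ensure_ascii=False)


def build_search_request(query: str) -> Tuple[str, str, str]:
    """Return (HTTP method, path, JSON body) for a full-text match query on the content field."""
    body = {"query": {"match": {"content": query}}}
    return "POST", f"/{ES_INDEX}/_search", json.dumps(body, ensure_ascii=False)
```

Because every file type is reduced to text before indexing, a single `match` query on one field can cover documents, pictures, audio, and video alike. The `file_id` is assumed to be URL-safe.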
In addition, this embodiment mainly uses the search algorithm built into Elasticsearch to implement functions on the platform such as related search word recommendation and risk search word recommendation. The main algorithm is BM25 (Best Matching 25), which computes a relevance score from term frequency and document frequency and ranks documents by that score. BM25 assigns each document a score that indicates its relevance to the query.
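To make the BM25 scoring concrete, the following is a minimal sketch of the common textbook formulation over pre-tokenized documents. It is an illustration, not Elasticsearch's internal code: the function name is hypothetical, the values k1 = 1.2 and b = 0.75 are commonly cited defaults, and the exact constants inside a given search engine release may differ.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query terms with BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this document
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Rare terms get a larger inverse-document-frequency weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting documents by these scores in descending order gives the ranking. A short document that repeats the query term scores higher than a long document that mentions it once, and a document without the term scores zero.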
With the multi-mode full-text information retrieval method provided by this embodiment, a user only needs to upload the various types of files to be managed to the multi-mode full-text information retrieval system. Even if the user has accumulated a large number of documents, pictures, videos, and other related files, the required file can be found quickly and accurately among them, so that files are searched and managed more efficiently.
With the advent of the digital age, the number of documents of all kinds will grow exponentially. The multi-mode full-text information retrieval method provided by this embodiment can effectively improve the efficiency of automatic file management, reduce the operation cost of enterprises, and reduce the user's manual data-entry workload, making the work easier and more efficient.
Further, referring to FIG. 2, a second embodiment of the multi-mode full text information retrieval method of the present invention is provided based on the first embodiment shown in FIG. 1. This embodiment differs from the first embodiment shown in FIG. 1 in that step S30 of identifying the file by adopting a corresponding identification policy according to the type of the file includes:
in step S301, if the file type is an audio or video file, an ASR technique is used to perform speech content recognition on the file.
Specifically, as shown in fig. 3, in the embodiment, step S301, if the file type is an audio or video file, the step of performing speech content recognition on the file by using the ASR technology includes:
Step S3011, preprocessing voice data: using the open-source tool ffmpeg, the audio or video file is converted into a uniform mono audio data file with a 16 kHz sample rate.
Step S3012, feature extraction: the preprocessed speech signal is converted into feature vectors using MFCC (Mel-frequency cepstral coefficient) feature extraction.
Step S3013, recognition: the feature vector sequence is recognized using a deep neural network (DNN) acoustic model and a neural network language model (NNLM), and the most suitable text sequence is found, which is the recognition result.
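MFCC extraction in step S3012 relies on a filterbank spaced evenly on the mel scale. The sketch below shows only that one ingredient, using the widely used conversion m = 2595 log10(1 + f/700); the function names and the choice of 26 filters in the usage note are assumptions, and a full MFCC pipeline (framing, windowing, FFT, log energies, DCT) is not shown.

```python
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, sample_rate: int = 16000) -> List[float]:
    """Centre frequencies (Hz) of filters spaced evenly in mel between 0 Hz and Nyquist."""
    top = hz_to_mel(sample_rate / 2)
    step = top / (n_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_filters)]
```

With `mel_center_frequencies(26)` at a 16 kHz sample rate, the centres are dense at low frequencies and sparse near 8 kHz, which mirrors how human hearing resolves pitch and is why the mel scale suits speech features.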
Further, referring to fig. 4, a third embodiment of the multi-modal full-text information retrieval method according to the present invention is provided based on the multi-modal full-text information retrieval method shown in fig. 1, and the difference between the present embodiment and the first embodiment shown in fig. 1 is that in the present embodiment, the step S30 of identifying the file by adopting a corresponding identification policy according to the type of the file includes:
step S302, if the file type is a document file which is not text or picture, converting the document file into a picture file;
step S303, the character content of the picture and the position of the character in the picture are extracted by adopting an OCR technology.
Specifically, referring to fig. 5, in step S303, the step of extracting the text content of the picture and the position of the text in the picture by using the OCR technology includes:
Step S3031, image preprocessing: the original image is converted into a format suitable for feature extraction and character recognition by performing illumination correction, noise removal, convolution smoothing, and binarization on the image.
Step S3032, character segmentation: in the preprocessed image, each character is segmented out of the continuous run of characters based on a histogram projection algorithm to improve the accuracy of character recognition.
Step S3033, feature extraction: useful features are extracted from the preprocessed image of the character using the Canny edge detection algorithm, and are used to represent the shape, outline, and boundary information of the character.
Step S3034, character recognition: the extracted features are converted into computer-processable feature vectors, which are used to identify characters using a Convolutional Neural Network (CNN) deep learning architecture.
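The histogram projection used for character segmentation in step S3032 can be sketched as follows. The function name is hypothetical, and the input is assumed to be a single text line already binarized so that ink pixels are 1 and background pixels are 0.

```python
from typing import List, Tuple


def segment_columns(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    width = len(binary[0])
    # Vertical projection: number of ink pixels in each column.
    projection = [sum(row[x] for row in binary) for x in range(width)]
    spans = []
    start = None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                 # a character begins
        elif count == 0 and start is not None:
            spans.append((start, x))  # an empty column ends the character
            start = None
    if start is not None:
        spans.append((start, width))  # character touching the right edge
    return spans
```

Each returned span can be cropped and passed on to feature extraction. Touching or overlapping characters leave no empty column between them, so a real system would need additional heuristics for those cases.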
Referring to FIG. 4, in this embodiment, in step S30 of identifying the file by adopting a corresponding recognition policy according to the type of the file, if the type of the file is a picture file, step S303 is executed directly, and OCR technology is used to extract the text content of the picture and the position of the text in the picture.
The multi-mode full-text information retrieval method has the beneficial effects that: the invention adopts the technical scheme that the method comprises the following steps: step S10, obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or videos; step S20, judging the type of the file; step S30, identifying the file by adopting a corresponding identification strategy according to the type of the file; and step S40, outputting the identification result in a text form for subsequent data processing and analysis, so that various types of files can be quickly and accurately searched, more efficient searching and management of various types of files can be realized, the efficiency of automatic management of the files can be improved, and the operation cost of enterprises can be reduced.
In order to achieve the above objective, the present invention further provides a multi-modal full text information retrieval system. The system includes a memory, a processor, and a multi-modal full text information retrieval program stored in the memory. When the program is executed by the processor, the steps of the method described in the above embodiments are performed, which are not repeated here.
To achieve the above objective, the present invention further provides a computer readable storage medium storing a multi-modal full-text information retrieval program. When the program is executed by a processor, the steps of the method described in the above embodiments are performed, which are not repeated here.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes using the descriptions and drawings of the present invention or directly or indirectly applied to other related technical fields are included in the scope of the invention.
Claims (10)
1. A multi-modal full text information retrieval method, the method comprising the steps of:
step S10, obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or videos;
step S20, judging the type of the file;
step S30, identifying the file by adopting a corresponding identification strategy according to the type of the file;
and step S40, outputting the identification result in a text form for subsequent data processing and analysis.
2. A multi-modal full-text information retrieval method as set forth in claim 1 wherein the step of identifying the document using a corresponding identification policy based on the type of the document includes:
in step S301, if the file type is an audio or video file, an ASR technique is used to perform speech content recognition on the file.
3. A multimodal full text information retrieval method as claimed in claim 2 wherein step S301, if the type of the document is an audio or video document, the step of performing speech content recognition on the document using ASR techniques comprises:
step S3011, preprocessing voice data: using the open-source tool ffmpeg, converting the audio or video file into a uniform mono audio data file with a 16 kHz sample rate;
step S3012, feature extraction: converting the preprocessed voice signal into feature vectors by MFCC feature extraction;
step S3013, recognition: recognizing the feature vector sequence by using a deep neural network acoustic model and a neural network language model, and finding the most suitable text sequence, which is the recognition result.
4. A multi-modal full-text information retrieval method as set forth in claim 1 wherein the step of identifying the document using a corresponding identification policy based on the type of the document includes:
step S302, if the file type is a document file which is not text or picture, converting the document file into a picture file;
step S303, the character content of the picture and the position of the character in the picture are extracted by adopting an OCR technology.
5. The method of claim 4, wherein the step of extracting the text content of the picture and the text position in the picture by using OCR technology in step S303 comprises:
step S3031, image preprocessing: converting the original image into a format suitable for feature extraction and character recognition by performing illumination correction, noise removal, convolution smoothing, and binarization on the image;
step S3032, character segmentation: in the preprocessed image, segmenting each character out of the continuous run of characters based on a histogram projection algorithm, so as to improve character recognition accuracy;
step S3033, feature extraction: extracting useful features from the preprocessed image of the character by using a Canny edge detection algorithm, wherein the features are used for representing the shape, outline and boundary information of the character;
step S3034, character recognition: the extracted features are converted into computer-processable feature vectors, which are used to identify characters using a convolutional neural network deep learning architecture.
6. The method of claim 5, wherein in step S30 of identifying the file by using a corresponding identification policy according to the type of the file, if the type of the file is a picture file, step S303 is executed directly, and OCR technology is used to extract the text content of the picture and the position of the text in the picture.
7. A multimodal full text information retrieval method as claimed in any one of claims 1 to 6, wherein said step of outputting the recognition result in text form for subsequent data processing and analysis includes: uploading the recognition result in text form to Elasticsearch for storage, so that the system can retrieve it.
8. The multi-modal full-text information retrieval method according to claim 7, wherein the related search word recommendation and risk search word recommendation functions on the platform are implemented with the search algorithm built into Elasticsearch, which is the BM25 algorithm.
9. A multimodal full text information retrieval system, the system comprising a memory, a processor, and a multimodal full text information retrieval program stored in the memory, which when executed by the processor performs the steps of the method of any of claims 1 to 8.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a multimodal full text information retrieval program which, when executed by a processor, performs the steps of the method according to any of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310474800.6A CN116644228A (en) | 2023-04-26 | 2023-04-26 | Multi-mode full text information retrieval method, system and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310474800.6A CN116644228A (en) | 2023-04-26 | 2023-04-26 | Multi-mode full text information retrieval method, system and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116644228A true CN116644228A (en) | 2023-08-25 |
Family
ID=87619405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310474800.6A Pending CN116644228A (en) | 2023-04-26 | 2023-04-26 | Multi-mode full text information retrieval method, system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116644228A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117033308A (en) * | 2023-08-28 | 2023-11-10 | 中国电子科技集团公司第十五研究所 | Multi-mode retrieval method and device based on specific range |
-
2023
- 2023-04-26 CN CN202310474800.6A patent/CN116644228A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117033308A (en) * | 2023-08-28 | 2023-11-10 | 中国电子科技集团公司第十五研究所 | Multi-mode retrieval method and device based on specific range |
CN117033308B (en) * | 2023-08-28 | 2024-03-26 | 中国电子科技集团公司第十五研究所 | Multi-mode retrieval method and device based on specific range |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||