CN116644228A - Multi-mode full text information retrieval method, system and storage medium

Info

Publication number: CN116644228A
Application number: CN202310474800.6A
Authority: CN
Inventors: 刘兆武; 冯漪; 凌霏
Original assignee: Shenzhen Craftsman Network Technology Co ltd
Current assignee: Shenzhen Craftsman Network Technology Co ltd
Priority date: 2023-04-26
Filing date: 2023-04-26
Publication date: 2023-08-25

Abstract

The invention discloses a multi-mode full text information retrieval method, a system and a storage medium, wherein the method comprises the following steps: obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or video; judging the type of the file; identifying the file by adopting a corresponding identification strategy according to the type of the file; and outputting the identification result in a text form for subsequent data processing and analysis. The invention realizes quick and accurate retrieval of various types of files, realizes more efficient searching and management of various types of files, improves the efficiency of automatic management of files, and reduces the operation cost of enterprises.

Description

Multi-mode full text information retrieval method, system and storage medium

Technical Field

The present invention relates to the field of full text retrieval for multi-modal content, and in particular, to a method, a system, and a storage medium for retrieving multi-modal full text information.

Background

Full text retrieval is a technique for finding a particular word or phrase from a collection of documents. It is a key technology in the digital information age for rapid retrieval of large amounts of text content. If not all keywords are known, full text retrieval techniques can help quickly find the desired information.

At present, full text retrieval is mainly used in the following common scenarios:

the search function of an e-commerce platform, which helps users quickly find the commodities they need;

the article search function of a news media website, which lets a user search all relevant news by keyword;

and the search function of a social media platform, which lets a user search all relevant content, such as users, posts and comments, by keyword.

The full text retrieval systems currently on the market are built around text.

Full text retrieval technology is mature, and the full text retrieval systems currently on the market can process various kinds of text data effectively. However, with the development of the information age, the amount of data people generate keeps increasing, and the types of data are becoming more diverse. Retrieval of the non-text content in such data, for example video, audio and images, is still difficult at the present stage and still requires manual processing and searching.

Disclosure of Invention

The invention mainly aims to provide a multi-mode full-text information retrieval method, a system and a storage medium, which aim to quickly and accurately retrieve various types of files, realize more efficient searching and management of the various types of files, improve the efficiency of automatic management of the files and reduce the operation cost of enterprises.

In order to achieve the above object, the present invention provides a multi-modal full text information retrieval method, the method comprising the steps of:

step S10, obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or videos;

step S20, judging the type of the file;

step S30, identifying the file by adopting a corresponding identification strategy according to the type of the file;

and step S40, outputting the identification result in a text form for subsequent data processing and analysis.
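As an illustrative sketch only, step S20 could be approximated by mapping a file name to a coarse category through its MIME type. The function name `judge_file_type`, the category labels and the fallback to "document" are assumptions made for this example; the source does not state how the type is judged.

```python
import mimetypes


def judge_file_type(path: str) -> str:
    """Step S20 sketch: map a file path to a coarse category via its MIME type.

    The category labels are hypothetical; the source only names text, picture,
    audio, video and document files that are neither text nor picture.
    """
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None:
        # Assumption: unknown extensions are treated like other documents.
        return "document"
    major = mime.split("/", 1)[0]
    if major == "text":
        return "text"
    if major == "image":
        return "picture"
    if major in ("audio", "video"):
        return major
    return "document"  # e.g. application/pdf


for name in ("notes.txt", "scan.png", "call.mp3", "demo.mp4", "report.pdf"):
    print(name, "->", judge_file_type(name))
```

A production system would more likely inspect the file content as well, since an extension alone can be wrong.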

According to a further technical scheme of the present invention, the step S30 of identifying the file by adopting a corresponding identification policy according to the type of the file includes:

step S301, if the file type is an audio or video file, performing speech content recognition on the file by using an ASR (automatic speech recognition) technique.

In a further technical scheme of the present invention, if the file type is an audio or video file, the step of performing speech content recognition on the file by using an ASR technique includes:

step S3011, preprocessing voice data: using the open source tool ffmpeg to convert the audio or video file into a uniform mono audio data file with a 16 kHz sample rate;

step S3012, feature extraction: converting the preprocessed voice signal into feature vectors by MFCC (Mel-frequency cepstral coefficient) feature extraction;

step S3013, recognition: decoding the feature vector sequence by using a deep neural network acoustic model and a neural network language model to find the most suitable text sequence, namely the recognition result.

step S302, if the file type is a document file that is neither text nor picture, converting the document file into a picture file;

step S303, extracting the text content of the picture and the position of the text in the picture by using an OCR (optical character recognition) technique.
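The MFCC features named in step S3012 are built on the mel frequency scale. The sketch below shows only that ingredient: the commonly used Hz-to-mel conversion and the spacing of triangular filter centres. The filter count of 26 is an assumption for illustration; the 8000 Hz upper bound is simply the Nyquist frequency of the 16 kHz audio from step S3011. It is not a full MFCC implementation and is not taken from the source.

```python
import math


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels (widely used 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters: int = 26,
                       f_min: float = 0.0,
                       f_max: float = 8000.0) -> list[float]:
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, n_filters + 1)]


centers = mel_filter_centers()
print([round(c) for c in centers[:5]], "...", round(centers[-1]))
```

The centres are dense at low frequencies and sparse at high ones, which is the property that makes mel-based features suitable for speech.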

According to a further technical scheme of the present invention, the step S303 of extracting the text content of the picture and the position of the text in the picture by using the OCR technology includes:

step S3031, image preprocessing: the original image is converted into a format suitable for feature extraction and character recognition through carrying out light correction, noise removal, convolution smoothing and image binarization processing on the image;

step S3032, character segmentation: in the preprocessed image, each character is segmented from the continuous run of characters based on a histogram projection algorithm, so as to improve the character recognition accuracy;

step S3033, feature extraction: extracting useful features from the preprocessed image of the character by using a Canny edge detection algorithm, wherein the features are used for representing the shape, outline and boundary information of the character;

step S3034, character recognition: the extracted features are converted into computer-processable feature vectors, which are used to identify characters using a convolutional neural network deep learning architecture.
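The binarization in step S3031 can be sketched as follows. The source does not say how the threshold is chosen; Otsu's method is swapped in here purely as an assumption, and the function names are hypothetical. The image is a plain list of rows of grey levels (0 to 255) so the example needs no third-party library.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the grey level that maximises the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: list[list[int]]) -> list[list[int]]:
    """Mark dark (ink) pixels as 1 and light (paper) pixels as 0."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


print(binarize([[10, 200], [200, 10]]))
```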

According to a further technical scheme of the present invention, in the step S30 of identifying the file by adopting a corresponding identification strategy according to the type of the file, if the type of the file is a picture file, the step S50 is directly executed, and the OCR technology is used to extract the text content of the picture and the position of the text in the picture.

According to a further technical scheme of the present invention, the step of outputting the identification result in a text form for subsequent data processing and analysis includes: uploading the identification result in a text form to Elasticsearch for storage, so as to facilitate retrieval by the system.

According to a further technical scheme of the present invention, the related search word recommendation and risk search word recommendation functions on the platform are implemented with the retrieval algorithm built into Elasticsearch, which is the BM25 algorithm.

To achieve the above object, the present invention also proposes a multi-modal full text information retrieval system comprising a memory, a processor and a multi-modal full text information retrieval program stored on the memory, which multi-modal full text information retrieval program, when run by the processor, performs the steps of the method as described above.

To achieve the above object, the present invention also proposes a computer-readable storage medium storing a multi-modal full-text information retrieval program which, when executed by a processor, performs the steps of the method as described above.

The multi-mode full-text information retrieval method, system and storage medium of the present invention have the following beneficial effects. The invention adopts a technical scheme comprising: step S10, obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or videos; step S20, judging the type of the file; step S30, identifying the file by adopting a corresponding identification strategy according to the type of the file; and step S40, outputting the identification result in a text form for subsequent data processing and analysis. In this way, various types of files can be retrieved quickly and accurately, more efficient searching and management of various types of files can be realized, the efficiency of automatic file management can be improved, and the operation cost of enterprises can be reduced.

Drawings

FIG. 1 is a flowchart of a multi-modal full-text information retrieval method according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a second embodiment of a multi-modal full-text information retrieval method according to the present invention;

FIG. 3 is a schematic diagram of a refinement flow of step S301 in FIG. 2;

FIG. 4 is a flowchart of a third embodiment of a multi-modal full-text information retrieval method according to the present invention;

FIG. 5 is a schematic diagram of a refinement flow of step S303 in FIG. 4.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to FIG. 1, the present invention provides a multi-modal full-text information retrieval method, and a first embodiment of the multi-modal full-text information retrieval method of the present invention includes the following steps:

step S10, obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or videos.

Step S20, judging the type of the file.

Step S30, identifying the file by adopting a corresponding identification strategy according to the type of the file.

In this embodiment, the content of a text file is read in full in a conventional manner known in the prior art; for an audio or video file, an ASR technique is used to recognize the speech content, and the result carries relative time information; for a picture file, an OCR technique is used to extract the text content and the position of the text in the picture; and a document file that is neither text nor picture is first converted into pictures, after which its content is recognized in the same way as a picture to obtain the result.
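The per-type strategies described above can be sketched as a lookup table from file type to recognizer. Every function below is a hypothetical placeholder that returns a marker string instead of calling a real reader, ASR engine or OCR engine; the type labels are likewise assumptions for illustration.

```python
from typing import Callable, Dict


# Placeholder recognizers: each stands in for a real engine.
def read_text(path: str) -> str:
    return f"[text read from {path}]"


def run_asr(path: str) -> str:
    return f"[ASR transcript of {path}]"


def run_ocr(path: str) -> str:
    return f"[OCR text of {path}]"


def convert_then_ocr(path: str) -> str:
    # Document that is neither text nor picture: render to a picture, then OCR.
    return run_ocr(path + ".png")


STRATEGIES: Dict[str, Callable[[str], str]] = {
    "text": read_text,
    "audio": run_asr,
    "video": run_asr,
    "picture": run_ocr,
    "document": convert_then_ocr,
}


def recognise(path: str, file_type: str) -> str:
    """Step S30 sketch: pick the strategy for the type and return text (S40)."""
    try:
        strategy = STRATEGIES[file_type]
    except KeyError:
        raise ValueError(f"unsupported file type: {file_type}") from None
    return strategy(path)


print(recognise("meeting.mp4", "video"))
print(recognise("report.pdf", "document"))
```

Keeping the strategies in a table means a new file type can be supported by registering one more function, without touching the dispatch logic.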

After the file content is successfully identified, it is uploaded to a corresponding data storage system for subsequent processing. In this embodiment, the identification result of the file is uploaded to Elasticsearch for storage, so that the system can retrieve it conveniently.
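A minimal sketch of what the upload and a later keyword search might look like as Elasticsearch REST requests is given below. The index name `files` and the field names `path`, `file_type` and `content` are assumptions; the source does not describe the index layout. The functions only build the request (method, path, JSON body) and send nothing over the network.

```python
import json


def build_index_request(index: str, doc_id: str, path: str,
                        file_type: str, content: str):
    """Return (method, url_path, json_body) for storing one recognized file.

    Uses the document API form PUT /<index>/_doc/<id>.
    """
    body = {"path": path, "file_type": file_type, "content": content}
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(body, ensure_ascii=False)


def build_search_request(index: str, keywords: str):
    """Return (method, url_path, json_body) for a full text match query."""
    body = {"query": {"match": {"content": keywords}}}
    return "GET", f"/{index}/_search", json.dumps(body, ensure_ascii=False)


print(build_index_request("files", "1", "meeting.mp4", "video", "quarterly budget review"))
print(build_search_request("files", "budget"))
```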

In addition, this embodiment mainly uses the retrieval algorithm built into Elasticsearch to implement functions such as related search word recommendation and risk search word recommendation on the platform. The main algorithm used is the BM25 (Best Matching 25) algorithm, which computes a score for each document from term frequency and document frequency and ranks the documents by score. The score that BM25 assigns to a document indicates the relevance of that document to the query.
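To make the scoring concrete, the following is a small sketch of the classic BM25 formula over pre-tokenized documents. It is an illustration written for this text, not the Elasticsearch implementation; the defaults k1 = 1.2 and b = 0.75 match the commonly cited defaults, and the tokenization and sample documents are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every tokenized document against the query with classic BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            n_t = doc_freq[term]
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["contract", "payment", "terms"],
        ["meeting", "notes"],
        ["payment", "payment", "invoice"]]
scores = bm25_scores(["payment"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The document that mentions the query term twice ranks first, the one that mentions it once ranks second, and the document without the term scores zero.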

By adopting the multi-mode full-text information retrieval method provided by this embodiment, a user only needs to upload the various types of files to be managed to the multi-mode full-text information retrieval system. Even if a user has accumulated a large number of documents, pictures, videos and other related files, the required file can be found quickly and accurately among them, so that more efficient searching and management of the files are realized.

With the advent of the digital age, the number of documents of all kinds will grow exponentially. The multi-mode full-text information retrieval method provided by this embodiment can effectively improve the efficiency of automatic file management, reduce the operation cost of enterprises, and reduce the workload of manual entry by users, making the work easier and more efficient.

Further, referring to FIG. 2, a second embodiment of the multi-mode full text information retrieval method according to the present invention is provided based on the first embodiment shown in FIG. 1. The present embodiment differs from the first embodiment shown in FIG. 1 in the step S30 of identifying the file by adopting a corresponding identification strategy according to the type of the file.

Specifically, as shown in FIG. 3, in this embodiment the step S301 of performing speech content recognition on the file by using the ASR technique, if the file type is an audio or video file, includes:

step S3011, preprocessing voice data: the open source tool ffmpeg is used to convert the audio or video file into a uniform mono audio data file with a 16 kHz sample rate.

Step S3012, feature extraction: the preprocessed speech signal is converted into feature vectors using MFCC feature extraction.

Step S3013, recognition: the feature vector sequence is decoded by using a deep neural network (DNN) acoustic model and a neural network language model (NNLM) to find the most suitable text sequence, namely the recognition result.
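The preprocessing of step S3011 can be sketched as a single ffmpeg invocation. The helper names and the WAV output are assumptions; the flags used are standard ffmpeg options (`-vn` drops the video stream, `-ac 1` selects one channel, `-ar 16000` sets the sample rate, `-y` overwrites an existing output file).

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build the ffmpeg command that yields mono audio at the given rate."""
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1",
            "-ar", str(sample_rate), dst]


def normalise_audio(src: str, dst: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails.

    Requires the ffmpeg executable to be available on PATH.
    """
    subprocess.run(build_ffmpeg_command(src, dst), check=True)


print(" ".join(build_ffmpeg_command("meeting.mp4", "meeting.wav")))
```

Separating command construction from execution keeps the conversion settings easy to inspect and test without invoking ffmpeg.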

Further, referring to FIG. 4, a third embodiment of the multi-modal full-text information retrieval method according to the present invention is provided based on the first embodiment shown in FIG. 1. The present embodiment differs from the first embodiment shown in FIG. 1 in the step S30 of identifying the file by adopting a corresponding identification strategy according to the type of the file.

Specifically, referring to FIG. 5, in this embodiment the step S303 of extracting the text content of the picture and the position of the text in the picture by using the OCR technology includes:

step S3031, image preprocessing: the original image is converted into a format suitable for feature extraction and character recognition by performing light correction, noise removal, convolution smoothing, and image binarization processing on the image.

Step S3032, character segmentation: in the preprocessed image, each character is segmented from the continuous run of characters based on a histogram projection algorithm to improve the accuracy of character recognition.

Step S3033, feature extraction: useful features are extracted from the preprocessed image of the character using the Canny edge detection algorithm, and are used to represent the shape, outline, and boundary information of the character.

Step S3034, character recognition: the extracted features are converted into computer-processable feature vectors, which are used to identify characters using a Convolutional Neural Network (CNN) deep learning architecture.
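The histogram projection of step S3032 can be illustrated on a binarized text line: the ink pixels of each column are summed, and empty columns are treated as gaps between characters. The function name and the convention that ink pixels are 1 are assumptions for this sketch; real text also needs handling for touching or broken characters, which is omitted here.

```python
def segment_columns(binary: list[list[int]]) -> list[tuple[int, int]]:
    """Split a binarized text line into (start, end) column ranges.

    `binary` is a list of rows with 1 for ink and 0 for background;
    `end` is exclusive.
    """
    if not binary:
        return []
    width = len(binary[0])
    # Vertical projection: amount of ink in every column.
    profile = [sum(row[x] for row in binary) for x in range(width)]
    segments = []
    start = None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x  # a character begins
        elif ink == 0 and start is not None:
            segments.append((start, x))  # a gap closes the character
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


line = [[0, 1, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 1]]
print(segment_columns(line))
```

Each returned range can then be cropped out and passed to the feature extraction and recognition steps that follow.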

Referring to FIG. 4, in this embodiment, in the step S30 of identifying the file by adopting a corresponding identification strategy according to the type of the file, if the type of the file is a picture file, the step S50 is directly executed, and the OCR technology is used to extract the text content of the picture and the position of the text in the picture.

The multi-mode full-text information retrieval method of the present invention has the following beneficial effects. The invention adopts a technical scheme comprising: step S10, obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or videos; step S20, judging the type of the file; step S30, identifying the file by adopting a corresponding identification strategy according to the type of the file; and step S40, outputting the identification result in a text form for subsequent data processing and analysis. In this way, various types of files can be retrieved quickly and accurately, more efficient searching and management of various types of files can be realized, the efficiency of automatic file management can be improved, and the operation cost of enterprises can be reduced.

To achieve the above object, the present invention further provides a multi-modal full text information retrieval system. The system includes a memory, a processor, and a multi-modal full text information retrieval program stored on the memory; when the program is run by the processor, the steps of the method described in the above embodiments are performed, which are not repeated herein.

To achieve the above object, the present invention further provides a computer readable storage medium on which a multi-modal full-text information retrieval program is stored; when the program is run by a processor, the steps of the method described in the above embodiments are performed, which are not repeated herein.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the invention. All equivalent structures or equivalent processes made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the invention.

Claims

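1. A multi-modal full text information retrieval method, the method comprising the steps of:

step S10, obtaining files of different types to be managed, wherein the types of the files comprise one or more of texts, pictures, audio or videos;

step S20, judging the type of the file;

step S30, identifying the file by adopting a corresponding identification strategy according to the type of the file;

and step S40, outputting the identification result in a text form for subsequent data processing and analysis.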

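2. A multi-modal full-text information retrieval method as set forth in claim 1, wherein the step S30 of identifying the file by adopting a corresponding identification strategy according to the type of the file includes:

step S301, if the file type is an audio or video file, performing speech content recognition on the file by using an ASR technique.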

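3. A multi-modal full text information retrieval method as claimed in claim 2, wherein the step S301 of performing speech content recognition on the file by using an ASR technique if the file type is an audio or video file comprises:

step S3011, preprocessing voice data: using the open source tool ffmpeg to convert the audio or video file into a uniform mono audio data file with a 16 kHz sample rate;

step S3012, feature extraction: converting the preprocessed voice signal into feature vectors by MFCC feature extraction;

step S3013, recognition: decoding the feature vector sequence by using a deep neural network acoustic model and a neural network language model to find the most suitable text sequence, namely the recognition result.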

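4. A multi-modal full-text information retrieval method as set forth in claim 1, wherein the step S30 of identifying the file by adopting a corresponding identification strategy according to the type of the file includes:

step S302, if the file type is a document file that is neither text nor picture, converting the document file into a picture file;

step S303, extracting the text content of the picture and the position of the text in the picture by using an OCR technique.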

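5. The method of claim 4, wherein the step S303 of extracting the text content of the picture and the position of the text in the picture by using the OCR technology comprises:

step S3031, image preprocessing: converting the original image into a format suitable for feature extraction and character recognition by performing light correction, noise removal, convolution smoothing and image binarization processing on the image;

step S3032, character segmentation: in the preprocessed image, segmenting each character from the continuous run of characters based on a histogram projection algorithm;

step S3033, feature extraction: extracting features representing the shape, outline and boundary information of the character from the preprocessed image of the character by using a Canny edge detection algorithm;

step S3034, character recognition: converting the extracted features into computer-processable feature vectors, which are used to identify characters using a convolutional neural network deep learning architecture.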

6. The method of claim 5, wherein in the step S30 of identifying the file by adopting a corresponding identification strategy according to the type of the file, if the type of the file is a picture file, the step S50 is directly executed, and the OCR technology is used to extract the text content of the picture and the position of the text in the picture.

7. A multi-modal full text information retrieval method as claimed in any one of claims 1 to 6, wherein the step of outputting the identification result in a text form for subsequent data processing and analysis includes: uploading the identification result in a text form to Elasticsearch for storage, so as to facilitate retrieval by the system.

8. The multi-modal full-text information retrieval method according to claim 7, wherein the related search word recommendation and risk search word recommendation functions on the platform are implemented with the retrieval algorithm built into Elasticsearch, which is the BM25 algorithm.

9. A multi-modal full text information retrieval system, the system comprising a memory, a processor, and a multi-modal full text information retrieval program stored on the memory, which when executed by the processor performs the steps of the method of any of claims 1 to 8.

10. A computer readable storage medium, characterized in that the computer readable storage medium stores a multimodal full text information retrieval program which, when executed by a processor, performs the steps of the method according to any of claims 1 to 8.