CN112765394A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN112765394A
CN112765394A (application CN202110018681.4A)
Authority
CN
China
Prior art keywords
image
matched
audio file
server
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110018681.4A
Other languages
Chinese (zh)
Inventor
李帅辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xiri Electronic Technology Co ltd
Original Assignee
Shanghai Xiri Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xiri Electronic Technology Co ltd filed Critical Shanghai Xiri Electronic Technology Co ltd
Priority to CN202110018681.4A priority Critical patent/CN112765394A/en
Publication of CN112765394A publication Critical patent/CN112765394A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61 Indexing; Data structures therefor; Storage structures
    • G06F16/63 Querying
    • G06F16/632 Query formulation
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data processing method and apparatus, an electronic device, and a storage medium, and relates to the field of internet technologies. A server records an audio file library in which a plurality of images and the audio file corresponding to each image are stored, so that after receiving an image to be matched sent by a terminal device, the server can search the audio file library for a target image corresponding to the image to be matched and then send out the target audio file corresponding to the target image. In this way, the user does not need to screen search results when searching for an audio book, and the search precision can be improved.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
In some listening scenarios, a user can search for audio books and obtain their content by playing them, without turning through a paper book or electronic book page by page.
When searching for an audio book, the user generally has to enter labels such as the book title manually or by voice, and a fuzzy search is then performed on audio books.
However, such a search may be inaccurate, resulting in low search precision.
Disclosure of Invention
The aim of the application is to provide a data processing method and apparatus, an electronic device, and a storage medium that can improve search precision.
To this end, the technical solutions adopted by the application are as follows:
in a first aspect, the present application provides a data processing method, which is applied to a server, where the server records an audio file library, and the audio file library includes a plurality of images and an audio file corresponding to each image; the server establishes communication with a terminal device;
the method comprises the following steps:
receiving an image to be matched sent by the terminal equipment;
searching a target image corresponding to the image to be matched in the audio file library;
and sending out the target audio file corresponding to the target image.
In a second aspect, the present application provides a data processing method, which is applied to a terminal device, where the terminal device establishes communication with a server; the method comprises the following steps:
acquiring an image to be matched;
sending an audio acquisition request to the server; the audio acquisition request comprises the image to be matched, and the audio acquisition request is used for indicating the server to feed back a target audio file corresponding to the image to be matched;
and receiving the target audio file sent by the server.
In a third aspect, the present application provides a data processing apparatus, which is applied to a server, where the server records an audio file library, and the audio file library includes a plurality of images and an audio file corresponding to each image; the server establishes communication with a terminal device;
the device comprises:
the first transceiving module is used for receiving the image to be matched sent by the terminal equipment;
the first processing module is used for searching a target image corresponding to the image to be matched in the audio file library;
the first transceiver module is further configured to send out a target audio file corresponding to the target image.
In a fourth aspect, the present application provides a data processing apparatus, which is applied to a terminal device, where the terminal device establishes communication with a server; the device comprises:
the second processing module is used for acquiring an image to be matched;
the second transceiver module is used for sending an audio acquisition request to the server; the audio acquisition request comprises the image to be matched, and the audio acquisition request is used for indicating the server to feed back a target audio file corresponding to the image to be matched;
the second transceiver module is further configured to receive a target audio file sent by the server.
In a fifth aspect, the present application provides an electronic device comprising a memory for storing one or more programs and a processor; the one or more programs, when executed by the processor, implement the data processing method provided in the first aspect or the data processing method provided in the second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the data processing method provided in the first aspect described above or the data processing method provided in the second aspect.
In the data processing method and apparatus, electronic device, and storage medium provided by the application, an audio file library is recorded in the server, and a plurality of images and the audio file corresponding to each image are stored in the audio file library. After the server receives the image to be matched sent by the terminal device, it can search the audio file library for the target image corresponding to the image to be matched and then send out the target audio file corresponding to the target image. In this way, the user does not need to screen search results when searching for an audio book, and the search precision can be improved.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly explain the technical solutions of the present application, the drawings needed for the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also derive other related drawings from these drawings without inventive effort.
Fig. 1 shows a schematic application scenario diagram of the data processing method provided in the present application.
Fig. 2 shows a schematic structural block diagram of an electronic device provided by the present application.
Fig. 3 shows a schematic flowchart of a data processing method applied to a terminal device provided in the present application.
Fig. 4 shows a schematic flowchart of a data processing method applied to a server provided by the present application.
Fig. 5 shows an exemplary flowchart of the substeps of step 302 in fig. 4.
Fig. 6 shows a schematic block diagram of a first data processing apparatus provided in the present application.
Fig. 7 shows a schematic block diagram of a second data processing apparatus provided in the present application.
In the figure: 100-an electronic device; 101-a memory; 102-a processor; 103-a communication interface; 400-a first data processing apparatus; 401-a first transceiver module; 402-a first processing module; 500-a second data processing apparatus; 501-a second processing module; 502-a second transceiver module.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the accompanying drawings in some embodiments of the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. The components of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on a part of the embodiments in the present application without any creative effort belong to the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In the above scenario of searching for audio books, some schemes perform a fuzzy search by matching the book title with a search engine; popular or well-known books can be found quickly and efficiently in this way.
However, niche books with small print runs or low popularity are searched infrequently, so the user must spend more time screening the search results, and the search precision is low.
Therefore, in view of the drawbacks of the above search scheme, some possible embodiments provided by the present application work as follows: an audio file library is recorded in a server, and a plurality of images and the audio file corresponding to each image are stored in it; after receiving the image to be matched sent by a terminal device, the server searches the audio file library for the target image corresponding to the image to be matched and then sends out the target audio file corresponding to the target image. In this way, the user does not need to screen search results when searching for an audio book, and the search precision can be improved.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic application scenario diagram illustrating a data processing method provided in the present application, in some embodiments of the present application, a server and a terminal device may be located in a wireless network or a wired network, and the server and the terminal device perform data interaction through the wireless network or the wired network.
In some embodiments of the present application, the terminal device may be a device with an image capturing function, for example, the terminal device may be a smart phone, a Personal Computer (PC), a tablet computer, a wearable mobile terminal, a smart speaker with an image capturing function, and the like.
The application provides a data processing method which can be applied to a server as shown in fig. 1, and the server can send out an audio file required by a user by executing the data processing method, thereby providing a search service of audio reading materials for the user.
In addition, the application also provides a data processing method which can be applied to the terminal equipment shown in fig. 1, and the terminal equipment can send the acquired image to be matched to the server by executing the data processing method, and receive the audio file which is fed back by the server and corresponds to the image to be matched, so that the requirement of searching the audio book by the user is met.
Referring to fig. 2, fig. 2 shows a schematic block diagram of an electronic device 100 provided in the present application, where the electronic device 100 may be used as the server in fig. 1, and may also be used as the terminal device in fig. 1.
In some embodiments, the electronic device 100 may include a memory 101, a processor 102, and a communication interface 103, wherein the memory 101, the processor 102, and the communication interface 103 are electrically connected to each other, directly or indirectly, to enable transmission or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 101 may be used to store software programs and modules, such as program instructions/modules corresponding to the first data processing apparatus or the second data processing apparatus provided in the present application, and the processor 102 executes the software programs and modules stored in the memory 101 to execute various functional applications and data processing, thereby executing the data processing method applied to the server or the data processing method applied to the terminal device provided in the present application. The communication interface 103 may be used for communicating signaling or data with other node devices.
The memory 101 may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 102 may be an integrated circuit chip with signal processing capability. The processor 102 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It will be appreciated that the configuration shown in fig. 2 is merely illustrative and that electronic device 100 may include more or fewer components than shown in fig. 2 or may have a different configuration than shown in fig. 2. The components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.
Taking the electronic device 100 shown in fig. 2 as the terminal device in fig. 1 as an example, a data processing method applied to the terminal device provided by the present application is exemplarily described below.
Referring to fig. 3, fig. 3 shows a schematic flow chart of a data processing method applied to a terminal device provided in the present application, and in some embodiments, the data processing method applied to the terminal device may include the following steps:
step 201, obtaining an image to be matched.
Step 202, sending an audio acquisition request to a server.
Step 203, receiving the target audio file sent by the server.
In some embodiments, a user may obtain an image to be matched by operating the terminal device to take a photo, receive an image, or search a web page. The image to be matched may include information about the audio book the user needs to search for, such as its title, author, publisher, and publication date.
It will be appreciated that although different physical books may share the same title, their covers usually differ; different books can therefore be distinguished by different images.
Based on this, the terminal device may send an audio obtaining request to the server based on the obtained image to be matched, where the audio obtaining request may include the image to be matched, and the audio obtaining request may be used to instruct the server to feed back a target audio file corresponding to the image to be matched.
Next, the terminal device may obtain the audio reading required by the user by receiving the target audio file sent by the server, and play the target audio file through the terminal device itself or a connected playback device to obtain the content.
In some possible embodiments, taking an example that a user performs photographing of an image to be matched by operating a terminal device, the terminal device may continuously acquire a plurality of frames of video images in a manner of photographing a video stream in the process of performing step 201, and acquire the image to be matched from the acquired plurality of frames of video images.
For example, in some embodiments, a physical book is generally rectangular and therefore typically appears in an image as a closed rectangular frame, i.e. four line segments connected end to end. The terminal device may thus use a preset closed rectangular frame as a detector: it takes each frame of the video stream in turn as the target video frame and detects whether a closed rectangular frame is present in it. If a closed rectangular frame is present, the terminal device determines the target video frame as the image to be matched; otherwise, it takes the next frame of the video stream as the new target video frame and repeats the detection, until a video image containing a closed rectangular frame is determined as the image to be matched.
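The frame-selection loop described above can be sketched as follows. This is a simplified illustration: `detect` stands in for a real line-segment detector (the text does not specify one), segments are plain coordinate tuples rather than image data, and the tolerance value is an assumption.

```python
def is_closed_quad(segments, tol=2.0):
    """Return True if four line segments connect end to end into a closed
    loop, i.e. the closed rectangular frame described above."""
    if len(segments) != 4:
        return False

    def near(p, q):
        return abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol

    # Each segment's end point must meet the next segment's start point,
    # cyclically, so the four segments form one closed loop.
    return all(near(segments[i][1], segments[(i + 1) % 4][0]) for i in range(4))


def pick_image_to_match(frames, detect):
    """Scan video frames in order and return the first one whose detected
    segments form a closed quadrilateral; None if no frame qualifies."""
    for frame in frames:
        if is_closed_quad(detect(frame)):
            return frame
    return None
```

In a real implementation the detector would operate on the enhanced video frame, but the closed-loop check is the same.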
In addition, in some embodiments, to improve the accuracy with which the terminal device detects whether a closed rectangular frame exists in the target video frame, the terminal device may first apply image enhancement operations to the target video frame, for example at least one of Gaussian blur, denoising, Canny edge detection, dilation, edge enhancement, and binarization.
The above describes how the user sends the image to be matched to the server through the terminal device and receives the corresponding target audio file from the server. Conversely, on the server side, when the server receives the image to be matched sent by the terminal device, the server can search the audio file library it records for the target audio file corresponding to the image to be matched, and send out that target audio file.
Illustratively, referring to fig. 4, fig. 4 shows a schematic flow chart of the data processing method applied to the server provided by the present application, and in some embodiments, the data processing method applied to the server may include the following steps:
step 301, receiving an image to be matched sent by a terminal device.
Step 302, searching a target image corresponding to the image to be matched in the audio file library.
Step 303, sending out the target audio file corresponding to the target image.
In some embodiments, the server may record an audio file library, where the audio file library includes a plurality of images and an audio file corresponding to each image; for example, the audio file library may store audio files corresponding to a plurality of audio books, store feature images (for example, images of cover pages) of the audio books, and establish an index relationship between each feature image and the corresponding audio file.
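The index relationship between feature images and audio files can be sketched as a minimal in-memory library; the field names and the path strings here are illustrative assumptions, since the text only specifies that images, audio files, and their correspondence are stored.

```python
class AudioFileLibrary:
    """Toy audio file library: each entry pairs a cover image's feature
    vector with book metadata and the path of the corresponding audio file."""

    def __init__(self):
        self.entries = []

    def add(self, feature, book_id, book_title, audio_path):
        # One entry per audio book: feature image representation plus the
        # index back to its audio file.
        self.entries.append({"feature": feature, "book_id": book_id,
                             "book_title": book_title, "audio_path": audio_path})

    def audio_for(self, book_id):
        """Follow the index from a matched book back to its audio file."""
        for entry in self.entries:
            if entry["book_id"] == book_id:
                return entry["audio_path"]
        return None
```

A production library would be a database table or key-value store, but the lookup path (matched image, then book, then audio file) is the same.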
Based on this, when the server receives the image to be matched sent by the terminal device, the server may search the audio file library for the target image corresponding to the image to be matched in a manner of calculating similarity, so as to send the target audio file corresponding to the target image as the audio book required by the user, for example, to the terminal device, or to another device, so that the user may obtain the content based on the target audio file.
Therefore, based on the scheme provided by the application, an audio file library is recorded in the server, and a plurality of images and audio files corresponding to the images are stored in the audio file library, so that after receiving the image to be matched sent by the terminal equipment, the server can search the target image corresponding to the image to be matched in the audio file library, and then send out the target audio file corresponding to the target image; therefore, in the process of searching the audio book, the user does not need to screen the search result, and the search precision can be improved.
Digital images are generally stored as RGB (Red-Green-Blue) images. An RGB image has three channels and therefore usually occupies considerable storage space. Moreover, if the original image data were processed directly when obtaining the target image in step 302, the large amount of image data would consume substantial computation time and reduce search efficiency.
Therefore, in some embodiments, when storing a plurality of images in the audio file library, the server may extract the feature information of each image with a pre-trained model such as VGG16 and represent each image by a feature vector L of fixed dimension (for example, 500 x 1), so that each image is stored in the form of the feature vector L and the amount of stored data is reduced. In addition, the book identifier book_id and the book title book_title corresponding to each image are spliced onto the extracted feature vector L to form a data model file model for that image, and each image is stored in the server's audio file library in the form of this data model file model.
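A sketch of that storage scheme follows. The feature extractor itself is omitted (a real system would run VGG16 or a similar model), the record layout is an assumption, and the 224x224x3 raw-image size used to illustrate the storage saving is an assumption based on VGG16's usual input resolution.

```python
def build_model_record(feature_vec, book_id, book_title):
    """Splice a 500-dim feature vector with the book identifier and title
    into the data model record kept in the audio file library."""
    assert len(feature_vec) == 500
    return {"book_id": book_id, "book_title": book_title,
            "feature": list(feature_vec)}


# Storage comparison: a 224x224 RGB image occupies 224*224*3 = 150528 bytes,
# while 500 float32 components occupy 500*4 = 2000 bytes, roughly 75x less.
RAW_IMAGE_BYTES = 224 * 224 * 3
FEATURE_BYTES = 500 * 4
```

The exact saving depends on image resolution and vector precision, but storing fixed-length features instead of pixels is what keeps both the library size and the later similarity computation manageable.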
Based on this, referring to fig. 5 on the basis of fig. 4, fig. 5 shows an exemplary flowchart of the sub-steps of step 302 in fig. 4, and in some possible embodiments, step 302 may include the following sub-steps:
step 302-1, extracting the feature information to be matched corresponding to the image to be matched.
And step 302-2, comparing the characteristic information to be matched with the characteristic information corresponding to each image in the audio file library, and screening out an image set to be searched corresponding to the image to be matched.
And step 302-3, searching a target image corresponding to the image to be matched in the image set to be searched.
In some embodiments, based on the manner of saving the image by using the feature vector, the server may firstly process the image to be matched by using, for example, the trained vgg16 model in the process of performing step 302, so as to extract the feature information to be matched corresponding to the image to be matched.
Then, the server compares the extracted feature information to be matched with the feature information corresponding to each image in the audio file library, so as to screen out, from the audio file library, the image set to be searched corresponding to the image to be matched. Each image in the image set to be searched comes from the audio file library; that is, the image set to be searched is a subset of the audio file library.
For example, in some embodiments, when screening out the image set to be searched, the server may first compute the feature similarity between the image to be matched and each image in the audio file library, for instance as a vector cosine, based on the feature information to be matched and the feature information corresponding to each image in the audio file library.
For example, the calculation formula of the feature similarity corresponding to each image can be expressed as:
P_i = (feature_l · feature(i)) / (||feature_l|| · ||feature(i)||)

where P_i is the feature similarity corresponding to the i-th image in the audio file library; feature_l is the feature information to be matched corresponding to the image to be matched, i.e. the vector representation of the image to be matched; feature(i) is the feature information corresponding to the i-th image in the audio file library, i.e. the vector representation of the i-th image; and || · || denotes the two-norm of a feature vector.
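The cosine similarity P_i can be computed directly from the two feature vectors, as in this minimal sketch:

```python
import math

def feature_similarity(feature_l, feature_i):
    """Cosine similarity P_i between the query image's feature vector and
    the i-th library image's feature vector: their dot product divided by
    the product of their two-norms."""
    dot = sum(a * b for a, b in zip(feature_l, feature_i))
    norm_l = math.sqrt(sum(a * a for a in feature_l))
    norm_i = math.sqrt(sum(b * b for b in feature_i))
    return dot / (norm_l * norm_i)
```

Identical directions give 1.0 and orthogonal vectors give 0.0, so a higher P_i means a more similar cover image.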
It can be understood that the above implementation is only an example in which a cosine calculation is used to compute the feature similarity between the image to be matched and each image in the audio file library; in some other possible embodiments of the present application, the server may compute this feature similarity with another strategy, which the present application does not limit.
Then, based on the computed feature similarity between the image to be matched and each image in the audio file library, the server may use a set first threshold as the criterion and screen the image set to be searched corresponding to the image to be matched out of the images whose feature similarity reaches the first threshold.
For example, in some possible embodiments, the server may combine all images whose corresponding feature similarity reaches the first threshold into a set of images to be searched corresponding to the images to be matched.
Of course, it is understood that in some embodiments, to prevent the image set to be searched from containing too many images (which would make the subsequent similarity computation on image data inefficient), the server may additionally use a set second threshold as a criterion and compare it with the number of images whose feature similarity reaches the first threshold. If that number is greater than the second threshold, the server selects images in descending order of feature similarity, up to the second-threshold number, to form the image set to be searched; if that number is less than or equal to the second threshold, all images whose feature similarity reaches the first threshold form the image set to be searched.
For example, assuming that the second threshold is 80, if the number of images whose corresponding feature similarities reach the first threshold is greater than 80, the server may select, according to the feature similarities from high to low, all images whose corresponding feature similarities are arranged in the top 80 to construct an image set to be searched; on the contrary, if the number of the images with the corresponding feature similarity reaching the first threshold is not more than 80, the server may construct all the images with the corresponding feature similarity reaching the first threshold as the image set to be searched. As can be seen, the second threshold is the truncation number of the image set to be searched, and the number of images included in the image set to be searched does not exceed the second threshold.
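The two-threshold screening described above can be sketched as follows; the first-threshold value used in the usage example is illustrative, since the text only gives 80 as an example second threshold.

```python
def screen_candidates(similarities, first_threshold, second_threshold=80):
    """Return indices of library images whose feature similarity reaches
    first_threshold, truncated to at most second_threshold images in
    descending order of similarity."""
    passed = [(i, s) for i, s in enumerate(similarities) if s >= first_threshold]
    # Highest similarity first, so truncation keeps the best candidates.
    passed.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in passed[:second_threshold]]
```

For example, `screen_candidates([0.9, 0.2, 0.7, 0.95], 0.6)` keeps images 3, 0, and 2 in that order, and passing `second_threshold=2` truncates the result to the two best matches.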
It can be understood that step 302-2 is a coarse screening performed by the server using the feature information to be matched of the image to be matched: it preliminarily filters the plurality of images in the audio file library to obtain images relatively similar to the image to be matched, which together form the image set to be searched. Based on this set, the server can then apply some image matching strategy to find the target image corresponding to the image to be matched within the image set to be searched.
It can be understood that, because the image set to be searched contains far fewer images than the audio file library, the server can still find the target image quickly when applying such image matching strategies, which improves search efficiency.
Illustratively, in performing step 302-3, the server may find the target image from the image set to be searched based on the ORB algorithm and the SIFT algorithm.
For example, the server may first further screen the image set to be searched using the ORB (Oriented FAST and Rotated BRIEF) algorithm, selecting a preset number of images from the image set to be searched to form an intermediate image set.
Then, the server may screen out the target image corresponding to the image to be matched from the intermediate image set using the Scale-Invariant Feature Transform (SIFT) algorithm.
Similar to the ORB algorithm, when executing the SIFT algorithm the server calculates similarity based on K-nearest-neighbor matching. That is, during matching, for each feature point of the image to be matched, the K most similar feature points are found among the feature points of each image in the image set to be searched or the intermediate image set, and the most similar of these is selected as the matching point; all images in the image set to be searched or the intermediate image set are then ranked by their matches to determine the required images, such as determining the intermediate image set from the image set to be searched, or determining the target image from the intermediate image set.
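A minimal sketch of this K-nearest-neighbor matching idea, using Hamming distance over binary (ORB-style) descriptors stored as integers. The descriptor encoding and the ratio test used to accept a match are illustrative assumptions, not requirements of the method:

```python
def hamming(a, b):
    # Distance between two binary descriptors stored as integers.
    return bin(a ^ b).count("1")

def count_matches(query_desc, candidate_desc, k=2, ratio=0.75):
    """Count feature points of the query image that find a good match
    in one candidate image, via K-nearest-neighbor search per point."""
    good = 0
    for q in query_desc:
        nearest = sorted(hamming(q, c) for c in candidate_desc)[:k]
        # Accept the nearest neighbor only if it is clearly better than
        # the second nearest (a ratio test, an illustrative choice).
        if len(nearest) >= 2 and nearest[0] < ratio * nearest[1]:
            good += 1
        elif len(nearest) == 1:
            good += 1
    return good

def rank_images(query_desc, image_descs):
    """Rank candidate images by the number of good matches, best first."""
    scored = [(img, count_matches(query_desc, d))
              for img, d in image_descs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [img for img, _ in scored]
```

Ranking the image set to be searched this way yields the intermediate image set; ranking the intermediate image set yields the candidate for the target image.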
In addition, in some possible embodiments, to ensure the accuracy of the search result, a set third threshold may further constrain the similarity between each image in the intermediate image set and the image to be matched when the target image is determined. When the image in the intermediate image set most similar to the image to be matched has a similarity value greater than the third threshold, the server may determine that image as the target image; conversely, when that most similar image has a similarity value less than or equal to the third threshold, the server may discard it and determine that no target image corresponding to the image to be matched exists among the plurality of images in the audio file library. The server may then send preset matching failure information, for example to the terminal device, thereby informing the user that matching failed.
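The third-threshold decision amounts to a small dispatch rule on the best-ranked candidate; the response format below is an illustrative assumption:

```python
def resolve_target(ranked, third_threshold):
    """ranked: list of (image_id, similarity) pairs, best first.

    Return the target image when the best candidate clears the third
    threshold, else a preset matching-failure message for the terminal.
    """
    if ranked and ranked[0][1] > third_threshold:
        return {"status": "ok", "target_image": ranked[0][0]}
    return {"status": "match_failed", "message": "no matching image found"}

print(resolve_target([("cover_17", 0.92), ("cover_03", 0.40)], 0.8))
# {'status': 'ok', 'target_image': 'cover_17'}
```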
In some possible scenarios, to meet the need to update the audio file library, the images and audio files stored in the library can be updated by transmitting an image to be put in storage and its corresponding audio file to be put in storage to the server.
For a received image to be put in storage and its corresponding audio file to be put in storage, the server can compare the image with all images recorded in the audio file library. If the image to be put in storage differs from every image in the library, the server can store it and the corresponding audio file to be put in storage in the audio file library; otherwise, if the image to be put in storage is identical to at least one recorded image, the server may discard both the image and the corresponding audio file.
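The ingest rule can be sketched as follows. Using a byte-level hash as the "same image" test is an illustrative shortcut; the patent simply compares the image against the recorded images:

```python
import hashlib

class AudioFileLibrary:
    """Toy audio file library keyed by an image fingerprint."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _fingerprint(image_bytes):
        # Illustrative identity test: exact byte-level hash of the image.
        return hashlib.sha256(image_bytes).hexdigest()

    def ingest(self, image_bytes, audio_file):
        """Store the pair only if the image is not already recorded.

        Returns True if stored, False if discarded as a duplicate.
        """
        key = self._fingerprint(image_bytes)
        if key in self._entries:
            return False  # duplicate: discard image and audio file
        self._entries[key] = audio_file
        return True

library = AudioFileLibrary()
print(library.ingest(b"cover-page-1", "song1.mp3"))  # True  (new image)
print(library.ingest(b"cover-page-1", "song2.mp3"))  # False (duplicate)
```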
In addition, based on the same inventive concept as the above-mentioned data processing method applied to the server side provided by the present application, the present application further provides a first data processing apparatus 400 as shown in fig. 6, where the first data processing apparatus 400 may be applied to a server as in fig. 1, and the first data processing apparatus 400 may include a first transceiver module 401 and a first processing module 402.
A first transceiver module 401, configured to receive an image to be matched, sent by a terminal device;
a first processing module 402, configured to search a target image corresponding to an image to be matched in an audio file library;
the first transceiver module 401 is further configured to send out a target audio file corresponding to the target image.
Optionally, in some possible embodiments, when the first processing module 402 searches for a target image corresponding to an image to be matched in the audio file library, the first processing module is specifically configured to:
extracting characteristic information to be matched corresponding to the image to be matched;
comparing the characteristic information to be matched with the characteristic information corresponding to each image in the audio file library, and screening out an image set to be searched corresponding to the image to be matched; each image in the image set to be searched is derived from an audio file library;
and searching a target image corresponding to the image to be matched in the image set to be searched.
Optionally, in some possible embodiments, when the feature information to be matched is compared with the feature information corresponding to each image in the audio file library, and an image set to be searched corresponding to the image to be matched is screened out, the first processing module 402 is specifically configured to:
calculating the feature similarity between the image to be matched and each image in the audio file library according to the feature information to be matched and the feature information corresponding to each image in the audio file library;
and screening an image set to be searched corresponding to the image to be matched from all the images with the corresponding characteristic similarity reaching a first threshold value.
Optionally, in some possible embodiments, when the first processing module 402 selects, from all images whose corresponding feature similarity reaches the first threshold, an image set to be searched corresponding to an image to be matched, the first processing module is specifically configured to:
if the number of the images of which the corresponding feature similarity reaches the first threshold is larger than a second threshold, selecting the images of which the number is the second threshold from high to low according to the feature similarity to form an image set to be searched corresponding to the image to be matched;
and if the number of the images of which the corresponding feature similarity reaches the first threshold is less than or equal to the second threshold, forming an image set to be searched corresponding to the image to be matched by the images of which the corresponding feature similarity reaches the first threshold.
Optionally, in some possible embodiments, when the first processing module 402 finds a target image corresponding to an image to be matched in the image set to be searched, it is specifically configured to:
screening a preset number of images from an image set to be searched by utilizing an ORB algorithm to form an intermediate image set;
and screening out a target image corresponding to the image to be matched from the intermediate image set by utilizing an SIFT algorithm.
Optionally, in some possible embodiments, the first transceiver module 401 is further configured to receive an image to be put in storage and a corresponding audio file to be put in storage;
the first processing module 402 is further configured to, if the image to be put into storage is different from all the images recorded in the audio file library, store the image to be put into storage and the audio file to be put into storage into the audio file library.
Optionally, in some possible embodiments, the first transceiver module 401 is further configured to send out preset matching failure information if a target image corresponding to an image to be matched is not found out in the plurality of images.
Also, based on the same inventive concept as the above-mentioned data processing method applied to the terminal device provided in the present application, the present application further provides a second data processing apparatus 500 as shown in fig. 7, where the second data processing apparatus 500 can be applied to the terminal device as shown in fig. 1, and the second data processing apparatus 500 can include a second processing module 501 and a second transceiver module 502.
A second processing module 501, configured to obtain an image to be matched;
a second transceiver module 502, configured to send an audio acquisition request to a server; the audio acquisition request comprises an image to be matched, and is used for indicating the server to feed back a target audio file corresponding to the image to be matched;
the second transceiver module 502 is further configured to receive a target audio file sent by the server.
Optionally, in some possible embodiments, when acquiring the image to be matched, the second processing module 501 is specifically configured to:
for a target video frame in the acquired video stream, detecting whether a closed rectangular frame exists in the target video frame;
and if the closed rectangular frame exists in the target video frame, determining the target video frame as an image to be matched.
Optionally, in some possible embodiments, before detecting whether the closed rectangular box exists in the target video frame, the second processing module 501 is further configured to:
carrying out image enhancement operation on a target video frame;
wherein the image enhancement operation comprises at least one of:
Gaussian blur, noise removal, the Canny edge detection operator, edge detection, dilation, edge enhancement, and binarization.
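The closed-rectangle check that follows these enhancement operations can be illustrated with a simplified geometric test. In practice the enhanced frame would first be reduced to candidate contours (for instance with an OpenCV-style pipeline); the vertex-based check below is an illustrative stand-in for that step:

```python
import math

def is_closed_rectangle(vertices, angle_tol_deg=10.0):
    """Return True if the polygon given by `vertices` (corner points in
    order) is approximately a rectangle: four corners, each near 90 deg."""
    if len(vertices) != 4:
        return False
    for i in range(4):
        # Vectors from this corner to its two neighboring corners.
        ax, ay = (vertices[i - 1][0] - vertices[i][0],
                  vertices[i - 1][1] - vertices[i][1])
        bx, by = (vertices[(i + 1) % 4][0] - vertices[i][0],
                  vertices[(i + 1) % 4][1] - vertices[i][1])
        dot = ax * bx + ay * by
        norm = math.hypot(ax, ay) * math.hypot(bx, by)
        angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
        if abs(angle - 90.0) > angle_tol_deg:
            return False
    return True

print(is_closed_rectangle([(0, 0), (4, 0), (4, 2), (0, 2)]))  # True
print(is_closed_rectangle([(0, 0), (4, 0), (2, 3)]))          # False
```

A frame passing this check would be taken as the image to be matched; one failing it would be skipped.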
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to some embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in some embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to some embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, read-only memory, random access memory, a magnetic disk, an optical disk, or other media capable of storing program code.
The above description presents only some examples of the present application and is not intended to limit it; those skilled in the art will appreciate that various modifications and variations can be made. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall fall within its protection scope.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (14)

1. A data processing method is characterized in that the method is applied to a server, wherein the server records an audio file library, and the audio file library comprises a plurality of images and audio files corresponding to the images; the server establishes communication with a terminal device;
the method comprises the following steps:
receiving an image to be matched sent by the terminal equipment;
searching a target image corresponding to the image to be matched in the audio file library;
and sending out the target audio file corresponding to the target image.
2. The method of claim 1, wherein the searching for the target image corresponding to the image to be matched in the audio file library comprises:
extracting the characteristic information to be matched corresponding to the image to be matched;
comparing the characteristic information to be matched with the characteristic information corresponding to each image in the audio file library, and screening out an image set to be searched corresponding to the image to be matched; wherein each image included in the image set to be searched is derived from the audio file library;
and searching a target image corresponding to the image to be matched in the image set to be searched.
3. The method of claim 2, wherein comparing the feature information to be matched with the feature information corresponding to each image in the audio file library to screen out the image set to be searched corresponding to the image to be matched comprises:
calculating the feature similarity corresponding to the image to be matched and each image in the audio file library according to the feature information corresponding to the image to be matched and each image in the audio file library;
and screening out an image set to be searched corresponding to the image to be matched from all the images with the corresponding characteristic similarity reaching a first threshold value.
4. The method of claim 3, wherein the step of screening out the image set to be searched corresponding to the image to be matched from all the images with the corresponding feature similarity reaching the first threshold comprises:
if the number of the images of which the corresponding feature similarity reaches the first threshold is larger than a second threshold, selecting the images of which the number is the second threshold from high to low according to the feature similarity to form an image set to be searched corresponding to the image to be matched;
and if the number of the images of which the corresponding feature similarity reaches the first threshold is less than or equal to the second threshold, forming an image set to be searched corresponding to the image to be matched by the images of which the corresponding feature similarity reaches the first threshold.
5. The method of claim 2, wherein the finding of the target image corresponding to the image to be matched in the image set to be searched comprises:
screening out a preset number of images from the image set to be searched by utilizing an ORB algorithm to form an intermediate image set;
and screening out a target image corresponding to the image to be matched from the intermediate image set by utilizing an SIFT algorithm.
6. The method of claim 1, wherein the method further comprises:
receiving an image to be put in storage and a corresponding audio file to be put in storage;
and if the image to be warehoused is different from all the images recorded in the audio file library, storing the image to be warehoused and the audio file to be warehoused into the audio file library.
7. The method of claim 1, wherein the method further comprises:
and if the target image corresponding to the image to be matched is not found out in the plurality of images, sending out preset matching failure information.
8. A data processing method is characterized in that the method is applied to terminal equipment, and the terminal equipment establishes communication with a server; the method comprises the following steps:
acquiring an image to be matched;
sending an audio acquisition request to the server; the audio acquisition request comprises the image to be matched, and the audio acquisition request is used for indicating the server to feed back a target audio file corresponding to the image to be matched;
and receiving the target audio file sent by the server.
9. The method of claim 8, wherein the obtaining the image to be matched comprises:
detecting whether a closed rectangular frame exists in a target video frame in an acquired video stream;
and if the closed rectangular frame exists in the target video frame, determining the target video frame as an image to be matched.
10. The method of claim 9, wherein prior to said detecting whether a closed rectangular box is present in the target video frame, the method further comprises:
performing image enhancement operation on the target video frame;
wherein the image enhancement operation comprises at least one of:
Gaussian blur, noise removal, a Canny edge detection operator, edge detection, dilation, edge enhancement, and binarization processing.
11. A data processing device is applied to a server, wherein the server records an audio file library, and the audio file library comprises a plurality of images and audio files corresponding to the images; the server establishes communication with a terminal device;
the device comprises:
the first transceiving module is used for receiving the image to be matched sent by the terminal equipment;
the first processing module is used for searching a target image corresponding to the image to be matched in the audio file library;
the first transceiver module is further configured to send out a target audio file corresponding to the target image.
12. A data processing device is applied to a terminal device, and the terminal device establishes communication with a server; the device comprises:
the second processing module is used for acquiring an image to be matched;
the second transceiver module is used for sending an audio acquisition request to the server; the audio acquisition request comprises the image to be matched, and the audio acquisition request is used for indicating the server to feed back a target audio file corresponding to the image to be matched;
the second transceiver module is further configured to receive a target audio file sent by the server.
13. An electronic device, comprising:
a memory for storing one or more programs;
a processor;
the one or more programs, when executed by the processor, implement the method of any of claims 1-10.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-10.
CN202110018681.4A 2021-01-07 2021-01-07 Data processing method and device, electronic equipment and storage medium Pending CN112765394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110018681.4A CN112765394A (en) 2021-01-07 2021-01-07 Data processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN112765394A true CN112765394A (en) 2021-05-07

Family

ID=75700664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110018681.4A Pending CN112765394A (en) 2021-01-07 2021-01-07 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112765394A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704529A (en) * 2021-07-30 2021-11-26 荣耀终端有限公司 Photo classification method with audio identification, searching method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110085068A (en) * 2019-04-22 2019-08-02 广东小天才科技有限公司 A kind of study coach method and device based on image recognition
CN110297938A (en) * 2019-06-20 2019-10-01 北京奇艺世纪科技有限公司 A kind of audio frequency playing method, device and terminal
CN111144360A (en) * 2019-12-31 2020-05-12 新疆联海创智信息科技有限公司 Multimode information identification method and device, storage medium and electronic equipment
CN111524148A (en) * 2020-04-22 2020-08-11 广东小天才科技有限公司 Book page identification method and device, electronic equipment and storage medium
CN111753119A (en) * 2020-06-28 2020-10-09 中国建设银行股份有限公司 Image searching method and device, electronic equipment and storage medium
CN111860052A (en) * 2019-04-26 2020-10-30 安徽奇智科技有限公司 Method, device, medium and equipment for identifying picture book




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210507