CN108255985A - Data directory construction method, search method and device, medium and electronic equipment - Google Patents
- Publication number
- CN108255985A CN201711461946.8A
- Authority
- CN
- China
- Prior art keywords
- data
- word
- primitive character
- target data
- index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure relates to a data index construction method, a retrieval method and apparatus, a medium, and an electronic device. The method includes: obtaining target data for which an index is to be built; determining original feature words of the target data; performing related-word expansion on the original feature words to obtain extended feature words; inputting the original feature words and the extended feature words into a knowledge graph for inference to obtain inferred feature words; and building the index of the target data according to at least the original feature words and the inferred feature words. On the one hand, this ensures the comprehensiveness and accuracy of the index of the target data, improves the hit rate of the target data, and effectively improves the accuracy of results retrieved on the basis of the index. On the other hand, the method of the disclosure can establish index files of a unified format for different types of target data, providing technical support for integrated retrieval across different types of data.
Description
Technical field
The present disclosure relates to the field of information retrieval, and in particular to a data index construction method, a retrieval method and apparatus, a medium, and an electronic device.
Background
Information retrieval refers to the process of finding required or interesting information or knowledge in a collection of information with rich content. Its main tasks include the representation, storage, organization and access of information items.
In the prior art, retrieval technology is mainly text-oriented, as in search engines such as Google, Yahoo and Baidu. With the development of retrieval technology, a corresponding index can be established for single-medium data, such as image data or audio data, based on its content. For example, text information in image data can be extracted through character recognition technology, and the persons in the image data can be determined through face recognition to obtain person information; the text information and the person information can then be added to the index of the image data.
However, when the data in a database is retrieved, users differ in usage habits, technical field and knowledge level, so their search conditions for the same target also differ. In this case, retrieval based on an index established in the above manner places high demands on the search condition entered by the user, is subject to considerable limitations, and has low accuracy.
Summary of the invention
The purpose of the disclosure is to provide a data index construction method, a retrieval method and apparatus, a medium, and an electronic device that are applicable to multiple types of data.
To achieve this purpose, according to a first aspect of the disclosure, a data index construction method is provided, the method including:
obtaining target data for which an index is to be built;
determining original feature words of the target data;
performing related-word expansion on the original feature words to obtain extended feature words;
inputting the original feature words and the extended feature words into a knowledge graph for inference to obtain inferred feature words;
building the index of the target data according to at least the original feature words and the inferred feature words.
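The five steps above can be sketched as a small pipeline. This is a minimal illustration, not the implementation of the disclosure: the function names (build_index, toy_extract, toy_expand, toy_infer) and the lookup tables RELATED and GRAPH are hypothetical stand-ins for a trained extractor, a related-word model and a knowledge graph.

```python
from typing import Callable, Dict, List

# Hypothetical lookup tables standing in for a related-word model and a knowledge graph.
RELATED: Dict[str, List[str]] = {
    "Einstein": ["theory of relativity", "Germany"],
    "black and white": ["color"],
}
GRAPH: Dict[str, List[str]] = {
    "Einstein": ["scientist", "physics"],
    "theory of relativity": ["mass-energy equation"],
    "black and white": ["grayscale"],
}


def build_index(
    target_data: str,
    extract: Callable[[str], List[str]],
    expand: Callable[[List[str]], List[str]],
    infer: Callable[[List[str]], List[str]],
) -> Dict[str, List[str]]:
    """Run the steps in order: extract, expand, infer, then build the index."""
    original = extract(target_data)        # original feature words
    extended = expand(original)            # related-word expansion
    inferred = infer(original + extended)  # inference over both word sets
    # The index is built from at least the original and inferred feature words.
    return {"original": original, "inferred": inferred}


def toy_extract(data: str) -> List[str]:
    # Assumption: the "target data" is already a comma-separated feature string.
    return [part.strip() for part in data.split(",")]


def toy_expand(words: List[str]) -> List[str]:
    return [related for w in words for related in RELATED.get(w, [])]


def toy_infer(words: List[str]) -> List[str]:
    return [node for w in words for node in GRAPH.get(w, [])]


index = build_index("Einstein, black and white", toy_extract, toy_expand, toy_infer)
```

Passing the three stages in as callables keeps the pipeline independent of the data type, which matches the stated aim of one index format for several kinds of data.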
Optionally, the target data is any one of image data, video data, audio data and text data;
determining the original feature words of the target data includes:
when the target data is image data, determining feature information of at least one of the following types for the image data: text features, person features, color features, emotion features, texture features, shape features and spatial position features, and using the obtained feature information as the original feature words of the image data;
when the target data is text data, performing at least one of the following kinds of processing on the text data: information extraction, named entity recognition and sentiment analysis, and using the result as the original feature words of the text data;
when the target data is audio data, converting the audio data into corresponding text data, performing at least one of the following kinds of processing on the text data: information extraction, named entity recognition and sentiment analysis, and using the result as the original feature words of the audio data;
when the target data is video data, for the image data contained in the video data, determining feature information of at least one of the following types: text features, person features, color features, emotion features, texture features, shape features and spatial position features; for the audio data contained in the video data, converting the audio data into corresponding text data and performing at least one of the following kinds of processing on the text data: information extraction, named entity recognition and sentiment analysis; and using the result together with the obtained feature information as the original feature words of the video data.
Optionally, building the index of the target data according to at least the original feature words and the inferred feature words includes:
creating an index file for the target data, and writing the original feature words and their corresponding feature types, and the inferred feature words and their corresponding feature types, into the index file;
associating the index file with the target data.
Optionally, building the index of the target data according to at least the original feature words and the inferred feature words further includes:
writing at least one of the data type, data source and ID of the target data into the index file.
According to a second aspect of the disclosure, a retrieval method is provided, the method including:
receiving a search condition entered by a user, and determining a retrieval intent according to the search condition;
querying the index of each piece of data in a raw database according to the retrieval intent to obtain the data that satisfies the retrieval intent, wherein the raw database includes multiple types of data, and the index of each piece of data is built in advance by the data index construction method of the first aspect;
obtaining a retrieval result according to the data that satisfies the retrieval intent.
According to a third aspect of the disclosure, a data index construction apparatus is provided, the apparatus including:
an obtaining module, configured to obtain target data for which an index is to be built;
a determining module, configured to determine original feature words of the target data;
an expansion module, configured to perform related-word expansion on the original feature words to obtain extended feature words;
an inference module, configured to input the original feature words and the extended feature words into a knowledge graph for inference to obtain inferred feature words;
a building module, configured to build the index of the target data according to at least the original feature words and the inferred feature words.
Optionally, the target data is any one of image data, video data, audio data and text data;
the determining module includes:
a first determining sub-module, configured to, when the target data is image data, determine feature information of at least one of the following types for the image data: text features, person features, color features, emotion features, texture features, shape features and spatial position features, and use the obtained feature information as the original feature words of the image data;
a second determining sub-module, configured to, when the target data is text data, perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition and sentiment analysis, and use the result as the original feature words of the text data;
a third determining sub-module, configured to, when the target data is audio data, convert the audio data into corresponding text data, perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition and sentiment analysis, and use the result as the original feature words of the audio data;
a fourth determining sub-module, configured to, when the target data is video data, for the image data contained in the video data, determine feature information of at least one of the following types: text features, person features, color features, emotion features, texture features, shape features and spatial position features; for the audio data contained in the video data, convert the audio data into corresponding text data and perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition and sentiment analysis; and use the result together with the obtained feature information as the original feature words of the video data.
Optionally, the building module includes:
a first processing sub-module, configured to create an index file for the target data, and write the original feature words and their corresponding feature types, and the inferred feature words and their corresponding feature types, into the index file;
an association sub-module, configured to associate the index file with the target data.
Optionally, the building module further includes:
a second processing sub-module, configured to write at least one of the data type, data source and ID of the target data into the index file.
According to a fourth aspect of the disclosure, a retrieval apparatus is provided, the apparatus including:
a receiving module, configured to receive a search condition entered by a user and determine a retrieval intent according to the search condition;
a query module, configured to query the index of each piece of data in a raw database according to the retrieval intent to obtain the data that satisfies the retrieval intent, wherein the raw database includes multiple types of data, and the index of each piece of data is built in advance by the data index construction apparatus of the third aspect;
a processing module, configured to obtain a retrieval result according to the data that satisfies the retrieval intent.
According to a fifth aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the method of the first aspect or the second aspect are implemented.
According to a sixth aspect of the disclosure, an electronic device is provided, including:
the computer-readable storage medium of the fifth aspect; and
one or more processors, configured to execute the program in the computer-readable storage medium.
In the above technical solution, the original feature words of the target data and the extended feature words obtained by expanding them are used as the basis for knowledge graph inference. This effectively improves the accuracy of the inference, so that the inferred feature words obtained for the target data are more comprehensive, the target data can be described from multiple dimensions, and the description of the target data is more accurate. Building the index of the target data from its original feature words and inferred feature words therefore, on the one hand, ensures the comprehensiveness and accuracy of the index, improves the hit rate of the target data, and effectively improves the accuracy of results retrieved on the basis of the index. On the other hand, the method of the disclosure can establish index files of a unified format for different types of target data, providing technical support for integrated retrieval across different types of data.
Other features and advantages of the disclosure are described in detail in the detailed description that follows.
Description of the drawings
The accompanying drawings provide a further understanding of the disclosure and form part of the specification. Together with the detailed description below, they serve to explain the disclosure, but do not limit it. In the drawings:
Fig. 1 is a flowchart of a data index construction method provided according to an embodiment of the disclosure;
Fig. 2 is an example image;
Fig. 3 is a flowchart of an example implementation of building the index of the target data according to at least the original feature words and the inferred feature words;
Fig. 4 is a flowchart of a retrieval method provided according to an embodiment of the disclosure;
Fig. 5 is a block diagram of a data index construction apparatus provided according to an embodiment of the disclosure;
Fig. 6 is a block diagram of a retrieval apparatus provided according to an embodiment of the disclosure;
Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment;
Fig. 8 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description
Specific embodiments of the disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to describe and explain the disclosure and do not limit it.
Fig. 1 is a flowchart of a data index construction method provided according to an embodiment of the disclosure. As shown in Fig. 1, the method includes:
In S11, target data for which an index is to be built is obtained.
Optionally, the target data may be any one of image data, video data, audio data and text data.
In S12, the original feature words of the target data are determined.
When the target data is image data, feature information of at least one of the following types is determined for the image data: text features, person features, color features, emotion features, texture features, shape features and spatial position features, and the obtained feature information is used as the original feature words of the image data.
For example, Fig. 2 is an example image. The image shown in Fig. 2 is input into a deep learning convolutional neural network model to obtain the feature information of the image, where the model is obtained in advance through training. By recognizing the image with the neural network model, the feature information "Einstein, black and white" can be obtained, and this feature information is determined as the original feature words of the image.
When the target data is text data, at least one of the following kinds of processing is performed on the text data: information extraction, named entity recognition and sentiment analysis, and the result is used as the original feature words of the text data.
For example, the text data is "Dior new arrival, new autumn and winter sweet round-neck doll-style woolen overcoat with fur collar, sizes: S, M, P330". After keyword extraction is performed on this text data, the feature information "Dior, fur collar, P330" is obtained, and this feature information can be determined as the original feature words of the text data.
When the target data is audio data, the audio data is converted into corresponding text data, at least one of the following kinds of processing is performed on the text data: information extraction, named entity recognition and sentiment analysis, and the result is used as the original feature words of the audio data.
That is, when the target data is audio data, the audio data is first converted into text data; the original feature words are then obtained in the same way as for text data, which is not repeated here.
When the target data is video data, for the image data contained in the video data, feature information of at least one of the following types is determined: text features, person features, color features, emotion features, texture features, shape features and spatial position features; for the audio data contained in the video data, the audio data is converted into corresponding text data, and at least one of the following kinds of processing is performed on the text data: information extraction, named entity recognition and sentiment analysis; the result, together with the obtained feature information, is used as the original feature words of the video data.
That is, when the target data is video data, the image data and the text data in the video data can be processed separately to obtain the feature information corresponding to each, and the feature information corresponding to the image data and to the text data is determined as the original feature words of the video data.
With the above technical solution, the original feature words can be determined in a manner that matches the type of the target data, which ensures the accuracy of the original feature words of the target data and provides accurate data support for building the index.
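The type-dependent handling described above can be sketched as a dispatch table. Every name here is hypothetical, and the extractors are placeholders: the image extractor simply returns labels assumed to come from a recognizer, the transcription step assumes the audio is already represented by its transcript, and the text extractor keeps capitalized tokens as a crude substitute for keyword extraction.

```python
from typing import Any, Callable, Dict, List


def image_features(image: Dict[str, List[str]]) -> List[str]:
    # Placeholder: assume a trained recognizer has already produced labels.
    return list(image["labels"])


def text_features(text: str) -> List[str]:
    # Crude stand-in for keyword extraction: keep tokens starting with a capital.
    return [tok for tok in text.replace(",", " ").split() if tok[0].isupper()]


def transcribe(audio: str) -> str:
    # Placeholder speech-to-text: the audio is represented by its transcript.
    return audio


def audio_features(audio: str) -> List[str]:
    # Audio is converted to text first, then handled like text data.
    return text_features(transcribe(audio))


def video_features(video: Dict[str, Any]) -> List[str]:
    # Video combines the image-side and audio-side feature words.
    return image_features(video["frames"]) + audio_features(video["audio"])


EXTRACTORS: Dict[str, Callable[[Any], List[str]]] = {
    "image": image_features,
    "text": text_features,
    "audio": audio_features,
    "video": video_features,
}


def original_feature_words(data_type: str, payload: Any) -> List[str]:
    """Pick the extractor that matches the type of the target data."""
    try:
        extractor = EXTRACTORS[data_type]
    except KeyError:
        raise ValueError(f"unsupported data type: {data_type}") from None
    return extractor(payload)
```

A table keyed by data type keeps each extractor independent, so a further media type could be added without touching the others.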
In S13, related-word expansion is performed on the original feature words to obtain extended feature words.
For the example image shown in Fig. 2, after its original feature words are determined to be "Einstein, black and white", they can be expanded using a related-word model obtained in advance. For example, the extended feature words obtained from the original feature words "Einstein, black and white" are "theory of relativity, Germany, color". The related words determined by the related-word model can be sorted in descending order of their relevance to the original feature words. When there are too many related words, the number of related words output by the related-word model can be set according to actual use, for example by selecting the ten highest-ranked related words as the extended feature words.
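The ranking and cut-off described for S13 can be sketched as follows. The relevance scores in RELATED are invented for illustration and the function name expand is hypothetical; a real related-word model would supply the scores.

```python
from typing import Dict, List

# Hypothetical relevance scores; a real related-word model would supply these.
RELATED: Dict[str, Dict[str, float]] = {
    "Einstein": {"theory of relativity": 0.95, "Germany": 0.80},
    "black and white": {"color": 0.70, "grayscale": 0.40},
}


def expand(original_words: List[str], top_n: int = 10) -> List[str]:
    """Return up to top_n related words, sorted by descending relevance."""
    best: Dict[str, float] = {}
    for word in original_words:
        for related, score in RELATED.get(word, {}).items():
            if related in original_words:
                continue  # do not re-emit an original feature word
            # Keep the highest score if several original words suggest the same word.
            best[related] = max(score, best.get(related, 0.0))
    ranked = sorted(best, key=lambda w: best[w], reverse=True)
    return ranked[:top_n]


extended = expand(["Einstein", "black and white"], top_n=3)
```

With top_n set to 3 the lowest-scoring candidate is dropped, which mirrors limiting the number of related words the model outputs.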
In S14, the original feature words and the extended feature words are input into the knowledge graph for inference to obtain inferred feature words.
A knowledge graph is essentially a semantic network consisting of nodes and the edges that connect them. Nodes represent entities or concepts, and edges represent the various semantic relations between entities or concepts. Therefore, after the original feature words and the extended feature words are input into the knowledge graph, the words associated with them can be obtained, and inference is performed based on the semantic relations to obtain the inferred feature words.
Continuing the above example, after the extended feature words "theory of relativity, Germany, color" are obtained from the original feature words "Einstein, black and white", the original feature words and the extended feature words are input into the knowledge graph for inference to obtain the inferred feature words. For example, the inferred feature words obtained are "scientist, physics, genius, scholar, German, theory of relativity, grayscale, mass-energy equation".
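One simple way to read "inference over semantic relations" is a bounded traversal of the graph starting from the input words. The sketch below assumes that reading; the triples, the hop limit and the function name infer are illustrative, and unlike the example above it reports only words that were not already among the inputs.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# Hypothetical (head, relation, tail) triples forming a tiny knowledge graph.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Einstein", "occupation", "scientist"),
    ("Einstein", "field", "physics"),
    ("Einstein", "nationality", "German"),
    ("Einstein", "proposed", "theory of relativity"),
    ("theory of relativity", "includes", "mass-energy equation"),
    ("black and white", "is_a", "grayscale"),
    ("scientist", "is_a", "scholar"),
]


def infer(seed_words: List[str], max_hops: int = 2) -> List[str]:
    """Breadth-first walk from the seed words, up to max_hops edges away."""
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for head, _relation, tail in TRIPLES:
        adjacency[head].append(tail)

    seen = set(seed_words)
    frontier = deque((word, 0) for word in seed_words)
    inferred: List[str] = []
    while frontier:
        word, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[word]:
            if neighbor not in seen:
                seen.add(neighbor)
                inferred.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return inferred


seeds = ["Einstein", "black and white", "theory of relativity", "Germany", "color"]
inferred_words = infer(seeds)
```

Seeding the walk with both original and extended words lets "mass-energy equation" be reached in one hop from "theory of relativity" rather than two hops from "Einstein".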
In S15, the index of the target data is built according to at least the original feature words and the inferred feature words.
Fig. 3 is a flowchart of an example implementation of building the index of the target data according to at least the original feature words and the inferred feature words. As shown in Fig. 3, the implementation includes:
In S31, an index file is created for the target data, and the original feature words and their corresponding feature types, and the inferred feature words and their corresponding feature types, are written into the index file. For example, the index file may be a file in JSON (JavaScript Object Notation) format.
For text data, when the text data is processed to obtain the original feature words, the category of each original feature word can be obtained through named entity recognition, and that entity category is determined as the feature type corresponding to the original feature word. For image data, the feature information can be labeled when the deep learning convolutional neural network model is trained, so that the model yields both the original feature words of the target data and their corresponding feature types. The feature types corresponding to the original feature words of audio data and video data are determined in a similar way, which is not repeated here.
For an inferred feature word obtained through the knowledge graph, the type of the inferred feature word in the knowledge graph may be determined as its corresponding feature type. If the type of the inferred feature word is not labeled in the knowledge graph, the feature type may be predicted and labeled according to the context of the inferred feature word; this predictive labeling is prior art and is not repeated here.
Optionally, before the original feature words and their corresponding feature types, and the inferred feature words and their corresponding feature types, are written into the index file, the original feature words and the inferred feature words may be classified by feature type to establish a hierarchical index based on feature type. For example, for the example image data shown in Fig. 2, the original feature word "Einstein" and the inferred feature words "scientist, genius, scholar" all have the feature type "person features", so "Einstein, scientist, genius, scholar" is written into the index file as one index record, which effectively reduces data redundancy in the index file.
In S32, the index file is associated with the target data.
In the above technical solution, the original feature words and inferred feature words of the target data, together with their corresponding feature types, are written into the index file, and the index file is then associated with the target data. This makes the description of the target data more comprehensive and thereby improves the precision of the index. When the search condition entered by a user hits any of the original feature words or inferred feature words, the target data can be found, which effectively improves the efficiency and accuracy of retrieval based on the index and improves the user experience.
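Steps S31 and S32 can be sketched as follows, assuming a JSON index file in which feature words are grouped by feature type. The field names ("id", "features"), the feature type labels, the file names and the function build_index_file are hypothetical and do not reproduce the index file of the disclosure.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index_file(data_id: str, typed_words: Iterable[Tuple[str, str]]) -> str:
    """Group (feature_word, feature_type) pairs by type and serialize as JSON."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for word, feature_type in typed_words:
        if word not in grouped[feature_type]:  # one record per type, no duplicates
            grouped[feature_type].append(word)
    record = {"id": data_id, "features": dict(grouped)}
    return json.dumps(record, ensure_ascii=False, indent=2)


typed_words = [
    ("Einstein", "person"),          # original feature word
    ("black and white", "color"),    # original feature word
    ("scientist", "person"),         # inferred feature words
    ("genius", "person"),
    ("scholar", "person"),
    ("grayscale", "color"),
]
index_json = build_index_file("img-001", typed_words)

# S32: associate the index file with the target data (here, by storage path).
index_by_data = {"images/einstein.jpg": index_json}
```

Grouping by feature type stores each type label once per record instead of once per word, which is the redundancy reduction the paragraph above describes.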
Optionally, at least one of the data type, data source and ID of the target data may also be written into the index file.
The data type of the target data may include text, image, video, audio and so on. The data source is the storage path of the target data. The target data can be located quickly through its data type, data source, ID and the like. For example, different types of data may be stored in a distributed manner, and the storage region of the target data can be determined quickly from its data type, which improves retrieval efficiency based on the index file.
Illustratively, the index file of example image shown in Fig. 2 is as follows:
Under the prior art, the index established for the example image shown in Fig. 2 includes the keyword "Einstein". When the query is "the scientist who proposed the theory of relativity", a query against the prior-art index will not find this image data. A query against the index established for the image data by the method of the disclosure, by contrast, will find it. The method thus effectively improves the comprehensiveness of the index and, in turn, the accuracy of queries based on it.
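The difference can be illustrated with a toy comparison. The matching rule here, a case-insensitive substring test in the function hits, is an assumption made only to keep the sketch short; real query analysis would be more involved.

```python
from typing import List


def hits(query: str, index_words: List[str]) -> List[str]:
    """Return the index words that occur in the query (case-insensitive substring)."""
    lowered = query.lower()
    return [word for word in index_words if word.lower() in lowered]


query = "the scientist who proposed the theory of relativity"

# Index holding only the keyword named for the prior art.
keyword_only = ["Einstein"]

# Index holding original plus inferred feature words, as in the example of Fig. 2.
enriched = [
    "Einstein", "black and white",
    "scientist", "physics", "genius", "scholar", "German",
    "theory of relativity", "grayscale", "mass-energy equation",
]

prior_art_hits = hits(query, keyword_only)
enriched_hits = hits(query, enriched)
```

The query never mentions "Einstein", so only the index that carries inferred words such as "scientist" and "theory of relativity" produces a match.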
In conclusion by the primitive character word of target data and the extension feature word obtained to primitive character word extension
As the benchmark of knowledge mapping reasoning, the accuracy of knowledge mapping reasoning can be effectively improved, so that the target data obtained
Inferencing aspects word is more comprehensive, target data can be described from multiple dimensions, makes the description to target data more accurate
Really.Therefore, the primitive character word based on target data and inferencing aspects word build the index of the target data, on the one hand, can be with
Ensure the comprehensive and accuracy of the index of the target data, improve the hit rate of the target data, effectively improve based on the rope
Introduce the accuracy of the result of row retrieval.On the other hand, by disclosed method, different types of target data can be built
The index file of vertical unified form, to realize that the integrated retrieval based on different types of data provides technical support.
The disclosure also provides a retrieval method. As shown in Fig. 4, the method includes:
In S41, a search condition entered by a user is received, and a retrieval intent is determined according to the search condition.
The search condition entered by the user may be of text type, image type, audio type and so on. For example, inference can be performed on the search condition through the knowledge graph to determine the retrieval intent. For instance, when the search condition is image data, inference is performed on that image data; if the retrieval intent of the search condition cannot be obtained, image data similar to it can be obtained, and the retrieval intent corresponding to the similar image data in the knowledge graph is determined as the retrieval intent corresponding to the search condition. Determining the retrieval intent from a search condition in a knowledge graph is prior art and is not repeated here.
In S42, the index of each piece of data in the raw database is queried according to the retrieval intent to obtain the data that satisfies the retrieval intent, wherein the raw database includes multiple types of data, and the index of each piece of data is built in advance by the above data index construction method.
After the retrieval intent is obtained, inference can be performed on it to obtain the search terms and search rules corresponding to the retrieval intent. The types of data in the raw database may include image, text, audio, video and so on. Because the indexes of all data in the raw database share the same format, integrated retrieval can be performed across the multiple types of data in the raw database, obtaining the data of each type that satisfies the retrieval intent and thereby realizing integrated retrieval across media.
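Because every record shares one index format, a single query routine can cover all media types. The sketch below assumes the search terms have already been derived from the retrieval intent; the record layout, the IDs and the require_all flag (a stand-in for a "search rule") are hypothetical.

```python
from typing import Dict, List

# Hypothetical index records of different media types in one unified format.
INDEX: List[Dict[str, object]] = [
    {"id": "img-001", "type": "image", "words": ["Einstein", "scientist", "theory of relativity"]},
    {"id": "txt-042", "type": "text", "words": ["Dior", "fur collar", "P330"]},
    {"id": "aud-007", "type": "audio", "words": ["physics", "lecture", "theory of relativity"]},
    {"id": "vid-003", "type": "video", "words": ["scientist", "interview"]},
]


def search(terms: List[str], require_all: bool = False) -> Dict[str, List[str]]:
    """Match search terms against every record and group hit IDs by data type."""
    wanted = {term.lower() for term in terms}
    if not wanted:
        return {}
    results: Dict[str, List[str]] = {}
    for record in INDEX:
        words = {str(w).lower() for w in record["words"]}
        matched = wanted & words
        ok = matched == wanted if require_all else bool(matched)
        if ok:
            results.setdefault(str(record["type"]), []).append(str(record["id"]))
    return results


any_match = search(["scientist", "theory of relativity"])
all_match = search(["scientist", "theory of relativity"], require_all=True)
```

Grouping the hits by type keeps the origin of each result visible, which fits returning the source of a retrieval result alongside the result itself.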
In S43, a retrieval result is obtained according to the data that satisfies the retrieval intent.
For example, the data that satisfies the retrieval intent may also be extracted, so that when the retrieval result is fed back to the user, the source of the retrieval result can be fed back together with it, which better fits the needs of the user and improves the user experience.
In one embodiment, after the data that satisfies the retrieval intent is obtained, it can be processed with a deep learning model based on an encoder-decoder architecture. The model may combine multiple word acquisition modes, such as copying, retrieval and prediction.
To obtain the retrieval result, the search condition and the data that satisfies the retrieval intent can be encoded into vectors for use by the deep learning model. For example, a bidirectional RNN (recurrent neural network) may be used to encode the search condition, and a memory network may be used to encode the data that satisfies the retrieval intent.
In another embodiment, a generative adversarial network model can be added after the encoder-decoder. The generative adversarial network model may include a generator and a discriminator, and discriminates the expansion range of the generated retrieval result so that this range is effectively controlled. This avoids an excessive expansion range producing retrieval results that the user does not need, further improves the accuracy of the retrieval result, and fits the needs of the user.
The methods for constructing the encoder-decoder and the generative adversarial network model are prior art and are not repeated here.
In the above technical solution, once the retrieval intent is determined from the search condition entered by the user, the indexes of the multiple types of data in the raw database can be queried, realizing integrated retrieval across media, extending the scope of retrieval, and improving the precision and accuracy of retrieval. In addition, processing the data that satisfies the retrieval intent yields a retrieval result based on multiple types of data, which further improves retrieval quality and the user experience.
The present disclosure also provides a data index construction device. As shown in Fig. 5, the device 10 includes:
an acquisition module 101, configured to obtain target data for which an index is to be built;
a determining module 102, configured to determine original feature words of the target data;
an expansion module 103, configured to perform related-word expansion on the original feature words to obtain expanded feature words;
a reasoning module 104, configured to input the original feature words and the expanded feature words into a knowledge graph for reasoning to obtain inferred feature words;
a building module 105, configured to build the index of the target data according to at least the original feature words and the inferred feature words.
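The five modules above can be read as a simple pipeline. The following is a minimal sketch under stated assumptions, not the disclosed implementation: the related-word table, the toy knowledge graph, and all function names are hypothetical stand-ins, and whitespace tokenization replaces real feature extraction.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical stand-ins for a related-word resource and a knowledge graph.
RELATED_WORDS: Dict[str, List[str]] = {"puppy": ["dog"], "beach": ["seaside"]}
KNOWLEDGE_GRAPH: Dict[Tuple[str, str], str] = {
    ("dog", "is_a"): "animal",
    ("seaside", "part_of"): "coast",
}


@dataclass
class TargetIndex:
    original_words: List[str]
    inferred_words: List[str]


def determine_original_words(target_data: str) -> List[str]:
    """Determining module stand-in: lower-cased whitespace tokens."""
    return target_data.lower().split()


def expand_related_words(original_words: List[str]) -> List[str]:
    """Expansion module stand-in: look up related words, without duplicates."""
    expanded: List[str] = []
    for word in original_words:
        for related in RELATED_WORDS.get(word, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


def infer_words(original_words: List[str], expanded_words: List[str]) -> List[str]:
    """Reasoning module stand-in: follow one edge of the toy knowledge graph."""
    inferred: List[str] = []
    for word in original_words + expanded_words:
        for (subject, _relation), obj in KNOWLEDGE_GRAPH.items():
            if subject == word and obj not in inferred:
                inferred.append(obj)
    return inferred


def build_index(target_data: str) -> TargetIndex:
    """Building module stand-in: index from original and inferred words."""
    original = determine_original_words(target_data)
    expanded = expand_related_words(original)
    inferred = infer_words(original, expanded)
    return TargetIndex(original_words=original, inferred_words=inferred)
```

In this sketch the expanded words only feed the reasoning step; the text requires the index to be built from at least the original and inferred feature words, so storing the expanded words as well would be an equally valid choice.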
Optionally, the target data is any one of image data, video data, audio data, and text data;
the determining module 102 includes:
a first determination sub-module, configured to, when the target data is image data, determine at least one piece of feature information of at least one of the following types for the image data: person features, object features, color features, emotion features, texture features, shape features, and spatial position features, and to use the obtained feature information as the original feature words of the image data;
a second determination sub-module, configured to, when the target data is text data, perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, with the results serving as the original feature words of the text data;
a third determination sub-module, configured to, when the target data is audio data, convert the audio data into corresponding text data and perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, with the results serving as the original feature words of the audio data;
a fourth determination sub-module, configured to, when the target data is video data: for the image data contained in the video data, determine at least one piece of feature information of at least one of the following types: person features, object features, color features, emotion features, texture features, shape features, and spatial position features; and, for the audio data contained in the video data, convert the audio data into corresponding text data and perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis; the results and the obtained feature information together serve as the original feature words of the video data.
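The four sub-modules amount to a dispatch on the data type, with video handled as image frames plus an audio track. The sketch below illustrates that dispatch only. The dictionary layout of the target data, the pre-computed `labels` and `transcript` fields, and the three helper functions are assumptions introduced for illustration; a real system would call vision, speech-recognition, and text-analysis models at those points.

```python
from typing import Dict, List


def extract_image_features(image: Dict) -> List[str]:
    """Placeholder for a vision model: reads pre-computed labels."""
    return list(image.get("labels", []))


def speech_to_text(audio: Dict) -> str:
    """Placeholder for speech recognition: reads a stored transcript."""
    return audio.get("transcript", "")


def analyze_text(text: str) -> List[str]:
    """Placeholder for information extraction, named entity recognition,
    and sentiment analysis: keeps words longer than three characters."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    return [word for word in words if len(word) > 3]


def determine_original_words(target: Dict) -> List[str]:
    """Route the target data to the extractor that matches its type."""
    kind = target["type"]
    if kind == "image":
        return extract_image_features(target)
    if kind == "text":
        return analyze_text(target["content"])
    if kind == "audio":
        return analyze_text(speech_to_text(target))
    if kind == "video":
        words: List[str] = []
        for frame in target.get("frames", []):
            words.extend(extract_image_features(frame))
        words.extend(analyze_text(speech_to_text(target.get("audio", {}))))
        return words
    raise ValueError(f"unsupported data type: {kind}")
```

Note that audio and video reuse the text path after transcription, which mirrors the third and fourth sub-modules.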
Optionally, the building module 105 includes:
a first processing sub-module, configured to create an index file for the target data and to write into the index file the original feature words with their corresponding feature types and the inferred feature words with their corresponding feature types;
an association sub-module, configured to associate the index file with the target data.
Optionally, the building module 105 further includes:
a second processing sub-module, configured to write at least one of the data type, the data source, and the ID of the target data into the index file.
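One way to picture the index file is as a small JSON document holding the feature words with their feature types, plus the data type, data source, and ID. The sketch below assumes a JSON layout, a `<id>.index.json` file name, and an in-memory dictionary for the association step; none of these specifics come from the disclosure.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple

WordWithType = Tuple[str, str]  # (feature word, feature type)


def write_index_file(directory: str, data_id: str, data_type: str,
                     data_source: str, original: List[WordWithType],
                     inferred: List[WordWithType]) -> str:
    """Create the index file and return its path (hypothetical layout)."""
    index = {
        "id": data_id,
        "data_type": data_type,
        "data_source": data_source,
        "original_feature_words": [
            {"word": word, "feature_type": ftype} for word, ftype in original
        ],
        "inferred_feature_words": [
            {"word": word, "feature_type": ftype} for word, ftype in inferred
        ],
    }
    path = os.path.join(directory, f"{data_id}.index.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(index, handle, ensure_ascii=False, indent=2)
    return path


def demo() -> Dict:
    """Write an index for one image, associate it, and read it back."""
    associations: Dict[str, str] = {}  # target data path -> index file path
    with tempfile.TemporaryDirectory() as directory:
        index_path = write_index_file(
            directory, "img-001", "image", "photo library",
            original=[("dog", "object"), ("brown", "color")],
            inferred=[("animal", "object")],
        )
        associations["photos/img-001.jpg"] = index_path
        with open(associations["photos/img-001.jpg"], encoding="utf-8") as handle:
            return json.load(handle)
```

Storing the feature type next to each word lets a later query restrict matching to, say, color features only.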
The present disclosure also provides a retrieval device. As shown in Fig. 6, the device 20 includes:
a receiving module 201, configured to receive a search condition entered by a user and to determine a retrieval intent according to the search condition;
a query module 202, configured to query the index of each piece of data in an original database according to the retrieval intent and obtain the data matching the retrieval intent, wherein the original database contains multiple types of data and the index of each piece of data is built in advance by the above data index construction device;
a processing module 203, configured to obtain a retrieval result according to the data matching the retrieval intent.
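The three retrieval modules can be sketched as three small functions over pre-built indexes. Everything in the example is an assumption made for illustration: the in-memory `INDEXES` list, the treatment of the retrieval intent as a set of keywords, and the shape of the returned result. The disclosure describes intent determination and result generation in terms of learned models, which this sketch replaces with plain keyword matching.

```python
from typing import Dict, List, Set

# Hypothetical pre-built indexes for data of several types.
INDEXES: List[Dict] = [
    {"id": "img-001", "data_type": "image", "data_source": "photo library",
     "feature_words": ["dog", "beach", "animal"]},
    {"id": "txt-002", "data_type": "text", "data_source": "news archive",
     "feature_words": ["dog", "training", "animal"]},
    {"id": "aud-003", "data_type": "audio", "data_source": "podcast feed",
     "feature_words": ["piano", "music"]},
]


def determine_intent(search_condition: str) -> Set[str]:
    """Receiving module stand-in: the intent is the set of query keywords."""
    return {word.lower() for word in search_condition.split()}


def query_indexes(intent: Set[str], indexes: List[Dict]) -> List[Dict]:
    """Query module stand-in: keep entries sharing a word with the intent."""
    return [entry for entry in indexes if intent & set(entry["feature_words"])]


def build_result(matches: List[Dict]) -> List[Dict]:
    """Processing module stand-in: report each hit with its type and source."""
    return [
        {"id": m["id"], "data_type": m["data_type"], "data_source": m["data_source"]}
        for m in matches
    ]


def retrieve(search_condition: str, indexes: List[Dict] = INDEXES) -> List[Dict]:
    return build_result(query_indexes(determine_intent(search_condition), indexes))
```

A query for "animal" matches both the image and the text entry because that word was added to each index by the reasoning step, which is how a single query can return results across media types.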
Fig. 7 is a block diagram of an electronic device 700 according to an exemplary embodiment. As shown in Fig. 7, the electronic device 700 may include a processor 701, a memory 702, a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 controls the overall operation of the electronic device 700 to complete all or some of the steps of the above data index construction method or retrieval method. The memory 702 stores various types of data to support operation of the electronic device 700. Such data may include, for example, instructions for any application or method operating on the electronic device 700, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 702 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, for example static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 702 or sent through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above data index construction method or retrieval method.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, for example the memory 702 including program instructions. These program instructions can be executed by the processor 701 of the electronic device 700 to complete the above data index construction method or retrieval method.
Fig. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment. For example, the electronic device 800 may be provided as a server. Referring to Fig. 8, the electronic device 800 includes one or more processors 822 and a memory 832 for storing computer programs executable by the processor 822. The computer program stored in the memory 832 may include one or more modules, each corresponding to a set of instructions. In addition, the processor 822 may be configured to execute the computer program to perform the above data index construction method or retrieval method.
In addition, the electronic device 800 may include a power supply component 826 and a communication component 850. The power supply component 826 may be configured to perform power management of the electronic device 800, and the communication component 850 may be configured to provide communication for the electronic device 800, for example wired or wireless communication. The electronic device 800 may also include an input/output (I/O) interface 858. The electronic device 800 can operate based on an operating system stored in the memory 832, such as Windows Server™, Mac OS X™, Unix™, or Linux™.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, for example the memory 832 including program instructions. These program instructions can be executed by the processor 822 of the electronic device 800 to complete the above data index construction method or retrieval method.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the disclosure, various simple modifications can be made to its technical solutions, and these simple modifications all fall within the scope of protection of the disclosure. It should further be noted that the specific technical features described in the above embodiments can be combined in any suitable manner, provided there is no contradiction. To avoid unnecessary repetition, the disclosure does not separately describe the various possible combinations.
In addition, the various embodiments of the disclosure can also be combined arbitrarily; as long as a combination does not depart from the idea of the disclosure, it should likewise be regarded as content disclosed by the disclosure.
Claims (10)
1. A data index construction method, characterized in that the method comprises:
obtaining target data for which an index is to be built;
determining original feature words of the target data;
performing related-word expansion on the original feature words to obtain expanded feature words;
inputting the original feature words and the expanded feature words into a knowledge graph for reasoning to obtain inferred feature words;
building the index of the target data according to at least the original feature words and the inferred feature words.
2. The method according to claim 1, characterized in that the target data is any one of image data, video data, audio data, and text data; and determining the original feature words of the target data comprises:
when the target data is image data, determining feature information of at least one of the following types for the image data: person features, object features, color features, emotion features, texture features, shape features, and spatial position features, and using the obtained feature information as the original feature words of the image data;
when the target data is text data, performing at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, with the results serving as the original feature words of the text data;
when the target data is audio data, converting the audio data into corresponding text data and performing at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, with the results serving as the original feature words of the audio data;
when the target data is video data: for the image data contained in the video data, determining feature information of at least one of the following types for the image data: person features, object features, color features, emotion features, texture features, shape features, and spatial position features; and, for the audio data contained in the video data, converting the audio data into corresponding text data and performing at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis; the results and the obtained feature information together serve as the original feature words of the video data.
3. The method according to claim 1, characterized in that building the index of the target data according to at least the original feature words and the inferred feature words comprises:
creating an index file for the target data, and writing into the index file the original feature words with their corresponding feature types and the inferred feature words with their corresponding feature types;
associating the index file with the target data.
4. The method according to claim 3, characterized in that building the index of the target data according to at least the original feature words and the inferred feature words further comprises:
writing at least one of the data type, the data source, and the ID of the target data into the index file.
5. A retrieval method, characterized in that the method comprises:
receiving a search condition entered by a user, and determining a retrieval intent according to the search condition;
querying the index of each piece of data in an original database according to the retrieval intent to obtain the data matching the retrieval intent, wherein the original database contains multiple types of data and the index of each piece of data is built in advance by the data index construction method according to any one of claims 1-4;
obtaining a retrieval result according to the data matching the retrieval intent.
6. A data index construction device, characterized in that the device comprises:
an acquisition module, configured to obtain target data for which an index is to be built;
a determining module, configured to determine original feature words of the target data;
an expansion module, configured to perform related-word expansion on the original feature words to obtain expanded feature words;
a reasoning module, configured to input the original feature words and the expanded feature words into a knowledge graph for reasoning to obtain inferred feature words;
a building module, configured to build the index of the target data according to at least the original feature words and the inferred feature words.
7. The device according to claim 6, characterized in that the target data is any one of image data, video data, audio data, and text data;
the determining module comprises:
a first determination sub-module, configured to, when the target data is image data, determine at least one piece of feature information of at least one of the following types for the image data: person features, object features, color features, emotion features, texture features, shape features, and spatial position features, and to use the obtained feature information as the original feature words of the image data;
a second determination sub-module, configured to, when the target data is text data, perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, with the results serving as the original feature words of the text data;
a third determination sub-module, configured to, when the target data is audio data, convert the audio data into corresponding text data and perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, with the results serving as the original feature words of the audio data;
a fourth determination sub-module, configured to, when the target data is video data: for the image data contained in the video data, determine at least one piece of feature information of at least one of the following types for the image data: person features, object features, color features, emotion features, texture features, shape features, and spatial position features; and, for the audio data contained in the video data, convert the audio data into corresponding text data and perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis; the results and the obtained feature information together serve as the original feature words of the video data.
8. A retrieval device, characterized in that the device comprises:
a receiving module, configured to receive a search condition entered by a user and to determine a retrieval intent according to the search condition;
a query module, configured to query the index of each piece of data in an original database according to the retrieval intent and obtain the data matching the retrieval intent, wherein the original database contains multiple types of data and the index of each piece of data is built in advance by the data index construction device according to claim 6 or 7;
a processing module, configured to obtain a retrieval result according to the data matching the retrieval intent.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-5.
10. An electronic device, characterized by comprising:
the computer-readable storage medium according to claim 9; and
one or more processors, configured to execute the program in the computer-readable storage medium.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711461946.8A CN108255985A (en) | 2017-12-28 | 2017-12-28 | Data directory construction method, search method and device, medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711461946.8A CN108255985A (en) | 2017-12-28 | 2017-12-28 | Data directory construction method, search method and device, medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108255985A true CN108255985A (en) | 2018-07-06 |
Family
ID=62724384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711461946.8A Pending CN108255985A (en) | 2017-12-28 | 2017-12-28 | Data directory construction method, search method and device, medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108255985A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110032734A (en) * | 2019-03-18 | 2019-07-19 | 百度在线网络技术(北京)有限公司 | Near synonym extension and generation confrontation network model training method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103064969A (en) * | 2012-12-31 | 2013-04-24 | 武汉传神信息技术有限公司 | Method for automatically creating keyword index table |
CN103886034A (en) * | 2014-03-05 | 2014-06-25 | 北京百度网讯科技有限公司 | Method and equipment for building indexes and matching inquiry input information of user |
US20170124217A1 (en) * | 2015-10-30 | 2017-05-04 | International Business Machines Corporation | System, method, and recording medium for knowledge graph augmentation through schema extension |
Non-Patent Citations (2)
Title |
---|
**: "《cpc分类表》", 31 December 2015 * |
郝林雪等: "摘要", 《计算机科学与探索》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110032734A (en) * | 2019-03-18 | 2019-07-19 | 百度在线网络技术(北京)有限公司 | Near synonym extension and generation confrontation network model training method and device |
CN110032734B (en) * | 2019-03-18 | 2023-02-28 | 百度在线网络技术(北京)有限公司 | Training method and device for similar meaning word expansion and generation of confrontation network model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180706 |