CN108255985A - Data index construction method, retrieval method and apparatus, medium, and electronic device - Google Patents

Info

Publication number
CN108255985A
Authority
CN
China
Prior art keywords
data
word
primitive character
target data
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711461946.8A
Other languages
Chinese (zh)
Inventor
蔡巍
崔朝辉
赵立军
张霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201711461946.8A
Publication of CN108255985A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This disclosure relates to a data index construction method, a retrieval method and apparatus, a medium, and an electronic device. The method includes: obtaining target data for which an index is to be built; determining original feature words of the target data; performing related-word expansion on the original feature words to obtain extended feature words; inputting the original feature words and the extended feature words into a knowledge graph for inference to obtain inferred feature words; and building the index of the target data according to at least the original feature words and the inferred feature words. On the one hand, this ensures the comprehensiveness and accuracy of the index of the target data, raises the hit rate of the target data, and effectively improves the accuracy of results retrieved on the basis of the index. On the other hand, the disclosed method can establish index files of a unified format for different types of target data, which provides technical support for integrated retrieval across different types of data.

Description

Data index construction method, retrieval method and apparatus, medium, and electronic device
Technical field
This disclosure relates to the field of information retrieval, and in particular to a data index construction method, a retrieval method and apparatus, a medium, and an electronic device.
Background
Information retrieval refers to the process of finding needed or interesting information or knowledge in a collection of information with rich content. Its main tasks include the representation, storage, organization, and access of information items.
In the prior art, retrieval technology is mainly text-oriented, as in search engines such as Google, Yahoo, and Baidu. As retrieval technology has developed, it has become possible to build an index for single-medium data, such as image data or audio data, based on its content. For example, text in image data can be extracted by character recognition, and the persons in image data can be identified by face recognition to obtain person information; the text information and person information can then be added to the index of the image data.
However, when data in a database is retrieved, users differ in usage habits, technical fields, and knowledge levels, so their search conditions for the same target also differ. In this case, retrieval based on an index established in the above manner places high demands on the search condition entered by the user, is considerably limited, and has low accuracy.
Summary of the invention
The purpose of the disclosure is to provide a data index construction method, a retrieval method and apparatus, a medium, and an electronic device that are common to multiple data types.
To achieve the above purpose, according to a first aspect of the disclosure, a data index construction method is provided, the method including:
obtaining target data for which an index is to be built;
determining original feature words of the target data;
performing related-word expansion on the original feature words to obtain extended feature words;
inputting the original feature words and the extended feature words into a knowledge graph for inference to obtain inferred feature words; and
building the index of the target data according to at least the original feature words and the inferred feature words.
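As a minimal sketch of how the five steps above chain together, the following Python passes each stage in as a function. The function name `build_index`, the type aliases, and the toy stand-ins for the three stages are assumptions made for illustration; the disclosure does not prescribe an API.

```python
from typing import Callable, Dict, List

# Illustrative type aliases; none of these names come from the disclosure.
Extractor = Callable[[object], List[str]]
Expander = Callable[[List[str]], List[str]]
Reasoner = Callable[[List[str]], List[str]]


def build_index(target_data: object,
                extract: Extractor,
                expand: Expander,
                infer: Reasoner) -> Dict[str, List[str]]:
    """Run the steps in order and return a minimal index."""
    original = extract(target_data)        # feature words taken from the data itself
    extended = expand(original)            # related-word expansion
    inferred = infer(original + extended)  # inference uses both word sets as input
    # The index is built from at least the original and the inferred words.
    return {"original": original, "inferred": inferred}


# Toy stand-ins so that the sketch runs end to end.
index = build_index(
    "einstein.jpg",
    extract=lambda data: ["Einstein", "black and white"],
    expand=lambda words: ["theory of relativity", "Germany", "color"],
    infer=lambda words: ["scientist", "physics"] if "Einstein" in words else [],
)
print(index)
```

Passing the stages as parameters keeps the sketch independent of any particular recognition model, related-word model, or knowledge graph.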
Optionally, the target data is any one of image data, video data, audio data, and text data;
determining the original feature words of the target data includes:
when the target data is image data, determining feature information of at least one of the following types for the image data: person features, object features, color features, emotional features, texture features, shape features, and spatial position features, and using the obtained feature information as the original feature words of the image data;
when the target data is text data, performing at least one of the following on the text data: information extraction, named entity recognition, and sentiment analysis, and using the results as the original feature words of the text data;
when the target data is audio data, converting the audio data into corresponding text data and performing at least one of the following on the text data: information extraction, named entity recognition, and sentiment analysis, and using the results as the original feature words of the audio data;
when the target data is video data: for image data contained in the video data, determining feature information of at least one of the following types for the image data: person features, object features, color features, emotional features, texture features, shape features, and spatial position features; for audio data contained in the video data, converting the audio data into corresponding text data and performing at least one of the following on the text data: information extraction, named entity recognition, and sentiment analysis; and using the results together with the obtained feature information as the original feature words of the video data.
Optionally, building the index of the target data according to at least the original feature words and the inferred feature words includes:
creating an index file for the target data, and writing the original feature words and their corresponding feature types, as well as the inferred feature words and their corresponding feature types, into the index file; and
associating the index file with the target data.
Optionally, building the index of the target data according to at least the original feature words and the inferred feature words further includes:
writing at least one of the data type, data source, and ID of the target data into the index file.
According to a second aspect of the disclosure, a retrieval method is provided, the method including:
receiving a search condition entered by a user, and determining a retrieval intent according to the search condition;
querying the index of each item of data in an original database according to the retrieval intent to obtain data that satisfies the retrieval intent, where the original database contains data of multiple types and the index of each item of data is built in advance by the data index construction method of the first aspect; and
obtaining a retrieval result according to the data that satisfies the retrieval intent.
According to a third aspect of the disclosure, a data index construction apparatus is provided, the apparatus including:
an obtaining module, configured to obtain target data for which an index is to be built;
a determining module, configured to determine original feature words of the target data;
an expansion module, configured to perform related-word expansion on the original feature words to obtain extended feature words;
an inference module, configured to input the original feature words and the extended feature words into a knowledge graph for inference to obtain inferred feature words; and
a building module, configured to build the index of the target data according to at least the original feature words and the inferred feature words.
Optionally, the target data is any one of image data, video data, audio data, and text data;
the determining module includes:
a first determining submodule, configured to, when the target data is image data, determine feature information of at least one of the following types for the image data: person features, object features, color features, emotional features, texture features, shape features, and spatial position features, and use the obtained feature information as the original feature words of the image data;
a second determining submodule, configured to, when the target data is text data, perform at least one of the following on the text data: information extraction, named entity recognition, and sentiment analysis, and use the results as the original feature words of the text data;
a third determining submodule, configured to, when the target data is audio data, convert the audio data into corresponding text data and perform at least one of the following on the text data: information extraction, named entity recognition, and sentiment analysis, and use the results as the original feature words of the audio data;
a fourth determining submodule, configured to, when the target data is video data: for image data contained in the video data, determine feature information of at least one of the following types for the image data: person features, object features, color features, emotional features, texture features, shape features, and spatial position features; for audio data contained in the video data, convert the audio data into corresponding text data and perform at least one of the following on the text data: information extraction, named entity recognition, and sentiment analysis; and use the results together with the obtained feature information as the original feature words of the video data.
Optionally, the building module includes:
a first processing submodule, configured to create an index file for the target data, and to write the original feature words and their corresponding feature types, as well as the inferred feature words and their corresponding feature types, into the index file; and
an association submodule, configured to associate the index file with the target data.
Optionally, the building module further includes:
a second processing submodule, configured to write at least one of the data type, data source, and ID of the target data into the index file.
According to a fourth aspect of the disclosure, a retrieval apparatus is provided, the apparatus including:
a receiving module, configured to receive a search condition entered by a user and determine a retrieval intent according to the search condition;
a query module, configured to query the index of each item of data in an original database according to the retrieval intent to obtain data that satisfies the retrieval intent, where the original database contains data of multiple types and the index of each item of data is built in advance by the data index construction apparatus of the third aspect; and
a processing module, configured to obtain a retrieval result according to the data that satisfies the retrieval intent.
According to a fifth aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored, the program implementing the steps of the method of the first aspect or the second aspect when executed by a processor.
According to a sixth aspect of the disclosure, an electronic device is provided, including:
the computer-readable storage medium of the fifth aspect; and
one or more processors, configured to execute the program in the computer-readable storage medium.
In the above technical solution, the original feature words of the target data and the extended feature words obtained by expanding them serve as the basis for knowledge-graph inference, which effectively improves the accuracy of the inference. The inferred feature words obtained for the target data are therefore more comprehensive, so the target data can be described from multiple dimensions and described more accurately. Building the index of the target data from its original feature words and inferred feature words therefore, on the one hand, ensures the comprehensiveness and accuracy of the index, raises the hit rate of the target data, and effectively improves the accuracy of results retrieved on the basis of the index. On the other hand, the disclosed method can establish index files of a unified format for different types of target data, which provides technical support for integrated retrieval across different types of data.
Other features and advantages of the disclosure are described in detail in the detailed description that follows.
Brief description of the drawings
The drawings provide a further understanding of the disclosure and form part of the specification. Together with the detailed description below, they serve to explain the disclosure but do not limit it. In the drawings:
Fig. 1 is a flowchart of a data index construction method provided according to an embodiment of the disclosure;
Fig. 2 is an example image;
Fig. 3 is a flowchart of an example implementation of building the index of the target data according to at least the original feature words and the inferred feature words;
Fig. 4 is a flowchart of a retrieval method provided according to an embodiment of the disclosure;
Fig. 5 is a block diagram of a data index construction apparatus provided according to an embodiment of the disclosure;
Fig. 6 is a block diagram of a retrieval apparatus provided according to an embodiment of the disclosure;
Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment;
Fig. 8 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description
Specific embodiments of the disclosure are described in detail below with reference to the drawings. It should be understood that the embodiments described here serve only to describe and explain the disclosure and do not limit it.
Fig. 1 is a flowchart of a data index construction method provided according to an embodiment of the disclosure. As shown in Fig. 1, the method includes:
In S11, target data for which an index is to be built is obtained.
Optionally, the target data can be any one of image data, video data, audio data, and text data.
In S12, the original feature words of the target data are determined.
When the target data is image data, feature information of at least one of the following types is determined for the image data: person features, object features, color features, emotional features, texture features, shape features, and spatial position features, and the obtained feature information is used as the original feature words of the image data.
Illustratively, Fig. 2 is an example image. The image shown in Fig. 2 is input into a deep-learning convolutional neural network model to obtain the feature information of the image. The model is obtained in advance through training. Recognizing the image with the neural network model yields the feature information "Einstein, black and white", which is determined to be the original feature words of the image.
When the target data is text data, at least one of the following is performed on the text data: information extraction, named entity recognition, and sentiment analysis, and the results are used as the original feature words of the text data.
Illustratively, the text data is "Dior new style, autumn/winter new sweet doll-style round-neck woolen coat with fur collar, size: SM, P330". Keyword extraction on this text data yields the feature information "Dior, fur collar, P330", which can be determined to be the original feature words of the text data.
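The keyword extraction in this example can be imitated with a deliberately simple stand-in. The sketch below replaces trained information extraction and named entity recognition with a lexicon lookup plus a regular expression for product codes; the lexicon `KNOWN_TERMS`, the pattern `PRODUCT_CODE`, and the English sample sentence are all assumptions made for illustration, not the technique of the disclosure.

```python
import re
from typing import List

# Hypothetical lexicon; a real system would use trained extraction and NER models.
KNOWN_TERMS = ["Dior", "fur collar"]
# Hypothetical product-code pattern: one capital letter followed by three digits.
PRODUCT_CODE = re.compile(r"\b[A-Z]\d{3}\b")


def extract_original_words(text: str) -> List[str]:
    """Return lexicon terms found in the text, then any product codes."""
    lowered = text.lower()
    words = [term for term in KNOWN_TERMS if term.lower() in lowered]
    words += PRODUCT_CODE.findall(text)
    return words


sample = "Dior new style, autumn/winter woolen coat with fur collar, size: SM, P330"
print(extract_original_words(sample))  # ['Dior', 'fur collar', 'P330']
```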
When the target data is audio data, the audio data is converted into corresponding text data, and at least one of the following is performed on the text data: information extraction, named entity recognition, and sentiment analysis. The results are used as the original feature words of the audio data.
That is, when the target data is audio data, the audio data is first converted into text data, after which the original feature words are obtained in the same way as for text data, which is not repeated here.
When the target data is video data: for image data contained in the video data, feature information of at least one of the following types is determined for the image data: person features, object features, color features, emotional features, texture features, shape features, and spatial position features; for audio data contained in the video data, the audio data is converted into corresponding text data, and at least one of the following is performed on the text data: information extraction, named entity recognition, and sentiment analysis. The results together with the obtained feature information are used as the original feature words of the video data.
That is, when the target data is video data, the image data in the video data and the text data converted from its audio can be processed separately to obtain the feature information corresponding to each, and the feature information of both is determined to be the original feature words of the video data.
With the above technical solution, the original feature words can be determined in a manner suited to the type of the target data, which ensures their accuracy and provides accurate data support for building the index.
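The type-dependent handling described above amounts to a dispatch on the data type, with audio reusing the text path and video combining the image and audio paths. The following is a minimal sketch under that reading; the handler names and the dictionary layout of the inputs (`labels`, `keywords`, `transcript_keywords`, `frames`, `audio`) are hypothetical placeholders for real recognition and speech-to-text components.

```python
from typing import Callable, Dict, List


def image_features(data: dict) -> List[str]:
    # Placeholder for recognition by a trained image model.
    return list(data.get("labels", []))


def text_features(data: dict) -> List[str]:
    # Placeholder for information extraction, NER, and sentiment analysis.
    return list(data.get("keywords", []))


def audio_features(data: dict) -> List[str]:
    # Audio is first converted to text, then handled exactly like text.
    return text_features({"keywords": data.get("transcript_keywords", [])})


def video_features(data: dict) -> List[str]:
    # Video combines the results of its image part and its audio part.
    return image_features(data["frames"]) + audio_features(data["audio"])


HANDLERS: Dict[str, Callable[[dict], List[str]]] = {
    "image": image_features,
    "text": text_features,
    "audio": audio_features,
    "video": video_features,
}


def original_feature_words(data_type: str, data: dict) -> List[str]:
    """Pick the handler that matches the data type."""
    if data_type not in HANDLERS:
        raise ValueError(f"unsupported data type: {data_type}")
    return HANDLERS[data_type](data)


video = {"frames": {"labels": ["Einstein"]},
         "audio": {"transcript_keywords": ["physics"]}}
print(original_feature_words("video", video))  # ['Einstein', 'physics']
```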
In S13, related-word expansion is performed on the original feature words to obtain extended feature words.
For the example image shown in Fig. 2, after its original feature words are determined to be "Einstein, black and white", they can be expanded by a related-word model obtained in advance. Illustratively, the extended feature words obtained from the original feature words "Einstein, black and white" are "theory of relativity, Germany, color". The related words determined by the related-word model can be sorted in descending order of their relevance to the original feature words. When there are too many related words, the number of related words output by the model can be set according to actual use; for example, the ten highest-ranked related words are selected as extended feature words.
In S14, the original feature words and the extended feature words are input into the knowledge graph for inference to obtain inferred feature words.
A knowledge graph is essentially a semantic network that consists of nodes and the edges connecting them. Nodes represent entities or concepts, and edges represent the various semantic relations between entities or concepts. Therefore, after the original feature words and the extended feature words are input into the knowledge graph, the words associated with them can be obtained, and inference can be performed on the basis of the semantic relations to obtain the inferred feature words.
Continuing the above example, after the extended feature words "theory of relativity, Germany, color" are obtained from the original feature words "Einstein, black and white", the original feature words and the extended feature words are input into the knowledge graph for inference to obtain inferred feature words. Illustratively, the inferred feature words obtained are "scientist, physics, genius, scholar, German, theory of relativity, grayscale, mass-energy equation".
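One simple way to picture inference over nodes and edges is a bounded traversal that starts from the input words and collects every node reached along an edge. The sketch below does exactly that; the graph contents, the relation names, and the choice of breadth-first traversal with a hop limit are assumptions made for illustration and are not stated in the disclosure.

```python
from collections import deque
from typing import Dict, List, Tuple

# Toy knowledge graph: node -> [(relation, neighbour)]. Contents are hypothetical.
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "Einstein": [("is_a", "scientist"),
                 ("nationality", "German"),
                 ("proposed", "theory of relativity")],
    "theory of relativity": [("field", "physics"),
                             ("includes", "mass-energy equation")],
    "black and white": [("rendered_as", "grayscale")],
}


def infer(seed_words: List[str], max_hops: int = 2) -> List[str]:
    """Collect every node reached by following edges from the seed words."""
    inferred: List[str] = []
    collected = set()
    visited = set(seed_words)
    queue = deque((word, 0) for word in seed_words)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # hop limit reached, do not expand this node
        for _relation, neighbour in GRAPH.get(node, []):
            if neighbour not in collected:
                collected.add(neighbour)
                inferred.append(neighbour)
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return inferred


seeds = ["Einstein", "black and white", "theory of relativity", "Germany", "color"]
print(infer(seeds))
```

A word reached along an edge is kept even if it was also an input word, which is why "theory of relativity" can appear among both the extended and the inferred words, as in the example above.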
In S15, the index of the target data is built according to at least the original feature words and the inferred feature words.
Fig. 3 is a flowchart of an example implementation of building the index of the target data according to at least the original feature words and the inferred feature words. As shown in Fig. 3, the implementation includes:
In S31, an index file for the target data is created, and the original feature words and their corresponding feature types, as well as the inferred feature words and their corresponding feature types, are written into the index file. Illustratively, the index file can be a file in JSON (JavaScript Object Notation) format.
For text data, when it is processed to obtain original feature words, the category to which each original feature word belongs can be obtained through named entity recognition, and that entity category is determined to be the feature type of the original feature word. For image data, the feature information can be labeled when the deep-learning convolutional neural network model is trained, so that the model yields both the original feature words of the target data and their feature types. The feature types of the original feature words in audio data and video data are determined in a similar manner, which is not repeated here.
For inferred feature words obtained through the knowledge graph, the type of the inferred feature word in the knowledge graph can be determined to be its feature type. If the knowledge graph does not label the type of an inferred feature word, the feature type can be predicted and labeled according to the context of the inferred feature word. This prediction-based labeling is prior art and is not repeated here.
Optionally, before the original feature words and their feature types and the inferred feature words and their feature types are written into the index file, the original feature words and the inferred feature words can be classified by feature type to establish a hierarchical index based on feature type. Illustratively, for the example image data shown in Fig. 2, the original feature word "Einstein" and the inferred feature words "scientist, genius, scholar" all have the feature type "person features", so "Einstein, scientist, genius, scholar" is written into the index file as a single index record, which effectively reduces data redundancy in the index file.
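The grouping step can be sketched as collecting feature words under their feature type so that each type yields one index record. The function name `group_by_type` is hypothetical, and assigning "black and white" and "grayscale" to "color features" is an assumption added only to show a second group.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_type(typed_words: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (word, feature type) pairs into one record per feature type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for word, feature_type in typed_words:
        if word not in grouped[feature_type]:  # skip duplicates within a type
            grouped[feature_type].append(word)
    return dict(grouped)


typed = [
    ("Einstein", "person features"),         # original feature word
    ("scientist", "person features"),        # inferred feature words
    ("genius", "person features"),
    ("scholar", "person features"),
    ("black and white", "color features"),   # assumed type, for illustration
    ("grayscale", "color features"),         # assumed type, for illustration
]
print(group_by_type(typed))
```

Storing the feature type once per record, rather than once per word, is what reduces the redundancy mentioned above.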
In S32, the index file is associated with the target data.
In the above technical solution, the original feature words and inferred feature words of the target data are written into the index file together with their feature types, and the index file is then associated with the target data. This makes the description of the target data more comprehensive and thus improves the precision of the index. When the search condition entered by a user hits any of the original feature words or inferred feature words, the target data can be found, which effectively improves the efficiency and accuracy of retrieval based on the index and improves the user experience.
Optionally, at least one of the data type, data source, and ID of the target data can also be written into the index file.
The data type of the target data can include text, image, video, audio, and so on. The data source is the storage path of the target data. The data type, data source, ID, and the like allow the target data to be located quickly. Illustratively, different types of data can be stored in a distributed manner; the storage region of the target data can then be determined quickly from its data type, which improves retrieval efficiency based on the index file.
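A minimal sketch of serializing such an index record as JSON is given below. The field names (`id`, `data_type`, `source`, `features`), the ID value, and the storage path are hypothetical; the text states only that the file can be JSON and can carry the data type, data source, and ID alongside the typed feature words.

```python
import json
from typing import Dict, List


def make_index_file(data_id: str,
                    data_type: str,
                    source: str,
                    features: Dict[str, List[str]]) -> str:
    """Serialize one index record; the schema is illustrative only."""
    record = {
        "id": data_id,           # identifier of the target data
        "data_type": data_type,  # e.g. text, image, video, audio
        "source": source,        # storage path of the target data
        "features": features,    # feature type -> feature words
    }
    return json.dumps(record, ensure_ascii=False, indent=2)


index_json = make_index_file(
    "img-0001",                    # hypothetical ID
    "image",
    "/data/images/einstein.jpg",   # hypothetical storage path
    {"person features": ["Einstein", "scientist", "genius", "scholar"]},
)
restored = json.loads(index_json)
print(restored["data_type"], restored["source"])
```

Because every data type shares one record layout, a retrieval component can read index files without knowing in advance what kind of data they describe.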
Illustratively, the index file of example image shown in Fig. 2 is as follows:
Under the prior art, the index established for the example image shown in Fig. 2 contains the keyword "Einstein". When the query condition is "the scientist who proposed the theory of relativity", a query against the prior-art index does not find the image data. A query against the index that the disclosed method establishes for the image data does find it. The disclosed method therefore makes the index more comprehensive and improves the accuracy of queries based on it.
In summary, the original feature words of the target data and the extended feature words obtained by expanding them serve as the basis for knowledge-graph inference, which effectively improves the accuracy of the inference. The inferred feature words obtained for the target data are therefore more comprehensive, so the target data can be described from multiple dimensions and described more accurately. Building the index of the target data from its original feature words and inferred feature words therefore, on the one hand, ensures the comprehensiveness and accuracy of the index, raises the hit rate of the target data, and effectively improves the accuracy of results retrieved on the basis of the index. On the other hand, the disclosed method can establish index files of a unified format for different types of target data, which provides technical support for integrated retrieval across different types of data.
The disclosure also provides a retrieval method. As shown in Fig. 4, the method includes:
In S41, a search condition entered by a user is received, and a retrieval intent is determined according to the search condition.
The search condition entered by the user can be of text type, image type, audio type, and so on. Illustratively, inference can be performed on the search condition through the knowledge graph to determine the retrieval intent. For example, when the search condition is image data, inference is performed on the image data; if the retrieval intent of the search condition cannot be obtained, image data similar to it can be obtained, and the retrieval intent corresponding to the similar image data in the knowledge graph is determined to be the retrieval intent of the search condition. Determining a retrieval intent from a search condition in a knowledge graph is prior art and is not repeated here.
In S42, the index of each item of data in the original database is queried according to the retrieval intent to obtain data that satisfies the retrieval intent, where the original database contains data of multiple types and the index of each item of data is built in advance by the data index construction method described above.
After the retrieval intent is obtained, inference can be performed on it to obtain the corresponding search terms and search rules. The data types in the original database can include image, text, audio, video, and so on. Because the indexes of all data in the original database have the same format, retrieval can be performed jointly over the multiple types of data in the original database, obtaining the data of every type that satisfies the retrieval intent and thereby achieving integrated retrieval across media.
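Since every index shares one format, a single query routine can cover all data types. The sketch below assumes that the search terms have already been derived from the retrieval intent; the index entries, the field names, and the ranking by number of matched terms are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical unified indexes for data of three different types.
INDEXES: List[Dict] = [
    {"id": "img-0001", "data_type": "image",
     "features": ["Einstein", "scientist", "theory of relativity"]},
    {"id": "txt-0007", "data_type": "text",
     "features": ["Dior", "fur collar", "P330"]},
    {"id": "aud-0003", "data_type": "audio",
     "features": ["physics", "theory of relativity"]},
]


def search(terms: List[str], indexes: List[Dict]) -> List[Tuple[str, str]]:
    """Return (id, data type) of every entry that shares a term, best match first."""
    wanted = {term.lower() for term in terms}
    hits = []
    for entry in indexes:
        matched = wanted & {feature.lower() for feature in entry["features"]}
        if matched:
            hits.append((len(matched), entry["id"], entry["data_type"]))
    # Sort by number of matched terms; the sort is stable, so ties keep input order.
    hits.sort(key=lambda hit: -hit[0])
    return [(data_id, data_type) for _, data_id, data_type in hits]


print(search(["scientist", "theory of relativity"], INDEXES))
# [('img-0001', 'image'), ('aud-0003', 'audio')]
```

The image is found through the word "scientist" even though that word was never recognized in the image itself, which is the effect that indexing the inferred feature words is meant to achieve.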
In S43, a retrieval result is obtained according to the data that satisfies the retrieval intent.
Illustratively, the data that satisfies the retrieval intent can also be extracted, so that when the retrieval result is fed back to the user its source can be fed back as well, which better fits the user's needs and improves the user experience.
In one embodiment, after the data that satisfies the retrieval intent is obtained, it can be processed by a deep learning model based on an encoder-decoder, and the model can fuse multiple word acquisition modes such as copying, retrieval, and prediction.
To obtain the retrieval result, the search condition and the data that satisfies the retrieval intent can be encoded into vectors for use by the deep learning model. Illustratively, a bidirectional RNN (Recurrent Neural Network) can be used to encode the search condition, and a memory network can be used to encode the data that satisfies the retrieval intent.
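To make the idea of encoding a search condition with a bidirectional RNN concrete, the following pure-Python sketch runs a simple tanh recurrent cell over a sequence of token vectors once forwards and once backwards and concatenates the two final hidden states. The weights are random and untrained, the one-hot token vectors are made up, and the cell type and dimensions are assumptions; a real encoder would be trained inside the encoder-decoder model.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rnn_pass(inputs: List[Vector], w_in: Matrix, w_rec: Matrix) -> Vector:
    """Run a tanh recurrent cell over the inputs and return the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in inputs:
        hidden = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * hidden[k] for k in range(len(hidden)))
            )
            for i in range(len(w_rec))
        ]
    return hidden


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode_bidirectional(inputs: List[Vector], hidden_size: int = 4, seed: int = 0) -> Vector:
    """Concatenate the final states of a forward pass and a backward pass."""
    rng = random.Random(seed)  # untrained weights, fixed seed for repeatability
    dim = len(inputs[0])
    forward = rnn_pass(inputs,
                       random_matrix(hidden_size, dim, rng),
                       random_matrix(hidden_size, hidden_size, rng))
    backward = rnn_pass(inputs[::-1],
                        random_matrix(hidden_size, dim, rng),
                        random_matrix(hidden_size, hidden_size, rng))
    return forward + backward


# Three made-up one-hot token vectors standing in for an embedded search condition.
query_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
vector = encode_bidirectional(query_embeddings)
print(len(vector))  # 8
```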
In another embodiment, a generative adversarial network model can be added after the encoder-decoder. The generative adversarial network model can include a generator and a discriminator, and it judges the expansion range of the generated retrieval result so as to control that range effectively. This avoids retrieval results that the user does not need appearing because the expansion range is too large, further improves the accuracy of the retrieval result, and fits the user's needs.
The construction of the encoder-decoder and of the generative adversarial network model is prior art and is not repeated here.
In the above technical solution, once the retrieval intent is determined from the search condition entered by the user, the indexes of the multiple types of data in the original database can be queried, which achieves integrated retrieval across media, widens the scope of retrieval, and improves its precision and accuracy. In addition, processing the data that satisfies the retrieval intent yields a retrieval result based on multiple types of data, which further improves the quality of retrieval and the user experience.
The disclosure also provides a data directory construction device. As shown in Fig. 5, the device 10 includes:
An acquisition module 101, configured to obtain target data for which an index is to be built;
A determining module 102, configured to determine the primitive character words of the target data;
An expansion module 103, configured to perform related-word expansion on the primitive character words to obtain extension feature words;
A reasoning module 104, configured to input the primitive character words and the extension feature words into a knowledge graph for inference to obtain inferencing aspects words;
A building module 105, configured to build the index of the target data according to at least the primitive character words and the inferencing aspects words.
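The flow through modules 103 to 105 can be sketched as follows. This is a toy illustration only: `RELATED_WORDS` and `KNOWLEDGE_GRAPH` are hypothetical lookup tables standing in for a related-word model and a knowledge graph, and the function names are not taken from the disclosure.

```python
# Hypothetical lookup tables standing in for a related-word model
# and a knowledge graph of (subject, relation) -> object edges.
RELATED_WORDS = {"panda": ["giant panda"], "bamboo": ["bamboo forest"]}
KNOWLEDGE_GRAPH = {
    ("giant panda", "lives_in"): "Sichuan",
    ("giant panda", "is_a"): "bear",
}


def expand(original_words):
    """Expansion module 103: related-word expansion."""
    expanded = []
    for word in original_words:
        for related in RELATED_WORDS.get(word, []):
            if related not in original_words and related not in expanded:
                expanded.append(related)
    return expanded


def infer(original_words, expanded_words):
    """Reasoning module 104: follow graph edges that start at known words."""
    known = list(original_words) + list(expanded_words)
    inferred = []
    for (subject, _relation), obj in KNOWLEDGE_GRAPH.items():
        if subject in known and obj not in known and obj not in inferred:
            inferred.append(obj)
    return inferred


def build_index(data_id, original_words):
    """Building module 105: index built from at least the original and inferred words."""
    expanded = expand(original_words)
    inferred = infer(original_words, expanded)
    return {
        "id": data_id,
        "original": list(original_words),
        "expanded": expanded,
        "inferred": inferred,
    }


index = build_index("img-1", ["panda", "bamboo"])
```

Here the word "Sichuan" ends up in the index even though it appears in neither the original nor the expanded words, which shows how knowledge-graph inference can widen what a later search is able to match.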
Optionally, the target data is any one of image data, video data, audio data, and text data;
The determining module 102 includes:
A first determination sub-module, configured to, when the target data is image data, determine characteristic information of at least one of the following types for the image data: character features, object features, color features, affective features, texture features, shape features, and spatial position features, and to use the obtained characteristic information as the primitive character words of the image data;
A second determination sub-module, configured to, when the target data is text data, perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, the results serving as the primitive character words of the text data;
A third determination sub-module, configured to, when the target data is audio data, convert the audio data into corresponding text data and perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, the results serving as the primitive character words of the audio data;
A fourth determination sub-module, configured to, when the target data is video data: for the image data contained in the video data, determine characteristic information of at least one of the following types: character features, object features, color features, affective features, texture features, shape features, and spatial position features; and for the audio data contained in the video data, convert the audio data into corresponding text data and perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis; the results and the obtained characteristic information together serve as the primitive character words of the video data.
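The four sub-modules amount to a dispatch on the data type, with video handled as image frames plus an audio track. The sketch below illustrates only that control flow; `image_features`, `text_features`, and `speech_to_text` are hypothetical placeholders for real image analysis, text analysis, and speech recognition, and the dictionary layout of the input is an assumption.

```python
def image_features(image):
    # Placeholder: a real system would run image analysis here.
    return list(image["labels"])


def text_features(text):
    # Placeholder for information extraction, named entity recognition
    # or sentiment analysis: keep lower-cased words longer than 3 letters.
    return [word for word in text.lower().split() if len(word) > 3]


def speech_to_text(audio):
    # Placeholder: a real system would run speech recognition here.
    return audio["transcript"]


def primitive_feature_words(data):
    """Choose the extraction path according to the data type."""
    kind, content = data["type"], data["content"]
    if kind == "image":
        words = image_features(content)
    elif kind == "text":
        words = text_features(content)
    elif kind == "audio":
        words = text_features(speech_to_text(content))
    elif kind == "video":
        words = []
        for frame in content["frames"]:
            words.extend(image_features(frame))
        words.extend(text_features(speech_to_text(content["audio"])))
    else:
        raise ValueError(f"unsupported data type: {kind}")
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(words))


video = {
    "type": "video",
    "content": {
        "frames": [{"labels": ["panda", "green"]}],
        "audio": {"transcript": "The panda eats bamboo"},
    },
}
words = primitive_feature_words(video)
```

Because the audio path reuses the text path after transcription, and the video path reuses both the image and audio paths, each extraction step needs to be implemented only once.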
Optionally, the building module 105 includes:
A first processing sub-module, configured to create an index file for the target data and to write the primitive character words with their corresponding characteristic types, and the inferencing aspects words with their corresponding characteristic types, into the index file;
An association sub-module, configured to associate the index file with the target data.
Optionally, the building module 105 further includes:
A second processing sub-module, configured to write at least one of the data type, data source, and ID of the target data into the index file.
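One possible shape for such an index file is sketched below. The disclosure does not specify a file format, so the JSON layout, the field names, and the "<target file name>.index.json" naming convention used to associate the index file with the target data are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_index_file(target_path, original, inferred, data_type, source, data_id):
    """Write an index file next to the target data and return its path.

    `original` and `inferred` are lists of (feature word, characteristic type) pairs.
    """
    target_path = Path(target_path)
    record = {
        "id": data_id,
        "data_type": data_type,
        "source": source,
        "target": target_path.name,  # association back to the target data
        "original": [{"word": w, "type": t} for w, t in original],
        "inferred": [{"word": w, "type": t} for w, t in inferred],
    }
    # Hypothetical naming convention: "<target file name>.index.json".
    index_path = target_path.with_name(target_path.name + ".index.json")
    index_path.write_text(
        json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return index_path


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "panda.jpg"
    target.write_bytes(b"")  # empty stand-in for the real image
    index_path = write_index_file(
        target,
        original=[("panda", "object"), ("green", "color")],
        inferred=[("Sichuan", "location")],
        data_type="image",
        source="photo archive",
        data_id="img-1",
    )
    loaded = json.loads(index_path.read_text(encoding="utf-8"))
```

Storing the characteristic type beside each word keeps the index form identical for every data type, which is what allows a later search to treat image, text, audio, and video entries uniformly.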
The disclosure also provides a retrieval device. As shown in Fig. 6, the device 20 includes:
A receiving module 201, configured to receive a search condition input by a user and to determine a retrieval intent according to the search condition;
A query module 202, configured to query the index of each item of data in a raw database according to the retrieval intent and to obtain the data that satisfies the retrieval intent, wherein the raw database includes multiple types of data and the index of each item of data is built in advance by the above data directory construction device;
A processing module 203, configured to obtain a retrieval result according to the data that satisfies the retrieval intent.
Fig. 7 is a block diagram of an electronic device 700 according to an exemplary embodiment. As shown in Fig. 7, the electronic device 700 may include a processor 701, a memory 702, a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 controls the overall operation of the electronic device 700 so as to complete all or some of the steps of the above data directory construction method or search method. The memory 702 stores various types of data to support operation of the electronic device 700; such data may include, for example, instructions of any application program or method operated on the electronic device 700, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 702 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, for example static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 702 or sent through the communication component 705. The audio component also includes at least one loudspeaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse, or buttons. The buttons may be virtual or physical. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above data directory construction method or search method.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, for example the memory 702 including program instructions, which can be executed by the processor 701 of the electronic device 700 to complete the above data directory construction method or search method.
Fig. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment. For example, the electronic device 800 may be provided as a server. Referring to Fig. 8, the electronic device 800 includes one or more processors 822 and a memory 832 for storing computer programs executable by the processor 822. The computer program stored in the memory 832 may include one or more modules, each corresponding to a set of instructions. In addition, the processor 822 may be configured to execute the computer program so as to perform the above data directory construction method or search method.
In addition, the electronic device 800 may also include a power supply component 826 and a communication component 850. The power supply component 826 may be configured to perform power management of the electronic device 800, and the communication component 850 may be configured to implement communication of the electronic device 800, for example wired or wireless communication. The electronic device 800 may also include an input/output (I/O) interface 858. The electronic device 800 may run an operating system stored in the memory 832, such as Windows Server™, Mac OS X™, Unix™, or Linux™.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, for example the memory 832 including program instructions, which can be executed by the processor 822 of the electronic device 800 to complete the above data directory construction method or search method.
The preferred embodiments of the disclosure are described in detail above with reference to the accompanying drawings. However, the disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the disclosure, various simple variations can be made to its technical solutions, and these simple variations all fall within the protection scope of the disclosure. It should further be noted that the specific technical features described in the above embodiments can be combined in any suitable manner provided there is no contradiction. To avoid unnecessary repetition, the disclosure does not separately describe the various possible combinations.
In addition, the different embodiments of the disclosure can also be combined arbitrarily; as long as a combination does not depart from the idea of the disclosure, it should likewise be regarded as content disclosed by the disclosure.

Claims (10)

1. A data directory construction method, characterized in that the method comprises:
Obtaining target data for which an index is to be built;
Determining the primitive character words of the target data;
Performing related-word expansion on the primitive character words to obtain extension feature words;
Inputting the primitive character words and the extension feature words into a knowledge graph for inference to obtain inferencing aspects words;
Building the index of the target data according to at least the primitive character words and the inferencing aspects words.
2. The method according to claim 1, characterized in that the target data is any one of image data, video data, audio data, and text data; and determining the primitive character words of the target data comprises:
When the target data is image data, determining characteristic information of at least one of the following types for the image data: character features, object features, color features, affective features, texture features, shape features, and spatial position features, and using the obtained characteristic information as the primitive character words of the image data;
When the target data is text data, performing at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, the results serving as the primitive character words of the text data;
When the target data is audio data, converting the audio data into corresponding text data and performing at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, the results serving as the primitive character words of the audio data;
When the target data is video data: for the image data contained in the video data, determining characteristic information of at least one of the following types: character features, object features, color features, affective features, texture features, shape features, and spatial position features; and for the audio data contained in the video data, converting the audio data into corresponding text data and performing at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis; the results and the obtained characteristic information together serving as the primitive character words of the video data.
3. The method according to claim 1, characterized in that building the index of the target data according to at least the primitive character words and the inferencing aspects words comprises:
Creating an index file for the target data, and writing the primitive character words with their corresponding characteristic types, and the inferencing aspects words with their corresponding characteristic types, into the index file;
Associating the index file with the target data.
4. The method according to claim 3, characterized in that building the index of the target data according to at least the primitive character words and the inferencing aspects words further comprises:
Writing at least one of the data type, data source, and ID of the target data into the index file.
5. A search method, characterized in that the method comprises:
Receiving a search condition input by a user, and determining a retrieval intent according to the search condition;
Querying the index of each item of data in a raw database according to the retrieval intent to obtain the data that satisfies the retrieval intent, wherein the raw database includes multiple types of data and the index of each item of data is built in advance by the data directory construction method according to any one of claims 1-4;
Obtaining a retrieval result according to the data that satisfies the retrieval intent.
6. A data directory construction device, characterized in that the device comprises:
An acquisition module, configured to obtain target data for which an index is to be built;
A determining module, configured to determine the primitive character words of the target data;
An expansion module, configured to perform related-word expansion on the primitive character words to obtain extension feature words;
A reasoning module, configured to input the primitive character words and the extension feature words into a knowledge graph for inference to obtain inferencing aspects words;
A building module, configured to build the index of the target data according to at least the primitive character words and the inferencing aspects words.
7. The device according to claim 6, characterized in that the target data is any one of image data, video data, audio data, and text data;
The determining module comprises:
A first determination sub-module, configured to, when the target data is image data, determine characteristic information of at least one of the following types for the image data: character features, object features, color features, affective features, texture features, shape features, and spatial position features, and to use the obtained characteristic information as the primitive character words of the image data;
A second determination sub-module, configured to, when the target data is text data, perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, the results serving as the primitive character words of the text data;
A third determination sub-module, configured to, when the target data is audio data, convert the audio data into corresponding text data and perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis, the results serving as the primitive character words of the audio data;
A fourth determination sub-module, configured to, when the target data is video data: for the image data contained in the video data, determine characteristic information of at least one of the following types: character features, object features, color features, affective features, texture features, shape features, and spatial position features; and for the audio data contained in the video data, convert the audio data into corresponding text data and perform at least one of the following kinds of processing on the text data: information extraction, named entity recognition, and sentiment analysis; the results and the obtained characteristic information together serving as the primitive character words of the video data.
8. A retrieval device, characterized in that the device comprises:
A receiving module, configured to receive a search condition input by a user and to determine a retrieval intent according to the search condition;
A query module, configured to query the index of each item of data in a raw database according to the retrieval intent and to obtain the data that satisfies the retrieval intent, wherein the raw database includes multiple types of data and the index of each item of data is built in advance by the data directory construction device according to claim 6 or 7;
A processing module, configured to obtain a retrieval result according to the data that satisfies the retrieval intent.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-5.
10. An electronic device, characterized by comprising:
The computer-readable storage medium according to claim 9; and
One or more processors, configured to execute the program in the computer-readable storage medium.
CN201711461946.8A 2017-12-28 2017-12-28 Data directory construction method, search method and device, medium and electronic equipment Pending CN108255985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711461946.8A CN108255985A (en) 2017-12-28 2017-12-28 Data directory construction method, search method and device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711461946.8A CN108255985A (en) 2017-12-28 2017-12-28 Data directory construction method, search method and device, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN108255985A true CN108255985A (en) 2018-07-06

Family

ID=62724384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711461946.8A Pending CN108255985A (en) 2017-12-28 2017-12-28 Data directory construction method, search method and device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN108255985A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064969A (en) * 2012-12-31 2013-04-24 武汉传神信息技术有限公司 Method for automatically creating keyword index table
CN103886034A (en) * 2014-03-05 2014-06-25 北京百度网讯科技有限公司 Method and equipment for building indexes and matching inquiry input information of user
US20170124217A1 (en) * 2015-10-30 2017-05-04 International Business Machines Corporation System, method, and recording medium for knowledge graph augmentation through schema extension

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
**: "《cpc分类表》", 31 December 2015 *
郝林雪等: "摘要", 《计算机科学与探索》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032734A (en) * 2019-03-18 2019-07-19 百度在线网络技术(北京)有限公司 Near synonym extension and generation confrontation network model training method and device
CN110032734B (en) * 2019-03-18 2023-02-28 百度在线网络技术(北京)有限公司 Training method and device for similar meaning word expansion and generation of confrontation network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180706