CN110569448A - Content labeling method and system - Google Patents


Info

Publication number
CN110569448A
CN110569448A (application CN201810468776.4A)
Authority
CN
China
Prior art keywords
technology
cloud server
lbs
voice
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810468776.4A
Other languages
Chinese (zh)
Inventor
张运军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Double Monkey Technology Co Ltd
Original Assignee
Shenzhen Double Monkey Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Double Monkey Technology Co Ltd filed Critical Shenzhen Double Monkey Technology Co Ltd
Priority to CN201810468776.4A priority Critical patent/CN110569448A/en
Publication of CN110569448A publication Critical patent/CN110569448A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/003Navigation within 3D models or images

Abstract

A content labeling system comprising: a terminal device and a cloud server, connected through a network. The terminal device includes an array microphone, a VR physical camera, and a 3D interactive interface. The cloud server includes speech recognition, NLP technology, the neural network technologies CNN and LSTM for training content models, and LBS accurate-positioning search technology. A user needs only professional equipment to upload images, text, and voice, which are processed and permanently stored in the cloud; the next time someone stands at the same position, VR technology retrieves from the cloud and displays the information everyone left there before. This discourages uncivilized tourism (such as carving on scenic buildings): by fusing the virtual content with the real environment captured by the camera, the system overlays the real and the virtual, so that messages, voice notes, and photos remain at the same position.

Description

Content labeling method and system
Technical Field
Embodiments of the invention relate to the field of information technology, and in particular to a content labeling method and system.
Background
In recent years, LBS, NLP, search-engine, and artificial-intelligence technologies have been widely applied across many areas of life. A user uploads content (images, text, and voice) to the cloud using professional equipment; the cloud stores the content indexed by LBS, trains content models with the neural network technologies CNN and LSTM to form a big-data warehouse, saves the processed content in a data cluster, and provides various interfaces for third-party access.
The method mainly addresses the following problem: when people visit a tourist attraction, they usually learn about it only through a guide's explanation or the attraction's written introductions. Such one-way information output does not always give visitors a full understanding of the attraction. In addition, some tourists like to carve symbols or inscriptions such as "X was here" into the buildings of a scenic spot, which causes great damage to those buildings.
Disclosure of the Invention
The technical problem mainly solved by embodiments of the invention is to provide a content labeling method and system. Based on LBS positioning, image recognition, NLP semantic analysis, search-engine technology, and VR virtual/real-scene fitting, images, text, and voice are uploaded to a cloud server. Neural network models are trained to process the images and voice, and semantic analysis builds a large database organized by position, semantics, and association chains. Finally, search-engine and deep neural network technologies match the content to be labeled against the trained models, and the content's semantics are analyzed to retrieve associated content and match it with the labeled content.
In order to solve the technical problem, the invention adopts the following technical scheme: a content labeling system is provided, comprising a terminal device and a cloud server, connected through a network.
The terminal device includes: an array microphone, a VR physical camera, and a 3D interactive interface.
The cloud server includes: speech recognition, NLP technology, the neural network technologies CNN and LSTM for training content models, and LBS accurate-positioning search technology.
An annotation method, comprising: first, using the professional terminal, the user records voice through the array microphone, writes text, or captures images, together with LBS information (three-dimensional coordinates X, Y, Z); the cloud server converts the voice to text with speech recognition, analyzes the text with NLP technology, then trains the content model with LSTM and stores the data in the database cluster.
Second, the user takes a photo with the professional terminal's VR physical camera, attaches the LBS information, and reports it to the cloud server; the cloud server uses the neural network technology CNN together with LBS accurate-positioning search to query all information left at the same orientation (three-dimensional coordinates X, Y, Z) and in the same scene.
Finally, the data returned by the cloud is displayed on the professional terminal through the 3D interactive interface, so that the user can see how many text messages, voice messages, and images have been left at the location.
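The steps above amount to a geo-indexed store of user annotations. A minimal sketch of that flow follows; the `Annotation` schema, the in-memory store standing in for the database cluster, and the 5-meter query radius are illustrative assumptions, not details from the patent:

```python
import math
from dataclasses import dataclass

@dataclass
class Annotation:
    """One piece of user content tied to an LBS coordinate (hypothetical schema)."""
    x: float
    y: float
    z: float
    kind: str      # "text", "voice", or "image"
    payload: str   # recognized text, or a storage key for an audio/image blob

class AnnotationStore:
    """In-memory stand-in for the cloud database cluster."""
    def __init__(self):
        self._items: list[Annotation] = []

    def upload(self, ann: Annotation) -> None:
        self._items.append(ann)

    def query_nearby(self, x: float, y: float, z: float, radius: float = 5.0):
        """Return all annotations left within `radius` of the given position."""
        return [a for a in self._items
                if math.dist((a.x, a.y, a.z), (x, y, z)) <= radius]

store = AnnotationStore()
store.upload(Annotation(10.0, 20.0, 1.5, "text", "Beautiful view from here"))
store.upload(Annotation(10.2, 20.1, 1.5, "voice", "audio/clip-001"))
store.upload(Annotation(500.0, 40.0, 0.0, "text", "Far away"))

# Someone returning to (10, 20, 1.5) retrieves what was left there before.
nearby = store.query_nearby(10.0, 20.0, 1.5)
```

In the patent's pipeline the query would also be constrained by scene matching (CNN image recognition), not position alone.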
The user needs only professional equipment to upload images, text, and voice, which are processed and permanently stored in the cloud. The next time someone uses professional equipment at the same position, VR technology retrieves from the cloud and displays the information everyone left there before, discouraging uncivilized tourism and letting the "I was here" message be preserved permanently. Meanwhile, through image recognition, speech recognition, text semantic analysis, and positioning technology, the visitor's content is deeply analyzed and matched to closely related content; by fusing the virtual content with the real environment captured by the camera, the real and the virtual are overlaid so that messages, voice notes, and photos remain at the same position.
Drawings
Fig. 1 is a block diagram of a content tagging system according to an embodiment of the present invention.
Detailed Description
In order to facilitate an understanding of the invention, the invention is described in more detail below with reference to the accompanying drawings and detailed description. It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for descriptive purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
As shown in Fig. 1, a content labeling system includes a terminal device and a cloud server, connected through a network.
The terminal device includes: an array microphone, a VR physical camera, and a 3D interactive interface.
The cloud server includes: speech recognition, NLP technology, the neural network technologies CNN and LSTM for training content models, and LBS accurate-positioning search technology.
A user holds the terminal device and, using the VR camera, uploads the current LBS three-dimensional coordinate (X, Y, Z) together with an image to the cloud server. The server performs image recognition against the coordinate and the image to confirm the position and orientation, uses a neural network algorithm to match content in the database cluster, and returns a result set of images, text, and voice to the terminal. The terminal then uses VR technology to overlay the results onto the real scene captured by the camera, at the same place and orientation, and plays the associated voice. In this way all the sounds, images, and text left by visitors are permanently retained and shared with anyone who wants to see them.
The key to NLP is enabling computers to "understand" natural language, so natural language processing is also called natural language understanding. NLP analysis techniques are roughly divided into three levels: lexical analysis, syntactic analysis, and semantic analysis.
1) Lexical analysis
Lexical analysis includes word segmentation, part-of-speech tagging, named entity recognition, and word sense disambiguation.
Word segmentation and part-of-speech tagging are straightforward to understand.
Named entity recognition identifies named entities such as person names, place names, and organization names in sentences; each named entity consists of one or more words.
Word sense disambiguation determines the true meaning of ambiguous words from the context of the sentence.
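As a toy illustration of the lexical-analysis level, the sketch below does forward-maximum-matching word segmentation and dictionary-based named entity lookup. The vocabulary, the entity gazetteer, and the approach itself are simplified assumptions; production systems use trained statistical or neural models:

```python
def fmm_segment(sentence, vocab, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word;
    fall back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(sentence):
        for j in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + j] in vocab or j == 1:
                words.append(sentence[i:i + j])
                i += j
                break
    return words

VOCAB = {"张运军", "深圳", "旅游", "景点", "在", "留言"}
ENTITY_TYPES = {"张运军": "PERSON", "深圳": "LOCATION"}  # toy gazetteer

# "Zhang Yunjun is touring in Shenzhen"
tokens = fmm_segment("张运军在深圳旅游", VOCAB)
entities = [(t, ENTITY_TYPES[t]) for t in tokens if t in ENTITY_TYPES]
```

Part-of-speech tagging and word sense disambiguation would be further passes over `tokens`, each conditioning on sentence context.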
2) Syntactic analysis
Syntactic analysis transforms an input sentence from a linear sequence into a tree structure, so that collocation and modification relations among the words can be captured; this is a key step in NLP.
There are currently two mainstream approaches to syntactic analysis: phrase-structure grammar and dependency grammar. Dependency grammar has become a hotspot in syntactic-analysis research.
Dependency grammar has a simple representation that is easy to understand and annotate, and it readily expresses semantic relations between words; for example, sentence components can stand in agent, temporal, and similar relations. These relations are conveniently applied to semantic analysis, information extraction, and related tasks, and dependencies also allow more efficient decoding algorithms.
The syntactic structure obtained by syntactic analysis supports higher-level semantic analysis and applications such as machine translation, question answering, text mining, and information retrieval.
3) Semantic analysis
The ultimate goal of semantic analysis is to understand the true semantics a sentence expresses. How best to represent semantics remains an open question. Semantic role labeling is a relatively mature shallow semantic-analysis technique: given a predicate in a sentence, the task is to label the predicate's arguments, such as agent, patient, time, and location. Semantic role labeling is generally performed on top of syntactic analysis, so the syntactic structure is crucial to its performance.
The neural network technology CNN (convolutional neural network) is a feedforward network whose artificial neurons respond to cells within a limited receptive field, giving excellent performance on large-scale image processing. It includes convolutional layers and pooling layers.
The basic structure of a CNN includes two kinds of layers. One is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which local features are extracted; once a local feature is extracted, its positional relation to other features is also determined. The other is the feature mapping layer: each computation layer of the network consists of multiple feature maps, each map is a plane, and all neurons in a plane share equal weights. The feature mapping structure uses a sigmoid activation with a small influence-function kernel, giving the feature maps shift invariance. Because neurons in one map share weights, the number of free parameters of the network is reduced. Each convolutional layer is followed by a computation layer for local averaging and secondary feature extraction, which reduces the feature resolution.
CNNs are used primarily to recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion. Since the feature detection layers learn implicitly from training data, explicit feature extraction is avoided. Moreover, because neurons on the same feature map share weights, the network can learn in parallel, a major advantage of convolutional networks over fully interconnected ones. With their special structure of shared local weights, CNNs are uniquely suited to speech recognition and image processing; their layout is closer to that of real biological neural networks, weight sharing reduces network complexity, and the ability to feed multi-dimensional input images directly into the network avoids the complexity of data reconstruction during feature extraction and classification.
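The convolution and pooling operations described above can be sketched in a few lines. This toy example (pure Python, no framework, illustrative only) convolves a small image containing a vertical edge with a hand-written edge-detector kernel, then max-pools the resulting feature map:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    h = len(image) - kh + 1
    w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w)] for i in range(h)]

def max_pool(feat, size=2):
    """Non-overlapping max pooling: keep the strongest response per window,
    reducing the feature resolution as the text describes."""
    return [[max(feat[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(feat[0]) - size + 1, size)]
            for i in range(0, len(feat) - size + 1, size)]

# A 4x4 "image" with a vertical edge between columns 1 and 2,
# and a 2x2 vertical-edge-detector kernel.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
edge_kernel = [[-1, 1],
               [-1, 1]]

fmap = conv2d(img, edge_kernel)  # 3x3 feature map, strong only at the edge
pooled = max_pool(fmap)          # downsampled response
```

The shared `edge_kernel` is applied at every position, which is exactly the weight sharing that gives the feature map its shift invariance.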
The LSTM (long short-term memory) network used to train the content model is a recurrent neural network suited to processing and predicting events separated by relatively long intervals and delays in a time series.
LSTM has found many applications. LSTM-based systems can learn tasks such as language translation, robot control, image analysis, document summarization, speech recognition, image recognition, handwriting recognition, chatbot control, prediction of diseases, click-through rates, and stock prices, music synthesis, and so on.
The LBS accurate-positioning search technology is a location-based service: a value-added service that obtains a mobile user's position information (geographic or geodetic coordinates) through a telecom operator's radio communication network (such as a GSM or CDMA network) or an external positioning method (such as GPS), and, supported by a Geographic Information System (GIS) platform, provides corresponding services to the user.
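One plausible building block of such an LBS search is filtering stored content by great-circle distance from the user's reported position. Below is a standard haversine implementation; the 50-meter radius and the sample coordinates are application-level assumptions, not values from the patent:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_radius(user, points, radius_m=50.0):
    """Keep only the points within `radius_m` of the user's position."""
    return [p for p in points
            if haversine_m(user[0], user[1], p[0], p[1]) <= radius_m]

pts = [(22.53430, 113.97360),   # same spot as the user
       (22.53431, 113.97361),   # a couple of meters away
       (23.0, 114.5)]           # tens of kilometers away
near = within_radius((22.53430, 113.97360), pts)
```

A real deployment would combine this coarse geographic filter with the orientation and scene matching described elsewhere in the document.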
A labeling method is implemented on the content labeling system described above and specifically comprises the following steps. First, using the professional terminal, the user records voice through the array microphone, writes text, or captures images, together with LBS information (three-dimensional coordinates X, Y, Z); the cloud server converts the voice to text with speech recognition, analyzes the text with NLP technology, then trains the content model with LSTM and stores the data in the database cluster.
Second, the user takes a photo with the professional terminal's VR physical camera, attaches the LBS information, and reports it to the cloud server; the cloud server uses the neural network technology CNN together with LBS accurate-positioning search to query all information left at the same orientation (three-dimensional coordinates X, Y, Z) and in the same scene.
Finally, the data returned by the cloud is presented on the professional terminal through the 3D interactive interface; the user can see how many text messages, voice messages, and images have been left, so that the "I was here" message is truly realized.
The user needs only professional equipment to upload images, text, and voice, which are processed and permanently stored in the cloud. The next time someone uses professional equipment at the same position, VR technology retrieves from the cloud and displays the information everyone left there before, discouraging uncivilized tourism and preserving the "I was here" message permanently. Meanwhile, through image recognition, speech recognition, text semantic analysis, and positioning technology, the visitor's content is deeply analyzed and matched to closely related content; by fusing the virtual content with the real environment captured by the camera, the real and the virtual are overlaid so that messages, voice notes, and photos remain at the same position.
Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method as described above.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly can also be implemented by hardware. It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, which can be stored in a computer readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
It should be noted that the description and the accompanying drawings illustrate preferred embodiments of the invention, but the invention may be embodied in many different forms and is not limited to the embodiments described in this specification; those embodiments are not additional limitations on the invention but are provided so that the disclosure may be understood more thoroughly. Furthermore, the technical features above may be combined with one another to form various embodiments not listed here, all of which are regarded as within the scope described in this specification. Further, modifications and variations will occur to those skilled in the art in light of the foregoing description, and all such modifications and variations are intended to fall within the scope of the appended claims.

Claims (3)

1. A content tagging system, comprising: the system comprises terminal equipment and a cloud server; the terminal equipment is connected with the cloud server through a network;
The terminal device includes: an array microphone, a VR physical camera, and a 3D interactive interface;
The cloud server includes: speech recognition, NLP technology, the neural network technologies CNN and LSTM for training content models, and LBS accurate-positioning search technology.
2. A labeling method, comprising: first, a professional terminal records voice through an array microphone, writes text, or takes images, attaching LBS information; a cloud server converts the voice to text with speech recognition, analyzes the text with NLP technology, then trains a content model with LSTM and stores the data in a database cluster;
Second, a photo is taken with the professional terminal's VR physical camera and the LBS information is attached and reported to the cloud server; the cloud queries all information left at the same orientation and in the same scene through the neural network technology CNN and LBS accurate-positioning search;
Finally, the data returned by the cloud is displayed on the professional terminal through the 3D interactive interface.
3. The method of claim 2, wherein the content presented by the 3D interactive interface is a text message, a voice message, or an image.
CN201810468776.4A 2018-05-16 2018-05-16 Content labeling method and system Withdrawn CN110569448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810468776.4A CN110569448A (en) 2018-05-16 2018-05-16 Content labeling method and system


Publications (1)

Publication Number Publication Date
CN110569448A true CN110569448A (en) 2019-12-13

Family

ID=68771807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810468776.4A Withdrawn CN110569448A (en) 2018-05-16 2018-05-16 Content labeling method and system

Country Status (1)

Country Link
CN (1) CN110569448A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20191213