CN112347034A - Multifunctional integrated system-on-chip for nursing old people - Google Patents


Publication number
CN112347034A
Authority
CN
China
Prior art keywords: module, data, dish, cnn, acceleration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011391696.7A
Other languages
Chinese (zh)
Inventor
张延军
黄百铖
卢继华
蔺彦儒
林少越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202011391696.7A priority Critical patent/CN112347034A/en
Publication of CN112347034A publication Critical patent/CN112347034A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/39Circuit design at the physical level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/20Software design
    • G06F8/24Object-oriented
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face


Abstract

The invention relates to a multifunctional integrated system-on-chip for nursing old people, and belongs to the technical fields of SoC (system-on-chip) embedded development, deep learning and computer vision. Through circuit design in the Vivado block design and control by an Aliyun cloud server, the system realizes gesture recognition, dish clamping and feeding, and fall rescue on a ZYNQ-series development board, and specifically comprises three parts: 1) gesture recognition is realized through image preprocessing and SVM prediction; 2) dish clamping and feeding is realized through speech recognition, dish target candidate-frame extraction, face recognition and CNN classification prediction of the dishes; 3) fall rescue is realized through fall detection, trolley control and binocular vision ranging. The system offers flexible circuit design, reduced storage and computation resource usage, and guaranteed accuracy with improved speed, and provides comprehensive road-condition information for the user; at the same time it has good real-time performance, meets practical use requirements, occupies a small product volume and has low cost.

Description

Multifunctional integrated system-on-chip for nursing old people
Technical Field
The invention relates to a multifunctional integrated system on a chip for nursing old people, and belongs to the technical field of SOC (system on chip) embedded development, deep learning and computer vision.
Background
Population aging is a basic national condition of China. The China Development Report 2020 on the development trends and policies of population aging, published by the China Development Research Foundation, indicates that the degree of population aging in China has continuously deepened since the country entered an aging society in 2000. By around 2022, people over 65 years old will account for 14% of the total population of China, marking the transition to an aged society. The aging problem has therefore become an increasingly important issue for society.
Existing nursing products and inventions are mainly divided into large nursing systems and small nursing tools. Large nursing systems for the aged are mainly deployed in specialized settings such as nursing homes and geriatric hospitals. Such systems are built around various types of sensors; although they adopt the concept of the Internet of Things, they only receive, count and analyze the acquired information and cannot provide real-time intelligent feedback. Moreover, these products can only be used in large venues and cannot meet the needs of the elderly at home, so their application scenarios are limited. Because such a product must be deployed over a large space, maintenance and updating are difficult, and installation and upkeep are costly. Small nursing tools can solve some specific nursing problems, but they face issues of real-time performance, cost and practicality. The currently disclosed intelligent feeding system of Emett technology sells for US$4,500 (about 30,994 RMB), which does not fit the daily consumption level of most people. In addition, that feeding system only moves a mechanical arm to a fixed position; it cannot intelligently recognize the dishes, their positions or the position of the user's mouth, so it lacks flexibility in use. Currently disclosed gesture recognition systems, whether based on the traditional SVM method or on CNNs, are implemented on a CPU; they only recognize the gesture, perform no processing after recognition, and do not form an elderly-care product. A gesture recognition method implemented on a CPU does not use FPGA acceleration for gesture prediction, so it is difficult to meet real-time requirements.
Most current fall detection methods are wearable: the elderly must wear devices with built-in attitude sensors, so use depends on whether the device is worn, the degree of intelligence is low, and having to put the device on for every use troubles and inconveniences the elderly. Non-contact computer-vision methods achieve higher fall detection accuracy, but they are computationally complex, struggle to meet real-time requirements, and give no feedback after a fall is detected, so they cannot meet the practical needs of elderly care. A core problem faced by existing household robots when moving is obstacle avoidance; the optical-flow or ultrasonic sensors adopted in currently disclosed products and inventions have low precision and cannot meet the requirement of accurate distance judgment at high speed. Finally, these small nursing tools are all isolated systems that have not been integrated into one multifunctional nursing system, which limits their use.
Disclosure of Invention
The invention aims to solve the problems of existing large and small elderly nursing systems: limited application scenarios, high cost, difficult maintenance, poor real-time performance, low degree of intelligence, low accuracy and scattered functions.
The core idea of the invention is as follows: through circuit design in the Vivado block design and control by an Aliyun cloud server, gesture recognition, dish clamping and feeding, and fall rescue are realized on a ZYNQ-series development board. The three parts are specifically: 1) gesture recognition is realized through image preprocessing and SVM prediction; 2) dish clamping and feeding is realized through speech recognition, dish target candidate-frame extraction, face recognition and CNN classification prediction of the dishes; 3) fall rescue is realized through fall detection, trolley control and binocular vision ranging.
The multifunctional integrated system-on-chip comprises an embedded system-on-chip and external hardware modules; the embedded system-on-chip comprises a dish clamping and feeding system, a gesture recognition system and a fall rescue system; the external hardware modules comprise a microphone, a trolley, an HDMI display, a mechanical arm, a PC terminal and a binocular camera;
the PC end is connected with the Ali cloud server;
the dish clamping and feeding system comprises a dish classification module, a face recognition module, a voice recognition module, an RAM buffer, a CNN acceleration library, an AXI interface control module and a storage management module; the dish classification module comprises a CNN parameter presetting sub-module, a candidate frame extracting sub-module, a CNN frame constructing sub-module and a mechanical arm pose calculating sub-module; the face recognition module comprises a mouth recognition sub-module and a mechanical arm pose control sub-module; the RAM buffer comprises an operation result storage unit, a test data storage unit, a convolution kernel and bias storage unit and a full-connection layer weight bias storage unit; the CNN acceleration library comprises a convolution acceleration module, a pooling acceleration module and a full-connection acceleration module;
the gesture recognition system comprises a result discrimination module, an SVM overlay scheduling module, a floating point number to fixed point number unit, a Fourier operator extraction module, an outline extraction module, a skin color detection module, an input vector blocking transmission module, a support vector memory, a bias memory, a pulse array structure, an input vector memory, an AXIS bus control module and a kernel function accumulation module; the systolic array structure comprises NPEA PE module; the PE module comprises a fixed point number multiplication submodule, a fixed point number addition submodule, a ROM address management submodule and an output data management submodule;
wherein N isPEThe maximum value of (a) depends on the fan-in energy of the kernel function accumulation moduleForce;
the tumble rescue system comprises a Gaussian filtering and graying module, a canny edge extraction module, a distance information generation module, a tumble detection module, an image information fusion display module, a video stream management module, an image texture feature enhancement module, an SAD window traversal search module, a high-quality matching point screening module, a parallax map distance measurement module, a tiier-yolo obstacle detection module, a cv2pynq hardware acceleration module and a sobel filtering module.
The functions of each module in the multifunctional integrated system on chip are as follows:
the gesture recognition system adopts a ZYNQ development board to carry out software and hardware collaborative design to complete acceleration of the gesture recognition function; the SVM test vector of the gesture recognition system is stored in an input vector memory through an AXIS bus control module and an input vector blocking transmission module, and is repeatedly taken out and transmitted to N in a pulse array structurePEA first one of the PE modules; the PE module sequentially transmits the test vectors to the next PE module on the rising edge of the clock, takes out the support vectors from the support vector memory, performs dot product operation on the support vectors and the test vectors, and outputs the support vectors and the test vectors to the kernel function accumulation module; and the kernel function accumulation module takes out the offset value from the offset memory and adds the offset value and the accumulated value through counting and accumulating to a numerical value Nsvn through an internal counter, and transmits a result to a result judgment module of the gesture recognition system.
where the maximum value of N_PE depends on the fan-in capability of the kernel function accumulation module; N_svn is the number of support vectors corresponding to each classification category in the gesture recognition SVM training, and its range depends on the total number of training samples;
the connection mode of the gesture recognition system module is as follows:
the system comprises a camera, an outline extraction module, an SVM overlay scheduling module, a result judgment module and a PC terminal, wherein the camera is connected with a skin color detection module, the skin color detection module is connected with the outline extraction module, the outline extraction module is connected with the Fourier operator extraction module, the Fourier operator extraction module is connected with a floating point number to fixed point number unit, the SVM overlay scheduling module is connected with the result judgment module, and the result judgment module is connected with the PC terminal connected with Aliyun; the AXIS bus control module receives the test vector transmitted by the SVM overlay scheduling module, and the input vector blocks the transmission module and the AXIS busThe line control module is connected with the input vector memory and controls the writing time sequence of the test vector to the input vector memory; the PE module in the systolic array structure is connected with the input vector blocking transmission module, the support vector memory and the kernel function accumulation module, and the test vector and the support vector carry out kernel function K (v) in a plurality of PE modulestest,vsup) Computing and then transforming the kernel function K (v)test,vsup) The operation result is transmitted to a kernel function accumulation module; the kernel function accumulation module is connected with the offset memory and the AXIS bus control module, and takes an offset value from the offset memory and adds the offset value and the accumulated value to be used as a result and then transmits the result to the result discrimination module of the gesture recognition system;
wherein, K (v)test,vsup) For SVM kernel operation, vtestIs a test vector, vsupThe method comprises the following steps of (1) adopting a linear kernel function, a Gaussian kernel function and a polynomial kernel function as support vectors;
the pulse array structure is composed of NPEEach PE module corresponds to one support vector memory, and each support vector memory stores different support vectors; all PE modules are functionally consistent, including: receiving a test vector, transmitting the test vector, taking out a support vector from a corresponding support vector memory, completing SVM kernel function operation, and outputting a kernel function operation result of a single test vector and a single support vector;
the AXIS bus control module controls four core signal lines, wherein the core signal lines are tdata, tvalid, tready and tlast signal lines in an AXIS bus respectively;
the input vector blocking transmission module is used for judging whether the input vector memory finishes processing and transmitting the data transmitted into the embedded asynchronous data cache submodule by the SVMoverlay scheduling module, if so, transmitting new data, and otherwise, stopping transmission;
and a counter is arranged in the kernel function accumulation module and counts and accumulates the data received from the systolic array structure. When the number value Nsvn is counted, taking out the offset from the offset memory and adding and outputting the accumulation result;
where N_svn comprises N_sv0, N_sv1, N_sv2, ..., N_svn; the size of n depends on the multi-classification strategy of the SVM and the number of classification categories, and the range of N_svn depends on the total number of training samples;
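The decision flow above (kernel evaluation per support vector, accumulation over the N_svn values of a category, bias addition, and result discrimination by the largest score) can be sketched in software as follows. This is an illustrative model, not the patent's hardware circuit; a linear kernel is assumed and all names are hypothetical.

```python
# Illustrative software model of the SVM decision flow described above;
# names are hypothetical and a linear kernel is assumed.

def linear_kernel(v_test, v_sup):
    """K(v_test, v_sup) for a linear kernel: a plain dot product,
    matching the per-PE dot-product operation."""
    return sum(a * b for a, b in zip(v_test, v_sup))

def svm_decision(v_test, support_vectors, coeffs, bias):
    """Accumulate the kernel results over one category's support vectors
    and add the bias, as the kernel function accumulation module does."""
    acc = sum(c * linear_kernel(v_test, sv)
              for c, sv in zip(coeffs, support_vectors))
    return acc + bias

def classify(v_test, classes):
    """classes: one (support_vectors, coeffs, bias) triple per category.
    The result discrimination step keeps the subscript of the largest score."""
    scores = [svm_decision(v_test, sv, c, b) for sv, c, b in classes]
    return max(range(len(scores)), key=scores.__getitem__)
```

In hardware, each PE computes one `linear_kernel` term per clock step; here the accumulation is simply a Python sum.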
the vegetable clamping and feeding system adopts a ZYNQ development board to carry out software and hardware collaborative design to complete the high-speed, accurate and full-automatic vegetable clamping and feeding function;
the dish clamping and feeding system identifies the name of a dish described by a user through a microphone through a voice identification method and judges whether the name of the dish is legal or not according to a dish database, identifies the outline of each dish containing area in a picture acquired by a camera as a dish target candidate frame by using the shape characteristic of each dish containing area in a dish through a Yolo improved candidate frame extraction method, and transmits pictures in all the target candidate frames to a CNN acceleration library one by one to finish CNN acceleration operation. And clamping the dishes with the legal dish names of the users by using the mechanical arm according to the CNN classification result. And finding out the relative position of the mouth in the face in the picture of the camera through face recognition, and finishing feeding by using the mechanical arm.
The dish clamping and feeding system modules are connected as follows:
the voice recognition module is connected with the dish classification module and transmits the dish information selected by the user; the AXI interface control module is connected with the dish classification module and reads the CNN preset data, the CNN user configuration information and the test data. The memory management module is connected with the AXI interface control module and the RAM buffer, and is used for storing corresponding units in the RAM buffer with preset data or test data according to the configuration parameters received in the AXI interface control module and enabling the modules in the CNN acceleration library to carry out operation. The CNN acceleration library module is connected with the storage management unit and the RAM buffer, and enables the acceleration operation module to fetch data from the RAM buffer to complete operation according to the enabling information and transmit the result to the RAM buffer. The RAM buffer is connected with the AXI interface control module and the storage management module, and under the control of the storage management module, the operation result is transmitted to the AXI interface control module and then to the dish classification module.
The convolution acceleration module, according to the parameter configuration information in the parameter RAM, fetches from the parameter RAM a convolution kernel of the size defined by the user in the parameter configuration, extracts window data at the corresponding addresses from the input data RAM by traversal scanning to complete window-scanned convolution, accelerates the convolution with the corresponding HLS directives, and stores the result in the operation result storage unit;
The pooling acceleration module fetches data from the corresponding addresses of the input data RAM according to the pooling mode, pooling window size and pooling window stride defined in the parameter configuration information to complete pooling, accelerates it with the corresponding HLS directives, and stores the result in the operation result storage unit;
The fully-connected acceleration module fetches the weights and offset values of the fully-connected layer from the parameter RAM according to the input and output dimensions in the parameter configuration information, completes the fully-connected multiply-add operation with the input data at the corresponding addresses in the input data RAM, accelerates the multiply-add with the corresponding HLS directives, and stores the result in the operation result storage unit.
The AXI interface module controls an AXI-Lite bus and an AXI-Stream bus: the AXI-Lite bus transmits the CNN parameter configuration information, while the AXI-Stream bus transmits the bulk data to be operated on.
The storage management module controls the storage units according to the configuration information on the AXI-Lite bus and the end-of-operation flag information from the CNN acceleration library.
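The three accelerated operations (window-scanned convolution, pooling, and the fully-connected multiply-add) can be sketched as plain software models as follows. These mirror the window-scanning data access pattern described above but are illustrative only: single-channel, "valid" padding, hypothetical names, and no HLS directives.

```python
# Illustrative software models of the three accelerated operations;
# single-channel, 'valid' padding, names are hypothetical.

def conv2d(img, kernel, stride=1):
    """Window-scanned convolution: slide the user-sized kernel over the
    input and multiply-accumulate each window."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(img[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(0, len(img[0]) - kw + 1, stride)]
            for i in range(0, len(img) - kh + 1, stride)]

def maxpool2d(img, size=2, stride=2):
    """Pooling with a configurable window size and stride (max mode)."""
    return [[max(img[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(img[0]) - size + 1, stride)]
            for i in range(0, len(img) - size + 1, stride)]

def fully_connected(x, weights, biases):
    """Fully-connected multiply-add: one weight row and one bias per output."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]
```

On the FPGA these loops are unrolled and pipelined by HLS; the arithmetic per output element is the same.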
The fall rescue system adopts a ZYNQ-series development board for software-hardware co-design to complete a fast and accurate fall rescue function;
the tumble rescue system obtains a target position area by adopting an adjacent frame background difference method and a morphological method for a video stream read by a camera, and detects a tumble according to the change of the outline of the target area, and a trolley carrying a rescue article automatically opens to a tumble person. The binocular camera of the car head receives the left view and the right view, the corresponding relation between the two images is sought through a stereo matching method, namely corresponding points on the left image and the right image are matched, a disparity map is generated, and depth of field information is calculated in real time according to the disparity map. The trolley stops when reaching a certain distance in front of the fallen person to complete rescue.
The fall rescue system modules are connected as follows:
the video stream management module is connected with the sobel filtering module, the image texture feature enhancing module, the tiier-yolo obstacle detecting module, the cv2pynq hardware accelerating module and the Gaussian filtering and graying module, and transmits video data to the connecting blocks respectively to perform corresponding functions. The sobel filtering module is connected with the canny edge extraction module and used for preprocessing the tumble judging image. The distance information generation module is connected with the falling detection module and transmits the distance information from the falling person to the trolley. The image texture feature enhancement module, the SAD window traversal search module, the high-quality matching point screening module and the disparity map distance measurement module are sequentially connected in series, stereoscopic matching of left and right attempts of the binocular camera is completed, and the obtained disparity map is transmitted to the distance information generation module. the tinier-yolo obstacle detection module is connected with the image information fusion display module and transmits the position information of the yolo detection target frame and the target discrimination probability.
The video stream management module, i.e. the VDMA module, transmits the video streams to the sobel filtering module, the image texture feature enhancement module, the tinier-yolo obstacle detection module and the cv2pynq hardware acceleration module respectively, according to the configuration information.
The image texture feature enhancement module, the SAD window traversal search module, the high-quality matching point screening module and the parallax map distance measurement module are used for completing left and right view feature point matching together and providing a parallax map for image depth detection.
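The SAD window-traversal search and disparity-based ranging described above can be sketched along a single scanline as follows. The window size, search range and the pinhole relation Z = f·B/d are standard, but all names are illustrative and the real module works over 2-D windows.

```python
# Illustrative 1-D sketch of SAD block matching and disparity ranging;
# names and parameters are hypothetical.

def sad(a, b):
    """Sum of absolute differences between two equally sized windows."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_disparity(left_row, right_row, col, win, max_disp):
    """Traverse candidate disparities along one scanline and keep the
    SAD-minimising match, as the SAD window traversal search does."""
    ref = left_row[col:col + win]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if col - d < 0:
            break  # candidate window would fall off the image
        cost = sad(ref, right_row[col - d:col - d + win])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

def depth_from_disparity(d, focal_px, baseline_m):
    """Classic pinhole relation for a rectified pair: Z = f * B / d."""
    return focal_px * baseline_m / d
```

The high-quality matching-point screening step would then discard matches whose minimum SAD is not clearly better than the second-best candidate.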
The tinier-yolo module loads a pre-trained BNN model to detect the positions and class probabilities of obstacles; the target region is determined by non-maximum suppression (NMS).
The cv2pynq module accelerates opencv functions, and the sobel filtering module accelerates the sobel filtering function.
The working process of the multifunctional integrated system on chip comprises the following steps:
S1: the user sits in front of the PC intelligent control terminal, selects one of the three functions 'dish clamping and feeding', 'fall rescue' and 'gesture recognition' for execution, and sends the selected function information to the multifunctional integrated hardware system through the Aliyun cloud; if 'gesture recognition' is selected, state S2 is entered; if 'dish clamping and feeding' is selected, state S3 is entered; if 'fall rescue' is selected, state S4 is entered;
s2: the method executes the gesture recognition function, and comprises the following specific steps:
step A), loading a user-defined operation circuit and reading a collected video frame through a camera;
step B) converting the video frame from an RGB color space to a YCrCb color space;
step C), splitting the value of the YCrCb color space, extracting only the Cr value, and performing Gaussian filtering on the video frame by using a convolution kernel;
step D) carrying out skin color extraction processing on the filtered Cr channel value based on a threshold value;
step E) performing an erosion operation on the skin-color-extracted picture using a convolution kernel;
step F) performing a dilation operation on the eroded picture using a convolution kernel;
step G) performing edge extraction on the dilated picture, followed by binarization;
step H), extracting the picture outline;
step I), creating a coordinate system on the picture to obtain a contour function;
step J), carrying out Fourier transform on the obtained contour function to obtain a Fourier coefficient after the Fourier transform;
step K) intercepting the W coefficients at positive frequencies to obtain the Fourier descriptor of the picture;
wherein W ranges from 20 to 100;
step L) taking the modulus of each of the W Fourier coefficients intercepted in step K) as the vector to be tested;
step M) converting the elements of the vector to be tested from floating-point form into Q-bit fixed-point numbers with a fractional bit width of N;
step N) converting the fixed-point values into binary form and inputting them into the user-defined SVM operation circuit to complete the accelerated SVM operation;
step O) after the accelerated SVM operation, comparing the sizes of the N_type elements of the result array; the array subscript of the largest result is the gesture to be discriminated.
where N_type is the number of classes of the multi-class SVM.
And step P) transmitting the gesture to the intelligent control terminal through the Aliyun.
Step Q) returns to state S1.
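Steps J) through M) above (Fourier transform of the contour, descriptor truncation, and fixed-point conversion) can be sketched as follows, treating the (x, y) contour points as complex samples. The DFT formulation and Q-format rounding are standard; all names are illustrative and not taken from the patent.

```python
import cmath

# Illustrative sketch of Fourier-descriptor extraction and Q-format
# conversion; names and parameter choices are hypothetical.

def fourier_descriptors(contour, w):
    """Treat each (x, y) contour point as a complex sample, take the DFT,
    and keep the magnitudes of the first W positive-frequency
    coefficients; magnitudes give a start-point-insensitive descriptor."""
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    coeffs = [sum(z[k] * cmath.exp(-2j * cmath.pi * u * k / n)
                  for k in range(n))
              for u in range(1, w + 1)]
    return [abs(c) for c in coeffs]

def to_fixed_point(x, frac_bits):
    """Quantise a float to a fixed-point integer with `frac_bits`
    fractional bits: the stored value is round(x * 2**frac_bits)."""
    return round(x * (1 << frac_bits))
```

The binary words fed to the SVM circuit in step N) are simply the two's-complement representations of these fixed-point integers.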
S3: the 'dish clamping and feeding' function is executed, with the following specific steps:
step A) the user sits in a fixed chair and speaks the dish name into the microphone; if the dish name is in the database, the host computer displays the dish name and step B is executed; if the dish name is not in the database, the host computer prompts the user to speak the dish name again, and the process returns to step A;
step B), shooting a customized dish by a camera, and performing CNN classification on dishes in the candidate frame according to a designed improved Yolo method, wherein the steps are as follows:
step Ba) loading bit stream files of the CNN classified circuits generated in advance;
step Bb) loading the CNN parameter file to the PL terminal;
step Bc), the user sets the configuration information of each FPGA acceleration operation module of the CNN operation library at the PL end.
Step Bd), calling the PYNQ camera driver to acquire a video frame;
step Be) converting the video frame into a gray color domain, and then performing Gaussian filtering, Canny descriptor edge extraction, corrosion operation and expansion operation;
and a step Bf) of finding all quadrangles with the area value S in the video frame as dish candidate frame extraction areas.
Wherein S is a measurement area value returned by the opencv function library, and the range of S is 5000-20000.
Step Bh), reforming the quadrilateral candidate-box extraction areas from the previous step into regular rectangular pictures;
step Bi) reforming all the regular rectangular pictures in the previous step into specific rectangular size W x H;
wherein, W and H are the length and width of the picture input by the trained network respectively.
Step Bj) transmitting the picture of the previous step to a convolution acceleration module in a PL terminal;
step Bk), the convolution acceleration module at the PL end returns its array to the PS end, which inputs it into the Relu activation function and pooling acceleration module in the PL end;
step Bl), the pooling acceleration module at the PL end returns its array to the PS end, which transmits it to the convolution acceleration module in the PL end;
step Bm), the convolution acceleration module at the PL end returns its array to the PS end, which inputs it into the Relu activation function and pooling module in the PL end;
step Bn), the pooling layer at the PL end returns its array to the PS end, which transmits it to the full-connection-layer acceleration module in the PL end, whose output dimension is the dish classification number N_c;
step Bo), the PL end returns the array of dimension N_c, classification judgment is performed on it, and the dish classification result corresponding to the input picture is found;
step C), after judging the types of all dishes in the dish, judging the relative positions of the dishes in the picture by the system according to the recorded effective dish names spoken by the user;
step D), the system calculates, according to the relative position, the direction and distance that each of the D_m motion-freedom joints of the mechanical arm needs to move, and moves the arm to the position of the corresponding dish to scoop it up;
step E), the mechanical arm rotates to a position where the camera can shoot a picture which is 50cm above the fixed chair and contains a human face, and the picture is collected;
step F), extracting the face characteristic points in the picture by the system to obtain the relative position P of the mouth in the picture;
step G), the system calculates, according to the relative position P, the direction and distance that each of the D_m motion-freedom joints of the mechanical arm needs to move, and moves the arm to the position of the person's mouth to complete feeding.
Step I), returning to state S1.
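Step Bh) above calls cv2.getPerspectiveTransform to reform a skewed dish box into a regular rectangle. The computation underneath — solving for the 3x3 homography that maps the four detected corner points onto the target rectangle — can be sketched in pure numpy (the function name is illustrative):

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 homography H with H @ (x, y, 1) ~ (u, v, 1),
    given 4 source corners and 4 destination corners -- the math
    behind cv2.getPerspectiveTransform used in step Bh).

    Each correspondence contributes two linear equations in the
    8 unknown entries of H (H[2,2] is fixed to 1).
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)
```

cv2.warpPerspective then resamples the picture with this matrix; step Bi)'s cv2.resize brings the result to the network's W x H input size.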
S4: the system executes the 'tumble rescue' function; the specific steps are as follows:
step A), using a filtering template to perform Gaussian filtering on an image so as to eliminate noise;
step B), converting three color channels of the RGB image into a single-channel gray image for subsequent processing;
step C), convolving the image with the sobel operator to obtain the regions of sharp brightness change in the x or y direction as the initial detected edges; the coarse edges given by the sobel method are then refined with the Canny method, using the edge information in the dx and dy directions together with non-maximum suppression and double-threshold detection, to obtain accurate edge information;
step D) performing background subtraction by using a self-adaptive mixed Gaussian background modeling method, and extracting a dynamic contour from a background difference;
step E), performing the morphological dilation-erosion logic operation on the region corresponding to the binary image at each pixel position, and taking the logic-operation result as the corresponding pixel of the output image to obtain the complete dynamic person position.
Step F), after the morphological operation, one or more irregular solid monitoring areas are obtained, and a standard detection area is obtained by searching for four-connected or eight-connected regions.
Step G), calculating the aspect ratio of the detection-area contour to judge whether the person has fallen: when the aspect ratio is smaller than a threshold T_a, the target is considered to be walking normally; when it is greater than a threshold T_b, the person is judged to have fallen and to need intervention;
step H), through the PL-end binocular vision ranging method, the trolley carrying medicine and a distress call device automatically drives to a position 40-50 cm in front of the fallen person and stops, completing the rescue;
step I) returns to state S1.
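The aspect-ratio judgment of step G) can be sketched as follows. The threshold values T_a = 0.5 and T_b = 2 are the ones given in Example 3; taking the ratio as width over height (so a standing person is below T_a and a fallen person above T_b) is an interpretation, and the function name is illustrative:

```python
# Thresholds from Example 3 of the specification.
T_A, T_B = 0.5, 2.0

def classify_posture(width, height, t_a=T_A, t_b=T_B):
    """Step G): judge posture from the bounding-box aspect ratio
    (width / height) of the detected person region."""
    ratio = width / height
    if ratio < t_a:          # tall, narrow silhouette: upright
        return "walking"
    if ratio > t_b:          # wide, flat silhouette: on the ground
        return "fallen"
    return "uncertain"       # between the thresholds: no decision
```

Only the "fallen" verdict triggers step H)'s trolley dispatch; the in-between band avoids false alarms while the person is sitting or bending.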
The hardware connection mode of the system is as follows:
the ZYNQ-series FPGA development board is connected to an HDMI display through an HDMI cable, to a microphone through a headphone cable, to the binocular camera through a USB cable, and to four steering engines and four motors through DuPont wires. The steering engines drive the mechanical arm, and the motors drive the trolley. The PYNQ-Z2 connects to the Aliyun server over Wi-Fi, receives instructions from the PC, and transmits the operation results back to the PC.
The connection mode of the ZYNQ circuit is designed as shown in FIG. 6. The core interface connection mode and the data circulation mode are as follows:
and the PS terminal controls the IP core and is responsible for finishing the preprocessing of the acquired image and the observation of the result. The PS terminal controls the IP core to transmit the configuration information to the AXI interconnection IP core 1 through the M _ AXI _ GP0 interface, outputs the data stream to be operated of the PS terminal to the AXI interconnection IP core 4 through the S _ AXI _ HP0 interface, and reads the operation result of the PL terminal from the AXI interconnection IP core 4. The PS end output data flow is transmitted to the AXI interconnection IP core 2 through the VDMA buffer, an M03_ AXI signal line output by the AXI interconnection IP core 1 is simultaneously connected to the AXI interconnection IP core 2 to determine a function IP to which the PS end output data flow is transmitted, and the AXI interconnection IP core 3 determines which function IP is transmitted back to the PS end.
The function IPs comprise: the "ORB-SLAM IP", corresponding to the tumble rescue function; the "SVM IP", corresponding to the gesture recognition function; and the "CNN IP", corresponding to the food serving and feeding function.
Advantageous effects
Compared with existing elderly-care systems, the multifunctional integrated system-on-chip for nursing the aged has the following beneficial effects:
1. the SVM operation for gesture recognition is distributed to FPGA logic resources, and the SVM kernel function is computed in parallel by increasing the number of PE units, multiplying the speed relative to gesture-recognition operation on a PC;
2. the candidate-box extraction makes full use of the characteristics of the target environment to be detected: the frame-shaped features of the dish grid serve as candidate regions, replacing Yolo's grid division and NMS non-maximum suppression, achieving a multi-fold speedup;
3. the CNN convolution, pooling and full-connection function modules can be called repeatedly according to the configuration information, accelerating the CNN operation of the PS end while allowing the network structure to be changed at will, so the system suits different network structures, parameter counts and function modes, and the accelerated network is as flexible in use as network construction in PyTorch;
4. the binocular stereo ranging uses hardware acceleration for a faster update rate and obtains the depth of field of every pixel of the captured image, detecting richer distance information than a traditional ultrasonic ranging module and making it better suited to tumble rescue; combining binocular stereo-matching ranging with the tinier-yolo obstacle-detection module provides the user with comprehensive road-condition information with good real-time performance, meeting practical use requirements.
Drawings
FIG. 1 is a diagram illustrating the relationship between modules of a multifunctional integrated system-on-chip for nursing aged people according to the present invention;
FIG. 2 is a connection diagram of a gesture recognition system module in the multifunctional integrated system-on-chip for nursing old people according to the present invention;
FIG. 3 is a connection diagram of a food clamping and feeding system module in the multifunctional integrated system-on-chip for nursing old people according to the invention;
FIG. 4 is a connection diagram of a fall detection system module in a multifunctional integrated system-on-chip for nursing old people according to the present invention;
FIG. 5 is a diagram illustrating a hardware connection method in a multifunctional integrated system-on-a-chip for nursing aged people according to the present invention;
FIG. 6 is a circuit connection diagram of a multifunctional integrated system-on-chip for nursing old people according to the present invention.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings and embodiments.
Example 1
This example illustrates the composition, connection, working process and advantages of the system of the present invention when implemented. The noun explanations used are shown in table 1 below:
TABLE 1 noun explanations in the multifunctional integrated system-on-a-chip for geriatric care
When the embedded system-on-chip is concretely implemented, the embedded system-on-chip is a PYNQ-Z2 development board; the external hardware module is an FPGA external hardware module; the modules are schematically composed as shown in FIG. 1, and the hardware connection relationship is shown in FIG. 5. The gesture recognition system is shown in FIG. 2; the support vector memory is implemented by SV ROMn in fig. 2, and the offset memory is implemented by Bias ROM in fig. 2.
The implementation process of gesture recognition comprises the following steps:
1) Pictures are taken of the 10 gestures for the numbers "1" through "10", 20 pictures per gesture, 200 pictures in total.
2) Each of the 200 pictures is rotated and translated: the rotation angle is a random number from 0° to 360°, the translation direction is chosen by random number among up, down, left and right, and the translation distance is 0-1000 cv2 screen coordinate units. Each picture is processed 4 times, finally yielding 1000 pictures as the enhanced data set;
3) the enhanced data set is split at a ratio of 7:3, giving 700 training pictures and 300 test pictures;
4) skin-color extraction, contour extraction and Fourier-descriptor extraction are carried out on the training data set in sequence according to the S2 gesture-recognition steps, where in this implementation the convolution kernel in step C) of S2 is 5 x 5; the skin-color extraction in step D) is Otsu processing, i.e. threshold-based skin-color extraction calling the cv2.threshold function in the cv library shipped with PYNQ; the edge extraction in step G) is realized by the canny detection operator; the picture-contour extraction in step H) is realized by calling the cv2.findContours function; W in step K) is 32; step L) stores the modulus results into a list variable as the vector to be tested; N and Q in step M) are 32 and 16. Finally 700 txt files are obtained, each storing 32 data, namely the Fourier descriptor of one training picture;
5) the sklearn library of python is called, the SVM classification mode is set to ovr (one-vs-rest), SVM training is performed on the data of the 700 txt files with a grid-search parameter sweep, and the SVM training model with the highest accuracy is saved (the training model comprises the support vector values, the alpha coefficient values and the bias value predicted by the SVM);
6) reading the stored SVM training model, converting all parameters into fixed point numbers with bit width of 32 and decimal precision bit width of N, compiling pure python SVM prediction codes, carrying out SVM operation on the converted fixed point numbers and a test data set, and counting the accuracy under different N settings. Selecting N with the highest accuracy as the decimal bit width of data transmitted to FPGA operation;
7) the support vector values, alpha coefficient values and bias value, converted to fixed-point numbers with bit width 32 and fractional precision bit width N, are stored in advance into ROM modules of the FPGA in .coe form;
8) the SVM operation IP core is written in Verilog;
during specific implementation, SVM operation of gesture recognition is distributed to FPGA logic resource operation, SVM kernel functions are calculated in a parallelized mode by increasing the number of PE units, and compared with gesture recognition operation of a PC end, the speed is increased by 10 times.
The asynchronous clock processing unit in the SVM operation IP core compares the values of the read pointer and the write pointer, whose read and write signals are controlled by two different clocks, by synchronizing them into the same clock domain. The synchronization is accomplished with a double-register (two-beat) operation: under the destination clock, the read pointer is registered for two beats and stored, then compared with the twice-registered value of the write pointer, and the remaining read space and write capacity are judged.
The AXIS receiving unit of the AXIS transceiving management submodule stores the received data into the input RAM. This process is controlled by two signals: the tvalid signal transmitted from the master port, and the tready signal transmitted from the block count management module. The tready signal is connected to the s_axis_tready signal line to inform the VDMA whether the module is ready to receive data. When tvalid is 1 and tready is 1, the data sent by the master is valid and the calculation module is ready to receive it: the read valid signal is set to 1 and passed to the block count module, the address of the input RAM advances from 0 toward N_type, and the data is stored into the RAM. If tvalid is 0 and tready is 1, the calculation module is ready but the data input by the master is invalid: the read valid signal is set to 0 and the input RAM address value is unchanged. Otherwise, the read valid signal and the input RAM address value are unchanged. N_type is the number of classification categories of the SVM.
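The receive handshake just described can be modeled cycle-by-cycle as follows (a software sketch of the rule, not the Verilog; the function name is illustrative):

```python
def axis_rx_step(tvalid, tready, addr, read_valid, depth):
    """One clock of the AXIS receive rule:
    tvalid=1 & tready=1 -> data accepted, read_valid=1, address advances;
    tvalid=0 & tready=1 -> module ready but data invalid, read_valid=0,
                           address holds;
    tready=0            -> both read_valid and address hold.
    `depth` is the input RAM depth (N_type in the specification).
    Returns (next_addr, next_read_valid)."""
    if tready:
        if tvalid:
            return (addr + 1) % depth, 1
        return addr, 0
    return addr, read_valid
```

Iterating this function over a stream of (tvalid, tready) pairs reproduces the address sequence the input RAM sees during one block transfer.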
In specific implementation, the AXIS sending unit located in the AXIS bus control module is responsible for sending the data of the output RAM to the PS end. When the send-valid value send_start from the block count management module is 1, the module judges through the m_axis_tready signal line that the PS end is ready to receive, and increments the output-address control line of the output RAM by one; otherwise the value remains unchanged. When the value of the output-address control line reaches the classification category count N_type, all data have been read: the tlast signal line at the output end is pulled high and then low, marking the end of the transmission. Compared with the traditional approach in which the AXIS communication interface of the AXIS bus control module directly calls AXI interface commands, this lets users write the calculation IP core in Verilog rather than C/C++ during software-hardware co-design, improving the flexibility of the circuit design.
In specific implementation, the read-empty/write-full signal control unit maintains a read pointer and a write pointer. When the FIFO begins operating, the write pointer increments by one for each datum written, and the read pointer increments by one for each datum read. The embedded asynchronous data cache submodule judges on each clock rising edge whether the FIFO has been written full or read empty. When full, the sending module cannot transmit data in; when empty, the computing unit cannot read data out. When the address bits of the write pointer and the read pointer are the same, one extra bit of register width distinguishes the read-empty case from the write-full case.
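The extra-pointer-bit rule can be modeled as follows (a sketch; the class name is illustrative — in the Verilog these are registers, not a Python object):

```python
class FifoPointers:
    """Model of the read-empty / write-full rule: each pointer carries
    one extra MSB beyond the address width, so equal pointers mean
    empty, while equal address bits with opposite MSBs mean full."""

    def __init__(self, depth_bits):
        self.depth = 1 << depth_bits              # FIFO depth
        self.mask = (1 << (depth_bits + 1)) - 1   # pointer incl. extra bit
        self.wr = 0
        self.rd = 0

    def empty(self):
        return self.wr == self.rd                 # fully equal -> empty

    def full(self):
        return (self.wr ^ self.rd) == self.depth  # only MSB differs -> full

    def write(self):
        assert not self.full()
        self.wr = (self.wr + 1) & self.mask

    def read(self):
        assert not self.empty()
        self.rd = (self.rd + 1) & self.mask
```

In the real circuit the comparison happens only after each pointer has been double-registered into the other clock domain, as described above.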
8. The circuit connection of the gesture recognition system is completed in vivado software using the block design connection mode introduced for the modules above.
9. The PYNQ development board is powered on, and its PS-end control interface, i.e. the jupyter notebook programming interface, is logged into on a computer;
10. loading tcl and bit files generated by the blockdesign into the PYNQ development board;
11. inserting a usb camera into the PYNQ, completing SVM prediction on the acquired gesture video stream according to the introduction of the invention content, and displaying the result on a jupyter notebook programming interface in real time.
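Steps 6) and 7) above quantize the trained SVM parameters to 32-bit fixed-point words with N fractional bits before they are written to the .coe ROM files. A minimal sketch of that conversion, assuming N = 16 (the fractional width used elsewhere in this example; function names are illustrative):

```python
def to_fixed(x, q=32, n=16):
    """Quantise a float to a q-bit two's-complement fixed-point word
    with n fractional bits, saturating at the representable range."""
    scaled = int(round(x * (1 << n)))
    lo, hi = -(1 << (q - 1)), (1 << (q - 1)) - 1
    return max(lo, min(hi, scaled)) & ((1 << q) - 1)  # as unsigned word

def from_fixed(word, q=32, n=16):
    """Inverse mapping, used in step 6) to replay SVM prediction with
    quantised parameters and measure accuracy for different n."""
    if word >= 1 << (q - 1):
        word -= 1 << q                                # undo two's complement
    return word / (1 << n)
```

Sweeping n and measuring test-set accuracy with from_fixed-decoded parameters is exactly the selection procedure of step 6); the chosen n then fixes the fractional width of all data sent to the FPGA.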
Example 2
When the embedded system-on-chip is concretely implemented, the embedded system-on-chip is a PYNQ-Z2 development board; the external hardware module is an FPGA external hardware module, and the food clamping and feeding system is shown in figure 3. The implementation process of the food clamping and feeding system comprises the following steps:
(1) python crawler code is written, and 200 pictures of the 4 types of dishes to be classified are read from Baidu images;
(2) and compiling a picture processing function, compressing the 4 types of dish pictures to be classified into 32 × 32 pictures, and generating a csv file. The csv file comprises a storage address and a label of each picture;
(3) the data to be trained are imported from the csv file using the pytorch official neural-network programming template, the neural network is defined according to the network-definition method in the summary of the invention, and the visdom function library is called to write an accuracy-observation program;
(4) python -m visdom.server is entered at the cmd command line, the web page shown in the command line is opened, and the accuracy visualization result of the CNN training is observed;
(5) modifying the neural network model according to the observation of the test accuracy until the accuracy is more than eighty-five percent, and saving the training model as an npy file;
(6) an IP core is generated from the C++ CNN calculation library through HLS software, and the IP core is called in the vivado block design to complete the circuit connection;
(7) the bit and tcl files generated by the circuit are loaded onto the PYNQ, and dish prediction and feeding are completed according to step S3 in the summary of the invention.
The voice recognition module completes voice recognition through the PyAudio function library. The face recognition module completes mouth-position recognition through the cv2 function library and the xml file for the mouth. The improved-Yolo candidate-box extraction module treats the dish-containing region grid of the plate as a rectangle, calls the opencv shape-contour function to extract the quadrilaterals in the picture whose area meets the threshold, and detects the dish target candidate boxes; in specific implementation this makes full use of the characteristics of the target environment to be detected, using the frame-shaped features of the dish grid as candidate regions in place of Yolo's 98-grid division and NMS non-maximum-suppression stage, obtaining a 24-fold speedup. The AXI interface module uses the #pragma HLS INTERFACE axis instruction of HLS to complete interface data control. The convolution module, pooling module and full-connection module are accelerated with the #pragma HLS PIPELINE II=1 instruction.
When step Bb) of the S3 food-serving and feeding procedure is specifically implemented, the loaded files are the CNN convolutional-layer weight files 'conv_weight1.npy' and 'conv_weight2.npy', the CNN convolutional-layer bias files 'conv_bias1.npy' and 'conv_bias2.npy', the CNN fully-connected-layer weight file 'fc_weight2.npy' and the CNN fully-connected-layer bias file 'fc_bias2.npy', trained in pytorch. When step Bf) is implemented, the cv2.findContours function of opencv is called to obtain, for each envelope enclosed by the edges in the video frame, the number of straight edges, the area and the corner coordinates; when the number of straight edges is 4 and the area value is larger than 5000, the shape is judged to be a quadrilateral, and the corner coordinates are stored into a list variable as a numpy array. When step Bh) is implemented, the cv2.getPerspectiveTransform and cv2.warpPerspective functions of opencv are called to reform the untransformed quadrilateral color picture containing a dish, framed by the corner coordinates, into a regular rectangular picture. Step Bi) calls the cv2.resize function of opencv to reform the rectangular picture into a 32 x 32 square picture. When step Bj) is implemented, the input channel count is set to 3, the output channel count to 5, the convolution kernel size to 5 x 5, the convolution stride to 1, and the "0" edge padding to 2. Step Bk) returns the array of the PL-end convolution acceleration kernel to the PS end and inputs it into the Relu activation function and pooling layer in the PL end, where the pooling-layer scan window is 2 x 2 with stride 2. When step Bl) is implemented, the input channel count is set to 5, the output channel count to 3, the convolution kernel size to 5 x 5, the convolution stride to 1, and the "0" edge padding to 2. When step Bm) is implemented, the pooling-layer scan window is 2 x 2 with stride 2. When step Bn) is implemented, the input dimension is 192 and the output dimension is the dish class count, 4. When step Bo) is specifically implemented, the array of dimension N_c returned by the PL end indexes the dish list through the np.argmax function, and the CNN dish classification result corresponding to the input picture is found from the function value. When steps D) and G) are carried out, D_m is 4; when steps E) and F) are implemented, a 3 x 3 convolution kernel is adopted.
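The layer parameters above determine the fully connected layer's 192-dimensional input. A small sketch tracing the feature-map sizes — assuming, as configured, 5 x 5 convolutions with stride 1 and padding 2 (size-preserving) followed by 2 x 2 pools with stride 2, mirroring a pytorch network definition:

```python
def cnn_shapes(size=32, chans=(3, 5, 3), k=5, pad=2, pool=2):
    """Trace feature-map sizes through conv(3->5) -> pool ->
    conv(5->3) -> pool -> flatten, for a `size` x `size` dish picture.
    Returns the flattened input dimension of the full-connection layer."""
    s = size
    for _ in chans[1:]:                # one conv + pool stage per layer
        s = (s + 2 * pad - k) + 1      # conv output size at stride 1
        s //= pool                     # 2x2 max pool, stride 2
    return chans[-1] * s * s           # channels * height * width
```

For a 32 x 32 input this gives 32 -> 16 -> 8 spatial sizes and 3 final channels, i.e. 3 * 8 * 8 = 192, matching the step Bn) input dimension.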
The system distributes the binocular stereo-matching operation to the PL end, achieving a 2-fold speedup over the PS end.
Example 3
When concretely implemented, the embedded system-on-chip is a PYNQ-Z2 development board; the external hardware module is an FPGA external hardware module. The tumble rescue system, shown in fig. 4, distributes the binocular stereo-matching operation to the PL end, achieving a 2-fold speedup over the PS end. The implementation process of the tumble rescue system comprises the following steps:
a. The binocular camera is held by hand and 21 groups of left-right synchronized images are shot at various angles of a calibration chessboard; the left and right cameras are stereo-calibrated with the stereo camera calibration toolbox in MATLAB to obtain the extrinsic and intrinsic camera parameters, with the calibration error controlled at 0.18;
b. and correcting by using parameters obtained by calibrating the camera, so that the distortion caused by the re-projection of the left and right images is minimum, and the common area of the left and right views is maximum.
c. An obstacle detector is created, i.e. a neural network based on the tinier-yolo architecture, and its bitstream is downloaded to the device. The bitstream contains the tinier-yolo layers that are offloaded to hardware for acceleration. Initialization of the other obstacle-detection modules is executed in the Darknet framework: the weights of the first and last layers need to be loaded in advance, and the parse_network_cfg() function parses the configuration file.
d. An image is acquired and preprocessed. Images are collected in real time through the camera and letterboxed (letterbox_image) to the input size (416, 416), ensuring the input-image dimensions meet the requirement. The first convolutional layer is executed in Python, using the relevant functions in the utils class. The quantized layers are accelerated in hardware: the core layers of the tinier-yolo architecture are quantized during training and executed in programmable logic. The hardware accelerator consists of a multi-layer dataflow implementation (the convolutional plus max-pooling layers of the tinier-yolo network). The last convolutional layer is executed in Python. Darknet draws the final detection result and obstacle category on the original image, while the category and corresponding probability of the detected obstacle are output. In specific implementation, the tinier-yolo network architecture quantizes the input and output data and the network weights, prunes the network weights to reduce the network scale, and merges the convolutional and pooling layers; these operations reduce storage and computing resources while preserving accuracy and increasing speed, and the structure gives full play to the advantages of the FPGA.
e. The trolley carrying the rescue articles (such as medicine and a distress call device) is placed where the walking area of the elderly person can be monitored, and fall detection is carried out according to the steps in the summary of the invention; when step G) is specifically implemented, T_a is set to 0.5 and T_b is set to 2. If a fall is detected, the trolley automatically drives to the fallen person and stops, completing the rescue.
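The letterbox_image resize of step d scales the camera frame to fit the 416 x 416 network input while keeping its aspect ratio, then pads the remainder. Its geometry can be sketched as follows (the function name is illustrative; integer truncation of the scaled sizes is an assumption):

```python
def letterbox_dims(w, h, target=416):
    """Scale-and-pad geometry behind a letterbox resize: the frame is
    scaled to fit inside target x target preserving aspect ratio, then
    centered with padding. Returns (new_w, new_h, pad_x, pad_y)."""
    scale = min(target / w, target / h)
    new_w, new_h = int(w * scale), int(h * scale)
    pad_x, pad_y = (target - new_w) // 2, (target - new_h) // 2
    return new_w, new_h, pad_x, pad_y
```

A wide 832 x 416 frame, for example, is scaled by 0.5 and padded vertically, so the detector always sees undistorted obstacles.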
Example 4
In the specific implementation of the block design circuit connection, the AXI interconnect IP core calls the "AXI Interconnect" IP core of vivado. The PS-end control IP core calls the "ZYNQ7 Processing System" IP core of vivado. The CNN IP core and the ORB-SLAM IP are generated by HLS. The interior of the SVM IP core is a hand-written circuit comprising the user-defined SVM classification operation IP core, the support vector memories, an input-vector RAM storage unit, an output classification-result RAM storage unit and an asynchronous FIFO buffer. The specific connections and data paths of the hand-written circuit inside the SVM IP core are as follows: the M_AXI_GP0 interface of the ZYNQ IP core is connected to the S00_AXI interface of the AXI interconnect IP core; the M00_AXI interface of the AXI interconnect IP core is connected to the S_AXI_LITE interface of the VDMA controller IP core, and the data to be operated on are temporarily stored in the VDMA controller IP core. The m_axis_mm2s_tdata interface of the VDMA controller IP core is connected to the user-defined SVM classification operation IP core, whose PS-end data receiving interface is connected to the dina interface of the input-vector RAM storage unit, so that the input vector is transmitted block-by-block to the input-vector RAM storage unit under the control of the user-defined SVM classification operation IP core. The dout interface of the input-vector RAM storage unit is connected to the data_from_PS interface of the user-defined SVM classification operation IP core, which repeatedly fetches the input vector N_sv/N_pe times from the input-vector RAM storage unit and performs the SVM classification operation in FPGA logic.
The N_pe support-vector data input interfaces of the user-defined SVM classification operation IP core are connected to the douta interfaces of the corresponding N_pe support vector memories, from which the pre-stored support-vector data and input vectors are fetched to complete the SVM classification operation. The classification-result output interface of the user-defined SVM classification operation IP core is connected to the dina interface of the RAM storage unit for output classification results, and the N_type final operation results of the N_type-class SVM operation are stored in the classification-result RAM storage unit. The doutb interface of the classification-result RAM storage unit is connected to the s00_axis_tdata interface of the asynchronous FIFO buffer, and all classification results in the RAM storage unit are stored into the asynchronous FIFO buffer. The M00_AXIS interface of the asynchronous FIFO buffer is connected to the S_AXIS_S2MM interface of the VDMA controller IP core; the M_AXI_MM2S interface of the DMA controller IP core is connected to the S_AXI_HP0 interface of the PS-end control IP core, and the operation result is transmitted to the PS-end control IP core of the FPGA. N_pe is the number of PE modules, with a value of 10 in this implementation; N_sv is the number of support vectors; N_type is the number of classes of the multi-class SVM.
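The PE scheduling described above — N_sv support vectors shared across N_pe parallel PE units, hence roughly N_sv/N_pe fetch passes — can be sketched functionally as follows (a software model of the data path, not the Verilog; a linear kernel is assumed, as in Example 6, and the function name is illustrative):

```python
def svm_decision(x, support_vectors, alphas, bias, n_pe=10):
    """Model of the SVM data path: support vectors are partitioned
    across n_pe PE units (N_pe = 10 in this implementation); each pass
    lets every PE accumulate one linear-kernel term alpha_i * <sv_i, x>,
    over ceil(N_sv / N_pe) passes, then the bias is added."""
    n_sv = len(support_vectors)
    passes = -(-n_sv // n_pe)            # ceil(N_sv / N_pe)
    acc = 0.0
    for p in range(passes):
        for pe in range(n_pe):           # the PEs run in parallel on chip
            i = p * n_pe + pe
            if i < n_sv:
                acc += alphas[i] * sum(a * b for a, b in zip(support_vectors[i], x))
    return acc + bias
```

In the one-vs-rest scheme this decision value is computed N_type times, and the subscript of the largest result selects the gesture class.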
Example 5
The implementation of the Aliyun module is as follows: the Aliyun official website is opened and a user account is registered to complete login. The Internet of Things platform is then activated; after successful activation, the management console is entered, a product is created, a device is added, the device triplet information is set, and the hash value is calculated;
the code in the Arduino demo is then modified and burned into the development board, finishing the related platform configuration. A first smart-cabin project is created in the IoT platform IoT Studio, device binding, debugging and service-flow arrangement are carried out, then Web/App development is done, with each stage debugged by means of the debugging tools provided by IoT Studio.
Example 6
When the user-defined SVM operation circuit is implemented, the PE module can be composed of a fixed-point multiplication submodule, a fixed-point addition submodule, a ROM address management submodule and an output data management submodule, which together complete the SVM linear kernel function operation.
The fixed-point multiplication submodule has a register variable of bit width 2N that stores the unsigned multiplication result; bits [N-2+Q:Q] of the 2N-bit-wide multiplication result of two N-bit fixed-point numbers are taken as bits [N-2:0] of the result, and the XOR of the most significant bits of the two multipliers is taken as bit N-1 of the result, namely the sign bit, completing the fixed-point multiplication. The fixed-point addition submodule distinguishes two cases according to the signs of the two addends, same sign and different sign: in the same-sign case, the sign bit of the result is determined directly, and the absolute values of the two addends are added to form the absolute value of the result; in the different-sign case, the absolute values of the two addends are compared, the sign bit is determined accordingly, the minuend and subtrahend are then determined, and the subtraction result is taken as the absolute value of the final result. Whenever a valid datum is input, the ROM address management submodule increments the data input count, and on each rising clock edge it stores the count value; before each calculation it checks whether the count differs from the count at the previous moment, and performs the calculation with the input data only when the count has changed. The output data management submodule performs address management: a support vector is fetched whenever a valid test vector is input. When the input data are valid, the output of the fixed-point multiplication submodule is judged valid; the valid output result is placed at the input of the fixed-point addition submodule, and the output of the addition submodule is fed back to its input, thereby realizing the accumulation of the dot-product operation. The output data management submodule contains a counter that counts the number of valid input data; when the number of input data reaches the vector dimension to be processed, one dot product is completed; at this moment the accumulation result is output, the accumulation-valid signal is incremented by one, and the counter and the accumulated value are reset simultaneously.
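The sign-magnitude fixed-point arithmetic described above can be sketched bit-accurately in software. This is a model of the described scheme, not the Verilog itself; N and Q are the user-chosen word and fraction widths, and the function names are illustrative.

```python
# Sign-magnitude fixed-point multiply, following the text: multiply the
# (N-1)-bit magnitudes into a 2N-bit value, keep bits [N-2+Q : Q] as the
# result magnitude, and XOR the two sign bits for the result sign.

def fx_mul(a, b, n, q):
    sign = ((a >> (n - 1)) ^ (b >> (n - 1))) & 1        # XOR of sign bits
    mag = (a & ((1 << (n - 1)) - 1)) * (b & ((1 << (n - 1)) - 1))
    mag = (mag >> q) & ((1 << (n - 1)) - 1)             # bits [N-2+Q : Q]
    return (sign << (n - 1)) | mag

def fx_add(a, b, n):
    # Same signs: add magnitudes and keep the sign. Different signs:
    # subtract the smaller magnitude from the larger; the sign of the
    # larger-magnitude addend becomes the sign of the result.
    sa, ma = a >> (n - 1), a & ((1 << (n - 1)) - 1)
    sb, mb = b >> (n - 1), b & ((1 << (n - 1)) - 1)
    if sa == sb:
        return (sa << (n - 1)) | ((ma + mb) & ((1 << (n - 1)) - 1))
    if ma >= mb:
        return (sa << (n - 1)) | (ma - mb)
    return (sb << (n - 1)) | (mb - ma)
```

For example, with N = 8 and Q = 4 (so 1.0 is encoded as 16), multiplying 1.5 by 2.0 yields 3.0, and adding -1.5 to 1.0 yields -0.5.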
Example 7
When the user-defined SVM operation circuit is implemented, the input vector blocking transmission module receives a read-valid signal of value 1 from the AXIS bus control module and counts with a transmission counter; when the count reaches the test vector dimension, the transmission of the first test vector is finished and blocking starts, at which point the tready signal line is set to 0. At the same time, the block count management module cyclically reads from the input RAM and transfers the read data to the PE modules, and the transmission counter continues counting; when the counter is full, i.e. the test vector has finished the operation with all the support vectors, new data can be received and the tready signal is set to 1. When the number of values placed into the output RAM reaches the number of classification categories Ntype, the send-valid signal is set to 1 and the AXIS transceiving management submodule is allowed to send data to the PS side. Here Nsv is the number of support vectors and Npe is the number of PE modules. When the AXIS bus control module is implemented, in the input part the tdata line of the VDMA is connected to the input data line of the input RAM, and the tvalid value of the master port in the VDMA is checked at all times: if the value is 1, meaning the input data are valid, the address signal is incremented by one and the value is stored. The output data line of the RAM storing the output result is connected to the data line of the FIFO. After the SVM operation is completed, the data sending module checks the ready signal of the receiving end at all times; if the ready signal is 1, the receiving end is ready, the value of the output address signal line is incremented by one, and the data of the output RAM are transmitted to the output FIFO. When the value of the output address control signal line has been incremented to the number of values to be transmitted, meaning all results have been read, the tlast signal line is pulled high and then low, informing the receiving end that the transmission is finished. In the specific implementation, the AXIS bus control module in the system implements an AXIS communication interface in Verilog and operates only the tdata, tvalid, tready and tlast signal lines, avoiding the redundant configuration of all the other signal lines of the AXIS bus, which makes it more convenient and flexible to use than the AXIS interface provided by Vivado.
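The output handshake described above can be sketched as a cycle-by-cycle behavioral model: a beat is transferred only on cycles where the receiver's tready is high, and tlast accompanies the final beat. The function name and trace representation are illustrative, not part of the design.

```python
# Behavioral model of the output path: data moves from the output RAM to
# the FIFO only when the receiver's tready is 1; tlast marks the last beat.

def axis_send(ram, tready_trace):
    beats = []          # (tdata, tlast) pairs actually transferred
    addr = 0            # output address signal line
    for tready in tready_trace:
        if addr >= len(ram):
            break
        if tready:                        # receiver ready: transfer a beat
            tlast = addr == len(ram) - 1  # asserted on the final word
            beats.append((ram[addr], tlast))
            addr += 1
    return beats
```

A stalled cycle (tready = 0) simply delays the next beat without losing data, which is the point of the tvalid/tready handshake.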
Example 8
When the tumble rescue system is implemented, the size of the filter window of the image texture feature enhancement module is Na, and a cutoff value PFC is set so that the preprocessing result is kept within the [-PFC, PFC] range; by sliding the selected filter window over the whole image, the image brightness is normalized, completing the enhancement of the image texture features. Here PFC is a cutoff value in the range 1 to 31, and Na is an odd number in the range 5 to 255. The SAD window traversal search module searches the right image with an SAD window traversal to find, for each pixel of the left image, the closest pixel and thereby the best match. The starting point of the traversal search controlling the matching search is set to 0. Within the disparity search range limited by the user-set parameter Ndis, the SAD window traverses the image to be matched according to the feature vector extracted by the window, the similarity between the window and the features of each candidate window is measured during the traversal, and the window with the greatest similarity is taken as the final matching result. The high-quality matching point screening module comprises uniqueness detection, left-right consistency detection and connected region detection. A judgment threshold Thd for low-texture regions is set: if the sum of the absolute values of the derivatives of all the neighboring pixels in the current SAD window is less than the specified threshold, the disparity value of the pixel corresponding to the window is considered unreliable. A disparity uniqueness percentage Uratio is set for uniqueness detection. A disparity threshold Ddiff is set for left-right consistency detection, and Sws is set for connected region detection, a connected region being an image region formed by adjacent pixels with the same pixel value.
The degree of variation of the disparity inside the window is detected, and the disparity in the window is cleared when the degree of variation is greater than the threshold. The disparity map distance measurement module obtains the disparity map by the stereo matching method; through a projection matrix Ma, a mapping image of the same size as the disparity map is obtained, in which each pixel position stores the three-dimensional point coordinates on the x, y and z axes of the camera coordinate system, and the value in the z direction is taken as the current distance.
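The SAD window search can be illustrated with a minimal one-dimensional sketch: for each left-image pixel, a window is slid over the right image within the disparity range and the disparity with the smallest sum of absolute differences wins. This is a simplified 1-D model of the idea, not the module's implementation; the function names and the single-row restriction are assumptions.

```python
# Minimal 1-D SAD block matching: for each pixel of the left row, search
# disparities 0..n_dis in the right row and keep the lowest-cost match.

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def sad_match_row(left, right, half, n_dis):
    disp = []
    for x in range(half, len(left) - half):
        win_l = left[x - half : x + half + 1]
        best_d, best_cost = 0, float("inf")
        # candidate right-image windows are shifted left by the disparity d
        for d in range(min(n_dis, x - half) + 1):
            win_r = right[x - d - half : x - d + half + 1]
            cost = sad(win_l, win_r)
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp.append(best_d)
    return disp
```

With a right row that is the left row shifted by two pixels, the recovered disparity is 2 wherever the search range permits.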
Example 9
When the dish clamping and feeding system is implemented, the contents controlled by the AXI interface control module include:
1) whether this transmission is in the "preset data mode" or the "CNN operation mode";
2) if this time is the "preset data mode", whether parameter information or test data information is transmitted;
3) if parameter information is transmitted this time, what the parameter type is;
4) if this time is the "CNN operation mode", the operation type and the user-configurable information.
The operation types comprise convolution operation, pooling operation and full-connection operation.
The user-configurable information for the convolution operation is: the number of input channels, the number of output channels, the length of the convolution kernel, the width of the convolution kernel, the length of the convolution step, the width of the convolution step, and the number of surrounding rings of '0' edge padding;
the user-configurable information for the pooling operation is: the length of the pooling window, the width of the pooling window, the length of the pooling step, the width of the pooling step, the pooling type (including maximum pooling and average pooling), and the number of surrounding rings of '0' edge padding;
the user configurable information of the full join operation is: input dimension, output dimension.
The AXI-Stream bus transfers operation data information, the contents of which include:
1) receiving CNN test data;
2) receiving convolutional layer weights and biases;
3) receiving fully connected layer weights and biases;
4) sending the CNN test result.
Example 10
When the dish clamping and feeding system is implemented, the contents of the storage management module include:
If the AXI-lite bus indicates the "preset data mode" and parameter information is transmitted, the data in the AXI-Stream bus are put into the convolutional layer weight and bias storage unit or the full-connection layer weight and bias storage unit according to the parameter type.
If the AXI-lite bus indicates the "preset data mode" and test data information is transmitted, the data in the AXI-Stream bus are put into the test data storage unit.
If the AXI-lite bus indicates the "CNN operation mode", the corresponding module in the CNN acceleration library is enabled according to the operation type and the type parameters; when the operation-end flag information of the CNN acceleration library is valid, data are fetched from the operation result storage unit through the AXI-Stream bus and transmitted to the dish classification module.
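The mode-routing rules of the storage management module amount to a small dispatch table. The sketch below models that control flow; the mode and type strings and the dictionary layout are illustrative assumptions, not the module's actual register encoding.

```python
# Control-flow model of the storage management: preset data is routed to
# the matching storage unit; an operation request enables an accelerator.

def route(mode, kind, payload, store):
    if mode == "preset":
        if kind == "conv_param":
            store["conv_weights_bias"].append(payload)
        elif kind == "fc_param":
            store["fc_weights_bias"].append(payload)
        else:                       # test data information
            store["test_data"].append(payload)
    elif mode == "cnn_op":
        store["enable"] = kind      # enable conv / pool / fc accelerator
    return store
```

The hardware version performs the same routing with RAM enables and the operation-end flag rather than Python containers.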
While the foregoing is directed to the preferred embodiment of the present invention, the invention is not limited to the embodiment and the drawings disclosed herein. Equivalents and modifications made without departing from the spirit of the disclosure are considered to be within the scope of the invention.

Claims (9)

1. A multifunctional integrated system-on-chip for nursing old people, characterized in that: the system comprises an embedded system-on-chip and external hardware modules; the embedded system-on-chip comprises a dish clamping and feeding system, a gesture recognition system and a tumble rescue system; the external hardware modules comprise a microphone, a trolley, an HDMI display, a mechanical arm, a PC end and a binocular camera;
the dish clamping and feeding system comprises a dish classification module, a face recognition module, a voice recognition module, an RAM buffer, a CNN acceleration library, an AXI interface control module and a storage management module; the dish classification module comprises a CNN parameter presetting sub-module, a candidate frame extracting sub-module, a CNN frame constructing sub-module and a mechanical arm pose calculating sub-module; the face recognition module comprises a mouth recognition sub-module and a mechanical arm pose control sub-module; the RAM buffer comprises an operation result storage unit, a test data storage unit, a convolution kernel and bias storage unit and a full-connection layer weight bias storage unit; the CNN acceleration library comprises a convolution acceleration module, a pooling acceleration module and a full-connection acceleration module;
the gesture recognition system comprises a result discrimination module, an SVM overlay scheduling module, a floating point number to fixed point number unit, a Fourier operator extraction module, an outline extraction module, a skin color detection module, an input vector blocking transmission module, a support vector memory, a bias memory, a pulse array structure, an input vector memory, an AXIS bus control module and a kernel function accumulation module; the systolic array structure comprises NPEA PE module; the PE module comprises a fixed point number multiplication submodule, a fixed point number addition submodule, a ROM address management submodule and an output data management submodule;
the tumble rescue system comprises a Gaussian filtering and graying module, a canny edge extraction module, a distance information generation module, a tumble detection module, an image information fusion display module, a video stream management module, an image texture feature enhancement module, an SAD window traversal search module, a high-quality matching point screening module, a disparity map distance measurement module, a tie-yolo obstacle detection module, a cv2pynq hardware acceleration module and a sobel filtering module;
the gesture recognition system adopts a ZYNQ development board to carry out software and hardware collaborative design to complete acceleration of the gesture recognition function; SVM test vector in gesture recognition systemThe AXIS bus control module and the input vector blocking transmission module are stored in the input vector memory, repeatedly taken out and transmitted to the N in the ripple array structurePEA first one of the PE modules; the PE module sequentially transmits the test vectors to the next PE module on the rising edge of the clock, takes out the support vectors from the support vector memory, performs dot product operation on the support vectors and the test vectors, and outputs the support vectors and the test vectors to the kernel function accumulation module; the kernel function accumulation module takes out the offset value from the offset memory and adds the offset value and the accumulated value to transmit the result to a result discrimination module of the gesture recognition system after counting and accumulating to a numerical value Nsvn through an internal counter; nsvn is the number of support vectors corresponding to each classification category in the gesture recognition SVM training;
the system comprises a camera, an outline extraction module, an SVM overlay scheduling module, a result judgment module and a PC terminal, wherein the camera is connected with a skin color detection module, the skin color detection module is connected with the outline extraction module, the outline extraction module is connected with the Fourier operator extraction module, the Fourier operator extraction module is connected with a floating point number to fixed point number unit, the SVM overlay scheduling module is connected with the result judgment module, and the result judgment module is connected with the PC terminal connected with Aliyun; the AXIS bus control module receives the test vectors transmitted by the SVM overlay scheduling module, and the input vector blocking transmission module is connected with the AXIS bus control module and the input vector memory and controls the writing time sequence of the test vectors to the input vector memory; the PE module in the systolic array structure is connected with the input vector blocking transmission module, the support vector memory and the kernel function accumulation module, and the test vector and the support vector carry out kernel function K (v) in a plurality of PE modulestest,vsup) Computing and then transforming the kernel function K (v)test,vsup) The operation result is transmitted to a kernel function accumulation module; the kernel function accumulation module is connected with the offset memory and the AXIS bus control module, takes the offset value from the offset memory and adds the offset value and the accumulated value to be used as a result and then transmitted to a result discrimination module of the gesture recognition system, K (v)test,vsup) For SVM kernel operation, vtestIs a test vector, vsupIs a support vector;
the pulse array structure is composed of NPEEach PE module corresponds to a support vector memory, and each support vector memory stores different storageA support vector; all PE modules are functionally consistent, including: receiving a test vector, transmitting the test vector, taking out a support vector from a corresponding support vector memory, completing SVM kernel function operation, and outputting a kernel function operation result of a single test vector and a single support vector;
the AXIS bus control module controls four core signal lines; the input vector blocking transmission module is used for judging whether the input vector memory finishes processing the data transmitted into the embedded asynchronous data cache sub-module by the SVM overlay scheduling module, if so, transmitting new data, and otherwise, stopping transmission;
a counter is arranged in the kernel function accumulation module, and the data received from the systolic array structure are counted and accumulated; when the count reaches the value Nsvn, the bias is taken out of the bias memory, added to the accumulation result, and output;
the dish clamping and feeding system adopts a ZYNQ development board for software-hardware co-design to complete the high-speed, accurate and fully automatic dish clamping and feeding function;
the dish clamping and feeding system recognizes the name of the dish described by the user through the microphone by a voice recognition method and judges whether the dish name is legal against the dish database; the contour of each dish-containing area in the picture acquired by the camera is recognized as a dish target candidate frame by a Yolo-improved candidate frame extraction method; the pictures in all the target candidate frames are transmitted one by one to the CNN acceleration library to complete the CNN acceleration operation; according to the CNN classification result, the dish with the legal dish name given by the user is clamped by the mechanical arm; the relative position of the mouth in the face in the camera picture is found through face recognition, and feeding is completed by the mechanical arm;
the system comprises a voice recognition module, an AXI interface control module, a RAM (random access memory) buffer, a CNN preset data storage module, a CNN user configuration information storage module, a memory management module, a CNN acceleration library and a memory management module, wherein the voice recognition module is connected with the dish classification module and transmits dish information selected by a user, the AXI interface control module is connected with the dish classification module and reads the CNN preset data, the CNN user configuration information and test data, the memory management module is connected with the AXI interface control module and the RAM buffer, and the corresponding units in the RAM buffer for storing the preset data or the test data and the; the CNN acceleration library module is connected with the storage management unit and the RAM buffer, and enables the acceleration operation module to fetch data from the RAM buffer to complete operation according to the enabling information and transmit the result to the RAM buffer; the RAM buffer is connected with the AXI interface control module and the storage management module, and under the control of the storage management module, the operation result is transmitted to the AXI interface control module and then to the dish classification module;
the convolution acceleration module extracts data with the convolution kernel size defined by a user in parameter configuration from the parameter RAM according to parameter configuration information in the parameter RAM by a method of traversing scanning data, extracts window data of a corresponding address from the input data RAM to complete window scanning convolution, performs convolution acceleration by using a corresponding instruction in the HLS, and stores the result to an operation result storage unit;
the pooling acceleration module fetches data from a corresponding address of the input data RAM according to the definitions of the pooling mode, the size of a pooling window and the stepping data of the pooling window in the parameter configuration information to complete pooling, performs pooling acceleration by using a HLS corresponding instruction, and stores the result to an operation result storage unit;
the full-connection acceleration module extracts the weight and the offset value of a full-connection layer from the parameter RAM according to the input dimension and the output dimension in the parameter configuration information, completes full-connection multiply-add operation with input data of a corresponding address in the input data RAM, accelerates the multiply-add operation by utilizing an HLS corresponding instruction, and stores the result to an operation result storage unit;
the system comprises an AXI interface module, an AXI-lite bus, an AXI-Stream bus, a CNN parameter configuration information transmission module and a data processing module, wherein the AXI interface module controls the AXI-lite bus and the AXI-Stream bus, the AXI-lite bus transmits CNN parameter configuration information, and the AXI-Stream transmits mass data to be operated;
the storage management module completes control on the storage unit according to configuration information in the AXI-lite bus and operation ending mark information in the CNN acceleration library;
the tumble rescue system adopts a ZYNQ series development board to carry out software and hardware collaborative design to complete a quick and accurate tumble rescue function;
the tumble rescue system obtains a target position area by adopting an adjacent frame background difference method and a morphological algorithm for a video stream read by a camera, a tumble is detected according to the change of the outline of the target area, and a trolley carrying a rescue article automatically opens to a tumble person; a binocular camera at the head of the trolley receives the left view and the right view, the corresponding relation between the two images is sought through a stereo matching method, namely, corresponding points on the left image and the right image are matched to generate a disparity map, and then depth of field information is calculated in real time according to the disparity map; stopping the trolley when the trolley reaches a certain distance in front of the fallen person to complete rescue;
the video stream management module is connected with the sobel filtering module, the image texture feature enhancing module, the tie-yolo obstacle detecting module, the cv2pynq hardware accelerating module and the Gaussian filtering and graying module, and respectively transmits video data to the connecting blocks to perform corresponding functions; the system comprises a sobel filtering module, an SAD window traversal search module, a high-quality matching point screening module and a disparity map distance measuring module, wherein the sobel filtering module is connected with a canny edge extraction module to perform tumble judgment image preprocessing, the distance information generating module is connected with a tumble detection module to transmit distance information from a tumble person to a trolley, the image texture feature enhancing module, the SAD window traversal search module, the high-quality matching point screening module and the disparity map distance measuring module are sequentially connected in series to complete three-dimensional matching of left and right attempts of a binocular camera, obtain a disparity map and transmit the disparity map to the distance information generating module, the tie-yolo obstacle detection module is connected with an image;
the video stream management module is used for respectively transmitting the video streams to the sobel filtering module, the image texture feature enhancing module, the tie-yolo obstacle detecting module and the cv2pynq hardware accelerating module according to the configuration information;
the image texture feature enhancement module, the SAD window traversal search module, the high-quality matching point screening module and the disparity map distance measurement module are used for completing left and right view feature point matching together and providing a disparity map for image depth detection;
and the tinier-yolo module loads a pre-trained BNN model to detect the position and the type probability of the barrier, and the target area is determined by an NMS non-maximum value inhibition method.
2. The system-on-a-chip for senior care of claim 1, wherein: the PC end is connected to the Aliyun cloud server.
3. The system-on-a-chip for senior care of claim 1, wherein: the maximum value of NPE depends on the fan-in capability of the kernel function accumulation module.
4. The system-on-a-chip for senior care of claim 1, wherein: the range of Nsvn depends on the total number of training samples.
5. The system-on-a-chip for senior care of claim 1, wherein: Nsvn comprises Nsv0, Nsv1, Nsv2, ..., Nsvn; the size of n depends on the strategy of SVM multi-classification and the number of classification categories.
6. The system-on-a-chip for senior care of claim 1, wherein: K(vtest, vsup) uses one of a Gaussian kernel function and a polynomial kernel function.
7. The system-on-a-chip for senior care of claim 1, wherein: the core signal lines are tdata, tvalid, tready and tlast signal lines in the AXIS bus, respectively.
8. The system-on-a-chip for senior care of claim 1, wherein: and the video stream management module is a VDMA module.
9. The system-on-a-chip for senior care of claim 1, wherein: the cv2pynq module accelerates the opencv function, and the sobel filter module accelerates the sobel filter function.
CN202011391696.7A 2020-12-02 2020-12-02 Multifunctional integrated system-on-chip for nursing old people Pending CN112347034A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011391696.7A CN112347034A (en) 2020-12-02 2020-12-02 Multifunctional integrated system-on-chip for nursing old people

Publications (1)

Publication Number Publication Date
CN112347034A true CN112347034A (en) 2021-02-09

Family

ID=74427319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011391696.7A Pending CN112347034A (en) 2020-12-02 2020-12-02 Multifunctional integrated system-on-chip for nursing old people

Country Status (1)

Country Link
CN (1) CN112347034A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107030691A (en) * 2017-03-24 2017-08-11 华为技术有限公司 A kind of data processing method and device for nursing robot
CN110110707A (en) * 2019-05-24 2019-08-09 苏州闪驰数控系统集成有限公司 Artificial intelligence CNN, LSTM neural network dynamic identifying system
CN110414305A (en) * 2019-04-23 2019-11-05 苏州闪驰数控系统集成有限公司 Artificial intelligence convolutional neural networks face identification system
CN110956111A (en) * 2019-11-22 2020-04-03 苏州闪驰数控系统集成有限公司 Artificial intelligence CNN, LSTM neural network gait recognition system
US20200151019A1 (en) * 2019-03-14 2020-05-14 Rednova Innovations,Inc. OPU-based CNN acceleration method and system
US20200285954A1 (en) * 2018-05-08 2020-09-10 Huazhong University Of Science And Technology Memory-based convolutional neural network system
CN111665934A (en) * 2020-04-30 2020-09-15 哈尔滨理工大学 Gesture recognition system and method based on ZYNQ software and hardware coprocessing
CN111709522A (en) * 2020-05-21 2020-09-25 哈尔滨工业大学 Deep learning target detection system based on server-embedded cooperation



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination