Smart Vision for Managed Home Care

Jackson Dean Goodwin
Dept. of Computer Science
University of Tennessee at Chattanooga
Chattanooga, TN 37403
mwq755@mocs.utc.edu

Abstract—With an increasing number of elderly people living alone, it is a challenge to ensure their safety and quality of life while maintaining their independence. The aim of this research is to investigate a new software system for pervasive home monitoring using smart vision techniques. In-home activity monitoring can provide useful information to doctors in areas such as behavior profiling and preventive care. It can also facilitate emergency detection and remote assistance. The proposed solution uses an inexpensive webcam and a computer program that analyzes posture using techniques such as foreground detection and ellipse fitting. This paper describes the merits and challenges of using computer vision to monitor and evaluate the daily activities of the elderly.

Index Terms—elderly care, activity monitoring, computer vision.

I. INTRODUCTION

As life expectancy continues to increase, so does the demand for elderly care. Most elders would prefer to live in their own homes, but the gradual loss of functional ability often forces these individuals into some form of assisted living or long-term care such as a nursing home. Pervasive home monitoring could afford elders a certain level of security and an improvement in quality of life while maintaining their independence and privacy. Low-cost vision-based systems can be used to monitor and evaluate the daily activities of the occupants.

The ultimate goal of this research is to develop methods for analyzing the motion of an individual in a scene in order to extract information about the person's posture, behavior, and activity. Major research questions include how to separate the individual from the scene, how to detect posture, and how to ensure privacy. In this paper, we focus on methods for detecting foreground, detecting humans in the foreground, and detecting posture, so that we can monitor activity and behavior.

II. MOTIVATIONS

Individuals with poor postural stability who do not need assistance with activities of daily living can benefit the most from a low-cost vision-based home monitoring system. Such a system would allow these individuals to remain at home and continue their regular activities without incurring the expenses and the loss of privacy or independence that come from hiring a healthcare professional to be in the home.

The cost of providing such a monitoring system is less than the cost of hiring a caregiver or living in a nursing home. Costs for care range from $70 a day at an adult day health care center to $200 a day for a semi-private room in a nursing home. Costs for an in-home pervasive monitoring system would be a one-time investment of about $500 for several wireless cameras (and possibly other devices beyond the scope of this project) and $1,200 for a computer base/processing unit, plus an additional cost of about $30 per day. The up-front cost of the monitoring system is higher, but it would match the price of adult day health care in about 45 days (in 25 days for assisted living and in 12 days for a nursing home).

Monitoring systems can also provide more privacy for the occupant, for two reasons: 1) the occupant can continue to live alone without disturbance, and 2) identifying features can be removed from the camera images programmatically.
The computer does virtually all of the analysis, so the only time anyone would access images from the cameras would be in an emergency that required further examination of the situation.

III. RELATED WORK

Researchers at the Wireless Sensor Networks Lab at Stanford University have developed a system [1] for monitoring elderly persons remotely. They make use of a wireless badge containing accelerometers to detect a signal indicating a fall. Wireless signal strength is used to triangulate an approximate location and trigger the cameras with the best views when a fall occurs. Each camera processes the scene independently to identify whether a human body exists in the image. The scene is first analyzed to detect motion events through background subtraction and blob segmentation. Blobs are further classified as human or non-human by analyzing the percentage of straight edges and of skin color in each blob. If the blob is a human, the posture and head position are estimated, and each camera reaches a certainty level based on the consistency of the obtained results. Posture is analyzed by fitting an ellipse to the blob and analyzing the orientations of the major and minor axes. The head is detected using skin color, the shoulder-neck profile, or both. Through collaborative reasoning among several cameras, a final decision is made about the state of the user. This work addresses the reliability of results by making use of different camera views, reducing the number of false alarms and increasing efficiency.
Tarik Taleb et al. present a framework [2] for assisting elders at home called ANGELAH, a middleware solution integrating elder monitoring, emergency detection, and networking. It enables efficient integration of a variety of sensors and actuators deployed at home for emergency detection and provides a solid framework for creating and managing rescue teams composed of individuals willing to promptly assist elders in case of emergency. This work also discusses issues relating to the formation of elder support groups.

Researchers at Imperial College London have developed a ubiquitous sensing system [3] for behavior profiling. This system uses vision-based activity monitoring for activity recognition and fall detection. To circumvent privacy issues, their vision-based system filters captured images at the device level into blobs, which only encapsulate the shape outline and motion vectors of the subject. To analyze the activity of the subject, the position, gait, posture, and movement of the blobs in the image sequences are tracked. This subject-specific information is called personal metrics. The basic goal of their system is to measure variables from individuals during their daily activities in order to capture deviations of gait, activity, and posture, and thereby facilitate timely intervention or provide an automatic alert in emergency cases. Posture is estimated by fusing multiple cues obtained from the blobs, such as projection histograms and elliptical fitting, and comparing them with reference patterns. From the posture estimation results, activity can be accurately determined. The system can also distinguish different types of walking gaits and decide whether the user has deviated from the usual walking pattern. The main issues covered in this work are ensuring privacy by filtering captured images at the device level and using personal metrics to profile behavior.

IV. PROPOSED APPROACH

The focus of this research is on behavior profiling through the use of smart vision techniques. First, posture must be detected; then changes in the posture over time can be analyzed to detect events and evaluate activity. We wrote a computer program that detects posture in real time from streaming video. Much of the image processing was done using algorithms already provided by the OpenCV library. The input for this program is provided by a camera, and the output is intended to be a description of the person's activity in terms of how much time was spent standing up, walking, sitting down, lying down, and so on. The system could also be used to alert caregivers to a fall. So far, the program only differentiates between three activities (standing, sitting, and lying down), and fall detection is not implemented.

Foreground Detection. The first step is to separate the foreground from the background. Generally, the foreground is everything that is either moving or new, and the background is everything else that does not move and does not change. The foreground is the part of the image that is of interest, since it is the only place where a human should appear. Figure 1 illustrates the concept of foreground and background.

Fig. 1. A classroom serves as the background (left). The author becomes the foreground when he enters and stands in the room (center). The foreground is shown in white, and the background is shown in black (right). Note that the shadow is also detected as foreground.

There are several ways to detect foreground.
One way is to simply subtract each frame of video from its immediate predecessor. This is called frame differencing. This method has its drawbacks: it only works for objects that are currently in motion, and it generally only detects the outlines of objects. A more sophisticated way to detect foreground is to create a model of the background and compare each frame to that background model. This is the method that we implemented.

In order to create the background model, our program first goes through calibration. This process lasts about 15 seconds and creates several background models, each corresponding to a different exposure level of the camera. This allows the program to adjust to changing lighting conditions and to automatically choose the best exposure for the scene at any moment.

void calibrate() {
    convertToHSLColorSpace(the frame from the camera);
    convertToAFloatingPointScale(the image from the previous line);
    findTheSumOfTheAccumulatedFramesAnd(the current frame);
    make a backup of the background model;
    if (it is time to check the exposure) {
        detectForeground(); // see figure 3
        detectSkin();
        determinePercentageOfForegroundIn(the foreground image);
        if (the percentage of foreground in the image > the threshold) {
            revert to the old background model;
            stop calibrating;
        } else {
            increase the exposure;
            if (we have gone through all of the exposures) {
                stop calibrating;
            }
        }
    }
}

Fig. 2. Pseudocode for the calibration algorithm
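The accumulation step in Figure 2 can be approximated with standard OpenCV calls. The sketch below is written with the Python bindings for brevity and is not the implementation from this paper: the camera index, the frame count (roughly 15 seconds of video), and the use of the mean of the accumulated frames as the background model are all illustrative assumptions, and the exposure-cycling logic of Figure 2 is omitted.

import cv2
import numpy as np

def calibrate(camera, num_frames=150):
    """Accumulate frames in the HLS color space and return a mean background model."""
    total = None
    count = 0
    for _ in range(num_frames):
        ok, frame = camera.read()
        if not ok:
            break
        # Convert to HLS and to a floating-point scale, as in Figure 2.
        hls = cv2.cvtColor(frame, cv2.COLOR_BGR2HLS).astype(np.float32)
        if total is None:
            total = np.zeros_like(hls)
        cv2.accumulate(hls, total)   # running sum of the converted frames
        count += 1
    # Dividing by the frame count to get a mean image is an assumption;
    # Figure 2 only specifies accumulating the sum.
    return total / count if count else None

camera = cv2.VideoCapture(0)   # default webcam; the device index is an assumption
background_model = calibrate(camera)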
Sometimes the background model becomes invalid. This is the case when the lighting in the room changes or when the camera is moved. At other times the background model may still be valid, but there is no detected foreground (i.e., no one is in the scene); in this case it is useful to recalibrate in case anything in the background has changed since the person left. For these reasons the program recalibrates when either: a) there is very little in the foreground (which implies that no one is in the room), or b) there is too much in the foreground (which implies that the lighting in the room has changed, the camera has moved, or the background model is no longer accurate).

Once we have a background model, we can detect foreground by finding the difference between the model and the current frame of video. This is done by taking the absolute value of the difference between the components of the model and the components of the current frame, and then applying a threshold. The threshold turns the grayscale image produced by the subtraction into a binary image (i.e., one whose pixels are either white or black, foreground or not foreground). This process uses the same technique as the frame differencing method mentioned earlier, but differs in that the current frame is compared to the background model instead of to the immediately preceding frame. Once the foreground is detected, an algorithm is applied to the image to remove noise (i.e., isolated specks and unwanted small regions) and clean up the edges. This works by applying opening and closing morphological transformations (which use dilation and erosion operations) and by removing contours that are too small to be important.

void detectForeground() {
    convertToHSLColorSpace(the frame from the camera);
    splitIntoIndividualImagesForEachChannel(the frame from the camera);
    splitIntoIndividualImagesForEachChannel(the background model);
    findTheAbsoluteValueOfTheDifferenceBetween(each channel of the frame and model);
    applyAThresholdTo(the differences of the frames);
    convertFromRGB2GRAY(the frame from the camera);
    combineSingleChannelImagesIntoOneImage(each of the channels from the frame);
    applyAThresholdTo(the image from the previous line);
    removeNoiseFromAndSmooth(the image from the previous line);
}

Fig. 3. Pseudocode for the foreground detection algorithm

Also worth noting is the fact that we use the HSL (Hue, Saturation, Lightness) color space for foreground detection. The reason for this has to do with the difference between how computers and humans perceive color and the intensity of light. The RGB (Red, Green, Blue) color space is very sensitive to changes in brightness, so applying a threshold to each of the channels in the RGB color space tends to detect shadows in addition to real foreground. Using the HSL color space reduces this effect to a certain degree, but does not completely eliminate it (see Figure 1 (right) and the accompanying note). Regardless of the color space used for foreground detection, one challenge is that when a part of the foreground is similar in color to the corresponding part of the background model, it is less likely to be detected as foreground. This can be seen as black lines or shapes appearing where there should be white foreground.
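To make the comparison against the background model concrete, the following sketch implements the same idea in Python/OpenCV: per-channel absolute difference against the model, a threshold to produce a binary mask, morphological opening and closing, and removal of small contours. The threshold value, kernel size, and minimum contour area are illustrative assumptions, not values taken from this paper.

import cv2
import numpy as np

def detect_foreground(frame, background, diff_thresh=30, min_area=500):
    """Return a binary foreground mask, given a float32 HLS background model."""
    hls = cv2.cvtColor(frame, cv2.COLOR_BGR2HLS).astype(np.float32)
    # Per-channel absolute difference against the background model.
    diff = cv2.absdiff(hls, background)
    # A pixel is foreground if any channel differs by more than the threshold.
    mask = (diff.max(axis=2) > diff_thresh).astype(np.uint8) * 255
    # Opening removes isolated specks; closing fills small holes and smooths edges.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Drop contours that are too small to be important (OpenCV 4 signature).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    clean = np.zeros_like(mask)
    for c in contours:
        if cv2.contourArea(c) >= min_area:
            cv2.drawContours(clean, [c], -1, 255, thickness=cv2.FILLED)
    return clean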
Human Detection. The next step is to determine whether or not the foreground "blobs" (i.e., connected regions in the foreground) are human contours. Human blob detection can be done by combining the results of detecting the head, the percentage of skin color, and the percentage of straight lines in the blob. Each of these factors can be weighted to determine whether the blob is a human.

The easiest technique to implement is skin color detection. This can easily be done in the HSL color space by applying two hue thresholds, an upper and a lower, that define the range of hues corresponding to the color of skin. (Hue refers to what color something is, e.g., red, orange, yellow, green, cyan, or blue, and corresponds to the frequency, or inversely the wavelength, of the light it appears to be. This is in contrast to saturation, which is the closeness to a pure color versus a shade of gray, and lightness, which is the perceived intensity, or brightness, of the color.) If a blob contains a large enough percentage of skin color (measured in the original image), it has a greater chance of being a human.

void detectSkin() {
    for (each pixel in the frame from the camera) {
        convertToHSLColorSpace(the frame from the camera);
        if (the color is within the range of hues and saturations for skin color)
            mark the pixel as skin;
    }
}

Fig. 4. Pseudocode for the skin detection algorithm
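A minimal version of this hue/saturation test can be written as a single range check. The sketch below (Python/OpenCV, assumed rather than taken from this paper) uses OpenCV's HLS representation; the hue and saturation bounds are illustrative only, since skin tones vary widely and the values would need tuning.

import cv2
import numpy as np

def detect_skin(frame, hue_range=(0, 25), sat_range=(30, 200)):
    """Return a binary mask of pixels whose hue/saturation fall in a skin-like range."""
    hls = cv2.cvtColor(frame, cv2.COLOR_BGR2HLS)
    # OpenCV stores HLS as (hue, lightness, saturation), with hue in [0, 179].
    lower = np.array([hue_range[0],   0, sat_range[0]], dtype=np.uint8)
    upper = np.array([hue_range[1], 255, sat_range[1]], dtype=np.uint8)
    return cv2.inRange(hls, lower, upper)   # 255 where the pixel looks like skin

The fraction of skin-colored pixels in a blob can then be computed by masking this result with the blob's foreground mask and counting the non-zero pixels.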
The next technique we implemented was straight-line detection. The idea is that man-made objects tend to have straight-line components, while humans do not. Applying a threshold to the number of lines detected in the part of the original image that corresponds to the blob can therefore be used to judge whether the blob is likely to be a human. The Hough line transform is used to detect lines in the blobs.

void detectLines() {
    detect edges in the frame from the camera;
    detect lines that lie within the foreground region with the Hough line transform;
    if (there are fewer than x lines in the foreground)
        increase the probability that the blob is a human;
}

Fig. 5. Pseudocode for the line detection algorithm

The last technique for human detection is head detection. This is the most complex and probably the most decisive factor in detecting a human body. There are several ways to detect a head. One way involves skin color and the position of the head relative to the body, another involves the shape profile of the head, neck, and shoulders, and another involves complex analysis of facial features. The technique that we implemented involves fitting an ellipse around the blob to find the ends of the body (i.e., the ends of the major axis of the ellipse), and then calculating the percentage of skin color in a small region around each end. This relatively crude solution detects the head most of the time when the blob is a human, but can fail when the person is not facing the camera. Figure 7 depicts a couple of the steps involved in human detection.

void detectHead() {
    detect the contour of each blob;
    fit an ellipse around the contour;
    find the ends of the major axis;
    detect the percentage of skin color in a small region around each end;
    if (that percentage > x)
        a head has been detected;
}

Fig. 6. Pseudocode for the head detection algorithm

Fig. 7. Skin color detected in the blob (left). Ellipse fitted around the blob to locate the head (right).

Posture Detection. Next, posture can be determined from the lengths and orientation of the axes of the ellipse fitted around the blob. The algorithm that we implemented is very simple. If the ellipse is taller than it is wide, the person is either standing or sitting: if the ratio of the longer axis to the shorter axis is below a certain threshold, the person is sitting; otherwise, the person is standing. If the ellipse is wider than it is tall, the person is lying down.
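The ellipse-based classification just described can be sketched as follows in Python/OpenCV. This is a hedged illustration, not the program from this paper: the axis-ratio threshold is an assumption, and the "taller than it is wide" test is approximated here with the blob's upright bounding box rather than with the orientation of the fitted ellipse.

import cv2
import numpy as np

def classify_posture(foreground_mask, ratio_thresh=2.0):
    """Classify the largest foreground blob as 'standing', 'sitting', or 'lying down'."""
    # OpenCV 4 signature: findContours returns (contours, hierarchy).
    contours, _ = cv2.findContours(foreground_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    blob = max(contours, key=cv2.contourArea)
    if len(blob) < 5:
        return None                       # fitEllipse needs at least five points
    (_, _), (w, h), _ = cv2.fitEllipse(blob)
    major, minor = max(w, h), min(w, h)
    ratio = major / minor if minor > 0 else float("inf")
    # Approximate "taller than it is wide" with the upright bounding box of the blob;
    # the paper uses the fitted ellipse itself for this decision.
    _, _, bw, bh = cv2.boundingRect(blob)
    if bh > bw:                           # upright: standing or sitting
        return "sitting" if ratio < ratio_thresh else "standing"
    return "lying down"                   # wider than it is tall

Head detection could reuse the same fitted ellipse: take a small window around each end of the major axis and apply a skin check such as detect_skin from the earlier sketch, declaring a head at whichever end exceeds a skin-percentage threshold.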
V. RESULTS

The foreground detection still detects shadows, especially if one stands near a wall or some other vertical surface. The detected shadows usually cause the program to report that the user is sitting down when he is really standing up near a wall. Also, extending the arms out to the side changes the shape of the ellipse being fitted around the body and causes the program to report that the person is sitting down. Occasionally, objects in the background that are about the same color as the object in the same position in the real foreground do not show up in the detected foreground, because the difference in color does not exceed the threshold. The human detection algorithm is rudimentary and does not always filter out non-human objects.

VI. FUTURE WORK

Much more can be done in the way of human detection, especially with the head detection. A shoulder-neck profile algorithm or a facial feature detection algorithm would yield much more accurate results, but would also be more difficult to implement. Better use of the HSL color space, or use of another color space, may produce more accurate results as well. A more robust foreground detection algorithm could better handle slight changes in lighting, such as the changes in ambient light caused by the sun on a partly cloudy day. Foreground detection could be further improved by not allowing shadows to appear in the foreground.

Finally, more work can be done in the areas of fall detection, activity recognition, behavior profiling, and gait analysis to make use of the posture information. More can also be done toward a quantitative analysis of the results. Most of the data can best be interpreted through visualization, and a confusion matrix can be used to visualize the performance of the posture classification.
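As a small illustration of the quantitative analysis suggested above, the sketch below (Python; the label set and the hand-annotated ground truth are hypothetical) builds a confusion matrix for the three posture classes from paired lists of true and predicted labels.

import numpy as np

POSTURES = ["standing", "sitting", "lying down"]

def confusion_matrix(true_labels, predicted_labels):
    """Rows are the true posture; columns are the posture reported by the program."""
    index = {name: i for i, name in enumerate(POSTURES)}
    matrix = np.zeros((len(POSTURES), len(POSTURES)), dtype=int)
    for truth, pred in zip(true_labels, predicted_labels):
        matrix[index[truth], index[pred]] += 1
    return matrix

# Hypothetical example: five annotated frames and the program's output for them.
truth = ["standing", "standing", "sitting", "lying down", "sitting"]
pred  = ["standing", "sitting",  "sitting", "lying down", "sitting"]
print(confusion_matrix(truth, pred))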
VII. CONCLUSION

The three steps that must be performed before activity can be analyzed are 1) foreground detection, 2) human detection, and 3) posture detection. Each of these steps can be approached in various ways. We spent most of our time working on foreground detection; as a result, this is the most developed part of our program. The other steps are no less important and can be developed in future work.

ACKNOWLEDGMENT

We would like to thank our professor, Dr. Yu Cao, for providing the opportunity for us to do this research and for answering our questions and guiding us along the way.

REFERENCES

[1] A. Keshavarz, A. M. Tabar, and H. Aghajan, "Distributed vision-based reasoning for smart home care," 2006.
[2] T. Taleb, D. Bottazzi, M. Guizani, and H. Nait-Charif, "ANGELAH: a framework for assisting elders at home," IEEE Journal on Selected Areas in Communications, vol. 27, pp. 480-494, 2009.
[3] B. P. L. Lo, J. L. Wang, and G. Z. Yang, "From imaging networks to behavior profiling: Ubiquitous sensing for managed homecare of the elderly," 2005.