9.1. Further Work
Although this thesis presents a promising approach to social robotics, further development is needed before the system can be used in a real-life setting. The main focus of future development should be to acquire a larger database covering more environments and users, as suggested in Section 8.3.1.
A node for registering movement should also be created, for example through the ROS interface. Such a node was partly implemented using the “rosnode” interface and C#’s ability to load external libraries, but due to limited time it was not prioritized in this thesis.
The relabeling process could also be optimized further. Two ideas for such improvements are described in detail in Appendix B.
In addition, one could imagine a more direct approach to the feature filters. Instead of only comparing certain features of a user with expected values, the user could be matched against a template. However, the human body is flexible and the number of possible poses is vast. Creating templates that work in every possible scenario might therefore be very difficult and time-consuming.
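As a rough illustration of the template idea, the sketch below scores a user's binary silhouette against a set of pose templates using intersection over union (IoU). The masks, the template names and the `min_score` threshold are hypothetical. The sketch also assumes that the silhouette has already been segmented, aligned and scaled to the template size, and it does nothing to handle variation in body shape or pose.

```python
def iou(a, b):
    """Intersection over union of two equally sized binary masks (nested lists)."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa and pb:
                inter += 1
            if pa or pb:
                union += 1
    return inter / union if union else 0.0


def best_template(user_mask, templates, min_score=0.6):
    """Name of the template with the highest IoU, or None if none reaches min_score.

    min_score is a hypothetical acceptance threshold, not a value from the thesis.
    """
    best_name, best_score = None, min_score
    for name, template in templates.items():
        score = iou(user_mask, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Consider a 3 × 3 "plus" silhouette compared with a horizontal "bar" template. The two masks share 3 of 5 set pixels, giving an IoU of 0.6, so a stricter threshold would reject the bar template.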
A. Optimizations
Several measures have been taken to improve the performance of the system. Some of them are listed below:
• Using ROI in images
• Moving conditions outside loops and duplicating code
• Extracting limits outside loops
By using a region of interest (ROI), only part of each image is processed, both by the OpenCV algorithms and by the algorithms implemented for this thesis. The ROI is found by registering the leftmost, rightmost, topmost and bottommost points of the object of interest during the first pass over the image. This reduces execution time.
Some conditions are expensive to evaluate. In some places in the code, the test has therefore been moved outside the loops, at the cost of duplicating the loop body. In pseudo-code, the loop
    for each row:
        for each column:
            if Test():
                //Do something
            else:
                //Do something else
was changed to:
    if Test():
        for each row:
            for each column:
                //Do something
    else:
        for each row:
            for each column:
                //Do something else
As a result, Test() is executed once instead of number of rows times number of columns times. This is only valid when the result of Test() does not change inside the loops.
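The ROI and condition-hoisting measures above can be sketched in Python as follows, assuming the object of interest is given as a binary mask stored in nested lists. Here the bounding box is computed in a separate pass, whereas the thesis registers the extreme points during the first iteration over the image. The `+1` / `-1` loop bodies are hypothetical placeholders for "//Do something" and "//Do something else".

```python
def bounding_box(mask):
    """Return (left, right, top, bottom) of the non-zero pixels in mask, or None if empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return cols[0], cols[-1], rows[0], rows[-1]


def process_roi(image, roi, test):
    """Visit only the pixels inside roi = (left, right, top, bottom), inclusive.

    test() is assumed to be loop-invariant, so it is evaluated once before the
    loops instead of once per pixel; the loops are duplicated in each branch.
    The +1 / -1 bodies are placeholders for the real per-pixel work.
    """
    left, right, top, bottom = roi
    result = []
    if test():
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                result.append(image[y][x] + 1)
    else:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                result.append(image[y][x] - 1)
    return result
```

Take a 4 × 4 mask whose object occupies rows 1 to 2 and columns 1 to 2. For this mask, `bounding_box` returns `(1, 2, 1, 2)`, so only 4 of the 16 pixels are visited, and `test` is called exactly once.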
Profiling the algorithm with the open-source SlimTune profiler showed that an alarming amount of time was spent in calls to the OpenNI wrapper. A very high percentage of the total execution time went to retrieving the height of the depth image. The loop limits were therefore moved outside the loops in the following way:
    for (x = 0; x < image.getWidth(); x++)
        for (y = 0; y < image.getHeight(); y++)
            //Do something
was changed to:
    imageHeight = image.getHeight();
    imageWidth = image.getWidth();
    for (x = 0; x < imageWidth; x++)
        for (y = 0; y < imageHeight; y++)
            //Do something
The inefficiency is most likely caused by calls to the wrapper (such as image.Height) requiring exclusive access to the wrapper. A for-loop re-evaluates its condition on every iteration. The original code must therefore acquire the lock roughly width times height times for the getHeight() call and width times for the getWidth() call. In the optimized code, each call is made only once.
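To make the call counts concrete, the sketch below counts accessor calls on a hypothetical stand-in for the wrapper. `CountingImage` and its methods are illustrative only and are not the OpenNI API. Python's `for ... in range(...)` evaluates its bound only once. The naive version therefore uses while loops to mimic the C-style condition, which is re-evaluated on every iteration.

```python
class CountingImage:
    """Hypothetical stand-in for the depth-image wrapper that counts accessor calls."""

    def __init__(self, width, height):
        self._width, self._height = width, height
        self.width_calls = 0
        self.height_calls = 0

    def get_width(self):
        self.width_calls += 1
        return self._width

    def get_height(self):
        self.height_calls += 1
        return self._height


def visit_naive(image):
    """Limits re-read in every loop condition, as in the original C-style loops."""
    visited = 0
    x = 0
    while x < image.get_width():
        y = 0
        while y < image.get_height():
            visited += 1  # "//Do something"
            y += 1
        x += 1
    return visited


def visit_hoisted(image):
    """Limits read once before the loops, as in the optimized version."""
    width, height = image.get_width(), image.get_height()
    visited = 0
    for x in range(width):
        for y in range(height):
            visited += 1  # "//Do something"
    return visited
```

For a 4 × 3 image, both versions visit 12 pixels. The naive loops call `get_width` 5 times and `get_height` 16 times, that is, width × (height + 1). The hoisted version calls each accessor once.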
B. Relabeling Improvement
To improve the relabeling algorithm, the following two measures can be taken:
• Using, for example, maximum flow to find an optimal mapping
• Using features other than the centroid, such as shape or area
Figure B.1.: The circles marked 1 and 2 are the centroids of the current users, and the blue and red circles are the centroids of the previously detected blue and red labels.
Consider the scenario in Figure B.1. With the current implementation, the user marked 1 will be labeled blue, as it is closest to the blue circle, while the user marked 2 will be given a random color (chosen by OpenNI). This scenario could, for example, occur if the Kinect were turned quickly to the left. The optimal result could then be found with a max-flow algorithm. A source super-node is added with edges to each of the current (numbered) centroids, and a sink super-node is connected to all of the previous (colored) centroids. Edges are also added between every current and every previous centroid. If these middle edges are weighted by centroid distance, a minimum-cost maximum flow gives the mapping with the smallest total displacement.
However, suppose the same scenario were instead the result of the Kinect turning slightly to the right while a new object was detected. In that case, the correct mapping would be to assign the blue label to object 1 and a new label to object 2. To distinguish between these cases, one could use features that recognize whether an object has been detected before. Such features could, for example, be area, density and circumference, with criteria that must be met for a match to be accepted.
When considering such a complex approach, one should keep in mind that the algorithm is executed in every frame, and that all information about users passes through the relabeling process. Because all later steps depend on the results from the relabeling, the process can easily become a bottleneck that slows down the total execution speed of the system.
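The two measures listed at the start of this appendix can be combined in a minimal Python sketch, under the following assumptions:
- Each object is reduced to a centroid and an area.
- A current/previous pair is connected only if their areas agree within a hypothetical relative tolerance (`area_tolerance`), which stands in for the feature criteria discussed above.
- The remaining choice is made by minimum-cost maximum flow on the source/sink graph described for Figure B.1.

The names `relabel` and `MinCostFlow` are illustrative and are part of neither OpenNI nor the thesis code. Shortest paths use Bellman-Ford because the residual graph contains negative-cost reverse edges. With only a few users per frame the graph is tiny, but since relabeling runs every frame, its cost should still be measured.

```python
import math


class MinCostFlow:
    """Min-cost max-flow by successive shortest paths (Bellman-Ford).

    Meant for the handful of nodes in a relabeling graph, not tuned for speed.
    """

    def __init__(self, n):
        self.n = n
        # Each edge is [to, capacity, cost, index of the reverse edge in graph[to]].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def run(self, s, t):
        """Push as much flow as possible from s to t at minimum total cost."""
        flow = 0
        while True:
            dist = [math.inf] * self.n
            prev = [None] * self.n  # (node, edge index) used to reach each node
            dist[s] = 0.0
            changed = True
            while changed:  # Bellman-Ford, since reverse edges have negative cost
                changed = False
                for u in range(self.n):
                    if dist[u] == math.inf:
                        continue
                    for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                        if cap > 0 and dist[u] + cost < dist[v] - 1e-9:
                            dist[v] = dist[u] + cost
                            prev[v] = (u, i)
                            changed = True
            if dist[t] == math.inf:
                return flow  # no augmenting path left
            push, v = math.inf, t
            while v != s:  # bottleneck capacity along the cheapest path
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:  # update residual capacities along the path
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push


def relabel(current, previous, area_tolerance=0.2):
    """Map current objects to previous labels (None means "give it a new label").

    current:  list of (centroid, area) for the objects in this frame
    previous: dict label -> (centroid, area) from the previous frame
    A current/previous pair is connected only if their areas differ by at most
    area_tolerance (relative); the flow then matches as many pairs as possible
    and, among those matchings, minimises the total centroid distance.
    """
    labels = list(previous)
    n_cur = len(current)
    source, sink = 0, 1 + n_cur + len(labels)
    mcf = MinCostFlow(sink + 1)
    for i in range(n_cur):
        mcf.add_edge(source, 1 + i, 1, 0.0)  # source super-node -> current centroid
    for j in range(len(labels)):
        mcf.add_edge(1 + n_cur + j, sink, 1, 0.0)  # previous centroid -> sink super-node
    for i, (c_pos, c_area) in enumerate(current):
        for j, label in enumerate(labels):
            p_pos, p_area = previous[label]
            if abs(c_area - p_area) <= area_tolerance * p_area:  # feature gate
                mcf.add_edge(1 + i, 1 + n_cur + j, 1, math.dist(c_pos, p_pos))
    mcf.run(source, sink)
    mapping = {}
    for i in range(n_cur):
        mapping[i] = None
        for v, cap, _, _ in mcf.graph[1 + i]:
            if 1 + n_cur <= v < sink and cap == 0:  # saturated middle edge = chosen pair
                mapping[i] = labels[v - 1 - n_cur]
    return mapping
```

Consider previous centroids blue at (10, 0) and red at (20, 0), both with equal areas. If both current objects are shifted 8 pixels to the left, to (2, 0) and (12, 0), `relabel` maps object 0 to blue and object 1 to red. A nearest-centroid rule would instead pick blue for both. If the second object is given an area far from both previous labels, it stays unmatched (None), which corresponds to the new-object case.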