We have long imagined that robots will one day co-exist with humans. If that happens, robots will have to adapt and learn from humans and the world around them, just as a child learns from elders and teachers. For example, when a child is about to touch a vessel of hot water, her mother tells her not to touch it. The child receives the same instruction when she is about to touch a hot pressing iron. These are identical instructions given in different situations. The next time the child sees something hot, she will not touch it. As humans, we easily identify that the common factor in the two situations is that the object was HOT, and we learn not to touch hot objects. Our Instruction framework enables robots to do the same; in other words, it enables robots to learn about the world by following simple human instructions [2].

Endowing robots with such a learning mechanism is immensely advantageous because it is not practical to program every single task or function into a robot. Instead, a human can give a few instructions, from which the robot learns the rules and concepts behind performing a task and can then reuse them to perform similar tasks [1].

But then, humans need some way to communicate with robots in order to instruct them. There are various approaches: voice commands, natural language processing, pressing buttons, or gestures. We use gestures, since the other methods have limitations that I will not go into here. Gestures are natural; even in our daily lives we use gestures to communicate with each other, like pointing to objects or waving to people. In our framework, humans instruct robots using such gestures, and the robots learn from these instructions as explained earlier. The robots recognise the gestures using the Microsoft Kinect [3].

As long as the set of gestures is small, it is reasonable to expect a human to remember how the robot interprets each gesture. But as gestures grow in number and complexity, the cognitive load of remembering the entire set becomes too great. To address this, our framework handles multiple interpretations of a gesture. For example, suppose person X needs something he keeps in his cupboard but has lost the cupboard key and is searching for it. His friend Y simply points at a paper weight on the table. X could interpret this pointing gesture in two ways: (1) use the paper weight to break the cupboard's lock, or (2) look near the paper weight for the key. As you can see, how an instruction is interpreted matters a great deal. Our framework enables the robot to handle multiple interpretations of a single human gesture in the same way. This lets the human work with fewer gestures while still communicating with the robot effectively. Once the robot receives a gestural instruction, it interprets it in several possible ways and acts on the one it is most confident about. Although our current experiments have used dual interpretations only, the framework can be extended to handle more than two [3].
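The selection step can be sketched in a few lines of Python. This is purely illustrative: the interpretation descriptions and confidence scores below are made up, and the actual framework derives its confidences from the learned model rather than from hand-set numbers.

```python
def choose_interpretation(interpretations):
    """Return the (description, confidence) pair with the highest
    confidence. In the real framework the confidences come from the
    learner; here they are hand-set for illustration."""
    return max(interpretations, key=lambda pair: pair[1])

# Dual interpretations of a single pointing gesture, as in the
# paper-weight example above.
gesture_readings = [
    ("use the pointed object as a tool", 0.35),
    ("search near the pointed object", 0.65),
]
best = choose_interpretation(gesture_readings)  # the robot acts on `best`
```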

In addition to learning from human instructions, our framework enables robots to learn on their own. In other words, a robot may continue to learn even in the absence of instructions by performing exploratory actions. From the feedback it receives for an action, the robot learns whether that action is profitable. Hence, even if the training period is incomplete, a robot can still learn to solve a task well using the framework [1]. The framework can also handle wrong instructions, although unlearning them takes quite some time.
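The explore-and-update idea can be illustrated with a generic epsilon-greedy sketch: occasionally try a random action, and nudge the chosen action's estimated value toward the feedback received. The actual learner combines reinforcement learning with Markov Logic Networks; the code below is only a minimal stand-in for the exploratory behaviour, with made-up parameter values.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon take a random exploratory action,
    otherwise exploit the action with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def update(q_values, action, reward, alpha=0.5):
    """Move the chosen action's value estimate toward the feedback,
    so profitable actions gradually dominate the selection."""
    q_values[action] += alpha * (reward - q_values[action])
```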

To sum up: our framework lets humans instruct robots naturally using gestures, and lets robots use those gestures wisely to learn the concepts governing our daily tasks. It may not be too far in the future that you can sit on your couch and teach your very own personal robot to cook by merely pointing at the stove, vegetables, spices and so on. 🙂

We have tested our framework on a robot learning to solve a simple ball sorting task. The following is a video of that robot.

The robot reads human gestures using the Microsoft Kinect, a sensor that observes humans and their movement, thus aiding gesture recognition. The Kinect (along with the OpenNI libraries) detects humans and approximates each one with a skeleton. The positions of the skeleton's joints (shoulders, neck, wrists, elbows, hips, knees, etc.) are constantly monitored, and from these positions we recognise gestures. A simple pointing gesture, for instance, means the shoulder, elbow and wrist are roughly aligned in a straight line; by extending this line, we can identify the object being pointed at. Several other gestures can be recognised similarly.
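The collinearity check and ray extension described above can be sketched as follows. The joint coordinates, angle tolerance, and object list are illustrative assumptions; the actual system reads joint positions from the OpenNI skeleton tracker.

```python
import math

def is_pointing(shoulder, elbow, wrist, tol_deg=15.0):
    """Treat the arm as pointing when shoulder, elbow and wrist are
    roughly collinear, i.e. the bend at the elbow is below tol_deg.
    The 15-degree tolerance is an illustrative choice."""
    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def norm(v): return math.sqrt(sum(x * x for x in v))
    u, v = sub(elbow, shoulder), sub(wrist, elbow)
    cosang = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang)))) < tol_deg

def pointed_object(shoulder, wrist, objects):
    """Extend the shoulder-to-wrist ray and return the known object
    closest to that line. `objects` is a list of (name, position)
    pairs with positions assumed known in advance."""
    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def norm(v): return math.sqrt(sum(x * x for x in v))
    d = sub(wrist, shoulder)
    dn2 = sum(x * x for x in d)
    def dist_to_ray(p):
        w = sub(p, shoulder)
        t = sum(a * b for a, b in zip(w, d)) / dn2
        proj = tuple(s + t * c for s, c in zip(shoulder, d))
        return norm(sub(p, proj))
    return min(objects, key=lambda item: dist_to_ray(item[1]))
```

For example, with the shoulder at the origin and the elbow and wrist along the x-axis, the arm is pointing, and an object lying near that axis is selected over one off to the side.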

The robots learn using the reinforcement learning paradigm combined with Markov Logic Networks. We used the Robot Operating System (ROS) as the backbone of our framework's implementation and for robot control. The arm was controlled using OpenRave and an Arduino board, and gestures were recognised using the Microsoft Kinect and the OpenNI library.

The robot currently does not respond to audio, although our framework can be extended to support it; this is one of the things we are currently working on. All the robot needs is instructions in some form it can understand. For now, we restrict it to gestural instructions.

A brief intro of the team:
The RISE Robotics Group, RISE Lab, Dept. of CSE, IITM
Guide: Dr. Ravindran Balaraman
Team: Pradyot KVN (IITM – MS), Manimaran S Sivamurugan (IITM – MS), Prahasaran Raja (IITM – Research Assistant), Anshul Bansal (Intern – Punjab Engg College), Abhishek Mehta (Intern – Punjab Engg College).


1) How does the robot read human actions and imitate them?

As explained earlier, our robot does NOT imitate humans. The robot reads human gestures and interprets them in the manner it deems most appropriate.

2) In what ways will the robot be helpful to mankind? In what all activities can it be useful?

Our framework is not specific to any one robot. It enables humans to teach any robot any task (that the robot is physically capable of performing) using simple instructions.
(A robot's capabilities are chiefly limited by its hardware; for instance, a robot must have flying machinery in order to learn to fly.)

3) How long did it take you to work on this framework?

The formalization and current implementation took us five months, starting from early November, though we have been thinking about the problem for much longer.

4) Has the framework been successfully developed, or is it still in the testing period?

The framework is still in its testing stage; a lot of work remains to achieve full utility. We have, however, tested it in its current state on a robot learning a simple sorting task.

5) Was this work presented at Stanford?

This work was not presented at Stanford. My trip to Stanford has nothing to do with the robot or the framework. I went there to attend the ASES 2011 Summit.


[1] Pradyot, K. V. N., Manimaran, S. S., and Ravindran, B. (2012) “Instructing a Reinforcement Learner”. To appear in the Proceedings of the Twenty Fifth Florida AI Research Society Conference (FLAIRS 2012). AAAI Press.
[2] Pradyot, K. V. N. and Ravindran, B. (2011) “Beyond Rewards: Learning with Richer Supervision”. In the Proceedings of the Ninth European Workshop on Reinforcement Learning (EWRL 2011).
[3] "Robot Learning Aided by Gesture based Instructions". Under review for the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2012).


5-DoF robot arm control simulated in OpenRave. Trajectories are generated using Rapidly-Exploring Random Trees (RRTs); the actual robot is controlled by communicating joint configurations through an Arduino board.
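To give a feel for how an RRT planner works, here is a minimal 2D sketch of the idea. It is not the planner used on the arm: the workspace bounds, step size, goal bias, and the `is_free` collision check are all illustrative assumptions.

```python
import math
import random

def rrt(start, goal, is_free, step=0.5, goal_tol=0.5, max_iters=5000, seed=0):
    """Minimal 2D Rapidly-Exploring Random Tree sketch.

    Grows a tree from `start` by repeatedly sampling a random point
    (with a 10% bias toward the goal), extending the nearest tree node
    one `step` toward it, and stopping once a node lands within
    `goal_tol` of `goal`. `is_free(p)` reports collision-freeness.
    Returns the path from start to goal, or None on failure."""
    rng = random.Random(seed)
    nodes, parent = [start], {0: None}
    for _ in range(max_iters):
        sample = goal if rng.random() < 0.1 else (rng.uniform(0, 10), rng.uniform(0, 10))
        near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        dx, dy = sample[0] - nodes[near][0], sample[1] - nodes[near][1]
        d = math.hypot(dx, dy) or 1e-9
        new = (nodes[near][0] + step * dx / d, nodes[near][1] + step * dy / d)
        if not is_free(new):
            continue
        parent[len(nodes)] = near
        nodes.append(new)
        if math.dist(new, goal) <= goal_tol:
            # Walk parent pointers back to the start to recover the path.
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None
```

In practice the same idea runs in the arm's joint-configuration space rather than the 2D plane, with the collision check done against the simulated environment.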

The objective was to teach a robot to sort objects into baskets of the same colour using gesture-based instructions.

  • Markov Logic Networks used in conjunction with a Reinforcement Learning framework to learn from instructions during the teaching phase.
  • Gestures sensed through the Kinect using the OpenNI library.
  • Robot arm controlled using OpenRave and ROS; arm plans generated with an RRT (Rapidly-Exploring Random Tree) based planner.
  • Pioneer P3DX mobile base navigated with the Kinect as sensor; localization and navigation implemented using the ROS Navigation stack.
  • Reinforcement Learning framework implemented by building over the ROS reinforcement_learning stack.
  • ROSAria used to communicate with the P3DX.
  • Work done at RISE Lab, IIT Madras, India.

My final year project during my undergrad was a robot arm that played Tic-Tac-Toe. The arm was built using the aluminium supports found in a toy construction kit, with an onboard camera used to locate the game board. The board was made of wood with a button under each square. The idea was for a person to simply plug the arm into a PC and start playing; the board does not have to be carefully placed in a predefined position. The arm starts by scanning the table for the board, localizes itself with respect to the board, and begins the game. There are "hard" and "easy" difficulty options, and the human can choose to play first or second. We used hobby servo motors (HiTech) and a Logitech webcam, and the entire vision system was built using the OpenCV libraries. Notice that in the video we use a printed 5×5 chessboard pattern to aid in localising the arm.


BUSTED!! Would anyone name their first robot this?? Yes, such was my team, which built a micromouse for the IITB Techfest… Our first week was spent entirely on burning ICs and spoiling motors… In no time, we had a huge pile of things marked 'BUSTED'… So we unanimously decided to name our mouse BUSTED, and its twin brother BUSTED V2.0…


Details of construction:

  • Aluminium chassis with wooden boards to hold circuits
  • 12V Stepper motors
  • L298 based motor driver circuit
  • LM324 based sensor input circuit
  • IR Led and Photodiode (placed inside a sketch pen cap)
  • 15V power supply (Ni-Cd batteries)
  • AT89C52 microcontroller
  • RAM interfaced for storing the maze map


(from left) Immi, Gagan, ME, Sharad and Harry

Me and our test board

The NAKED TWINS!! (talking about the bots, of course…)

The base and batteries

Photodiode in a sketch pen cap (was supposed to cut out sunlight)


Sensor input circuit

the table after an average day’s work

working in the train

Too much work can do this to you too!!

Last set of trials at IITB

Our bots failed to make it past even the first turn… The same bots, after some testing and a renaming to "Maze Buster" and "Jerry", went on to win at NIT Surat…