
Robotics at a High Level

What are robots?

It’s a pretty broad term when you think about it. Wikipedia says that “A robot is a machine — especially one programmable by a computer — capable of carrying out a complex series of actions automatically”. Here, automatically means that at least part of the control is the responsibility of the machine, usually via algorithms implemented by the engineers who designed it. In other words, it may take some instructions from people via a controller or other means, but at least some of the decisions are made by the robot itself. For example, quadcopter drones could be considered a kind of robot: even though humans usually give them instructions with a wireless controller, the drone uses control algorithms designed by an engineer to maintain its stability automatically.
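To give a flavour of what “control algorithms designed by an engineer” can look like, here is a minimal sketch of a PID feedback controller, the kind of loop a drone might run to hold a target roll angle. The gains, update rate and measurement here are made up purely for illustration, not taken from any real drone.

```python
# A toy PID controller: the kind of feedback loop a quadcopter might run
# to hold a target roll angle. Gains and the update rate are made up for
# illustration, not tuned for any real drone.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=1.2, ki=0.1, kd=0.05, dt=0.01)       # 100 Hz control loop
correction = pid.update(setpoint=0.0,             # want the drone level
                        measurement=0.03)         # current roll (rad) from the IMU
print(correction)
```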

[Image: a quadcopter drone with a wireless controller]

Of course, there are many examples of robots that perform their work without much control from humans. The robotic arms that are now ubiquitous in manufacturing spring to mind. They just go about their job, doing what they have been programmed to do with very minimal human interaction required.

[Image: robotic arms used for manufacturing in a factory]

And of course, no overview of robotics would be complete without mentioning Boston Dynamics and their backflipping robots.

What are the common problems robots must solve?

Well, this can depend quite a lot on the application, but to perform high-level tasks like driving a car autonomously, or even something more mundane like a personal assistant bot that can make me a sandwich, there are three big problems to solve.

Firstly, where am I? Solving this requires what is known as localisation. In other words, the robot must have some representation of its environment and its location within it.

OK, so now that the robot knows where it is, it needs to know where do I need to go? Of course, we know that to make a sandwich you should probably find your way to the kitchen, then find the fridge and get out the food you need, etc., but all of this is implicit when I ask my robot to make me a sandwich. We know it from experience and implied meaning, which in general is fairly difficult to program directly into a robot. On top of that, the robot now needs a concept of a kitchen and of where in its map or representation of the environment the kitchen could be. Again, very difficult to come up with exact instructions for; we just know because of our experiences.

Finally, even once our robot knows all of this, it must plan a path through the environment in order to reach the kitchen. It is almost certain that on its way the robot will encounter static obstacles like tables or walls, and maybe even dynamic obstacles like people or pets. When I asked my robot to make me a sandwich I didn’t say it directly, but it would be nice if it didn’t blast a hole in my wall to get there, or worse, my cat. So it needs to do some path planning, and likely make dynamic changes to the plan as it goes.
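To make the path planning idea a bit more concrete, here’s a minimal sketch of planning on a hypothetical 2D occupancy grid using breadth-first search. The grid, start and goal are made up; a real robot would build the grid from its map and probably use something smarter like A* with a heuristic, but the idea of searching for an obstacle-free route is the same.

```python
# Breadth-first search over a small occupancy grid.
# grid[r][c] == 1 means an obstacle (wall, table, cat...).
from collections import deque

def plan_path(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        current = frontier.popleft()
        if current == goal:
            break
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and (nr, nc) not in came_from:
                came_from[(nr, nc)] = current
                frontier.append((nr, nc))
    if goal not in came_from:
        return None                      # no way to reach the kitchen
    path, node = [], goal
    while node is not None:              # walk back from goal to start
        path.append(node)
        node = came_from[node]
    return path[::-1]

grid = [[0, 0, 0, 1],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(plan_path(grid, start=(0, 0), goal=(2, 3)))
```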

See, Think, Act

As just discussed, robots performing high-level tasks must constantly update their plan as the environment around them changes. This is often known as the see, think, act cycle.
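In code, the cycle boils down to a loop something like the sketch below. The Sensor, Planner and Actuator classes are stand-in stubs I’ve made up just to show the structure of the loop; they aren’t part of any real robotics library.

```python
# A skeletal see-think-act loop with made-up stub components.

class Sensor:
    def read(self):
        return {"distance_to_goal": 3.0}          # pretend measurement

class Planner:
    def decide(self, observation):
        # "think": trivially move forward until close to the goal
        return "forward" if observation["distance_to_goal"] > 0.5 else "stop"

class Actuator:
    def apply(self, action):
        print(f"actuating: {action}")             # "act": would drive motors here

sensor, planner, actuator = Sensor(), Planner(), Actuator()
for _ in range(3):                                # the loop repeats as the world changes
    observation = sensor.read()                   # see
    action = planner.decide(observation)          # think
    actuator.apply(action)                        # act
```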

First, the robot gathers raw data about its environment and itself using sensors such as cameras, ultrasonic sensors, lidars, microphones, IMUs, etc. Already we have problems, since the data collected by the robot will invariably be imperfect. Sensors are inherently vulnerable to noise in the environment (information we don’t care about), and in most cases robots are discrete systems living in a continuous world. They are limited in the information they can perceive in terms of how often they can measure it (temporal discretisation) and how precisely (quantisation). For the noise problem, in many cases the frequency of the noise is much higher than that of the signal we want, so a low-pass digital filter can help attenuate it. There is little that can be done about temporal discretisation and quantisation other than understanding the limitations of your hardware and choosing it appropriately for the task. Interpolation and other tricks are possible, but the amount of information is fundamentally limited by the sampling rate and bit precision.
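As a small illustration of the noise point, here’s a first-order low-pass filter (an exponential moving average) applied to a synthetic noisy signal. The signal and the filter constant are made up; the idea is just that a weighted running average attenuates the fast-changing noise while keeping the slow-changing measurement we care about.

```python
# First-order low-pass filter (exponential moving average) on a synthetic signal.
import math
import random

def low_pass(samples, alpha=0.1):
    """Smaller alpha -> heavier smoothing (lower cut-off frequency)."""
    filtered = [samples[0]]
    for x in samples[1:]:
        filtered.append(alpha * x + (1 - alpha) * filtered[-1])
    return filtered

# Slow sine wave we care about, plus fast random noise we don't.
noisy = [math.sin(0.05 * t) + random.gauss(0, 0.3) for t in range(200)]
smooth = low_pass(noisy, alpha=0.1)
```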

Now, after obtaining this information about the world, the robot must use it to make decisions. Fundamentally, the data it has gathered is represented in very simple ways: pixels are just values in a grid (or array) representing the amount of red, green and blue in a small area of the world. To gain any useful understanding, the robot must use algorithms or models to interpret the relationships and patterns in these grids of numbers. Somehow we need to get from grids of numbers to an understanding of where the kitchen is. This is certainly not a trivial problem and has historically been a significant struggle. In the last decade there has been impressive progress thanks to advances in deep learning. Instead of writing algorithms for computers to execute, we found that writing algorithms that let the robot learn the best algorithm to use, by presenting it with lots of data, worked quite well. Pretty meta. Convolutional Neural Networks are able to learn these layers of abstraction naturally using learned kernels (see my blog post here for some more details on CNNs), and more recently, Vision Transformers (ViTs) have achieved state-of-the-art performance on imaging tasks. At the same time, computational power has continued to grow, with a particular focus on parallelisation (performing multiple computations concurrently), which is especially significant for imaging tasks and for training these neural networks. Furthermore, once the robot has used these models to build some model of its environment, other AI techniques such as reinforcement learning have proved effective in helping it make decisions based on that information.
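To make the “learned kernels” idea concrete, here’s a single 2D convolution applied by hand to a tiny made-up grey-scale “image”. The kernel below is a hand-picked vertical edge detector; in a CNN, the kernel values are exactly what gets learned from data instead of being chosen by a person.

```python
# One 2D convolution (strictly, cross-correlation) over a tiny grey-scale image:
# a sliding weighted sum over a grid of numbers.
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[0, 0, 0, 10, 10, 10],
                  [0, 0, 0, 10, 10, 10],
                  [0, 0, 0, 10, 10, 10],
                  [0, 0, 0, 10, 10, 10]], dtype=float)

edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)

print(convolve2d(image, edge_kernel))   # large values where the vertical edge is
```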

This leads to the final stage in the cycle, which is to act in some way. In my case, hopefully the robot decides to start moving towards the kitchen and not into my cat, and to do so it will probably turn some motors to move its wheels or legs or something like that. This in turn will affect the robot’s state and potentially the environment around it, so the whole process can repeat as it begins gathering information about its environment again.

So that’s a pretty high-level overview from the first week of my METR4202 course. I’m pretty excited to learn more about the details of these parts in the upcoming weeks.

This post is licensed under CC BY 4.0 by the author.
