For the past few weeks, due to moving apartments and other personal matters, I had to pause work on my WnekoTank rover project.
I’m almost at the end of the 2nd iteration. The 1st was just a basic remotely controlled platform that could only be driven directly, like an RC car; the 2nd is now mostly finished on the hardware side and has all the building blocks needed for later autonomy:
- The ability to program a series of movements, which can later be used by pathfinding algorithms to travel along a selected route,
- Basic proximity sensors for emergency stops at obstacles,
- A gimbal giving the operator a view all around the rover, also translating image clicks into camera or rover movement – later to be used in conjunction with pathfinding,
- A basic stereo camera for computer vision – to be used for stereoscopic 3D mapping of the terrain ahead, both for pathfinding and for obstacle detection,
- A complete reorganization of all onboard electronics.
Now I’m slowly working on the 3rd iteration, which won’t bring many hardware changes: maybe some additional sensors, and finally finishing at least the most important parts of the chassis to allow using it outdoors. It will, however, require many, many different software elements to be written and put together, this time spanning technologies ranging from basic console apps to Azure. This post is just a roadmap, where I can systematize future steps and those already taken.
The first element of computer vision will be two ESP32-CAM modules, temporarily mounted on the rover’s front, each with a 60° FOV. Each module runs a simple HTTP server and has its own IP address. They, together with the Meadow main board, connect to an onboard TP-Link WR902AC router flashed with OpenWrt, which I have already prepared and tested. That router in turn connects to a Huawei LTE modem, which provides internet connectivity. The router also provides a site-to-site WireGuard VPN connection to my home network. Using that tunnel, whenever the rover needs new 3D map data, it will send a command to the back-end server asking for updated map info.
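Grabbing a stereo pair from those two HTTP servers is straightforward; a minimal Python sketch (the camera addresses and the `/capture` path are placeholders, and the exact endpoint depends on the ESP32-CAM firmware):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical camera addresses; the stock ESP32-CAM web server example
# serves a still JPEG at /capture, but the path depends on the firmware.
CAMERAS = ["http://192.168.1.21/capture", "http://192.168.1.22/capture"]

def fetch_frame(url):
    """Download one frame from a camera's HTTP server."""
    with urlopen(url, timeout=2) as resp:
        return resp.read()

def fetch_stereo_pair(urls=CAMERAS, fetch=fetch_frame):
    """Grab both frames in parallel, so the pair is captured as close
    together in time as possible (important for a moving rover)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left, right = pool.map(fetch, urls)
    return left, right
```

Fetching both frames concurrently matters: if the rover moves between the two downloads, the stereo geometry no longer holds.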
The control application is designed with my Surface Pro in mind, which can either connect directly to the on-board router over WiFi, or reach my home network through its point-to-site WireGuard client and any available internet connection.
The back-end server will be either a physical/virtual server at my home, or a virtual machine in Azure. In the first case it will be directly connected to my LAN; in the second it will use a WireGuard VPN (a cheaper solution than the Azure VPN Gateway). On Azure it will be a Spot instance, running only when needed, which lets me cheaply use even very powerful servers. The current architecture of the back-end service I’m building is based around multi-core CPU calculations and AVX instructions; with Azure spot instances I can get up to a 96 vCPU machine for less than 1 EUR per hour, and given its battery capacity the rover won’t be used more than 1–2 h at a time. For testing I can still use a much slower, but still fast, on-prem server. Because every element sits within one LAN, each part can talk to the others easily and securely, no matter where on earth it is located, using just local IP addressing. Switching between the on-prem and Azure-based servers, for example, will be just a matter of changing an IP address stored in the rover.
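The glue that makes everything look like one flat LAN is WireGuard’s `AllowedIPs` routing. A minimal sketch of the on-board router’s peer configuration – every key, hostname and subnet below is a placeholder for illustration:

```ini
[Interface]
PrivateKey = <router-private-key>
Address = 10.8.0.2/24              ; rover's address inside the tunnel

[Peer]
PublicKey = <home-endpoint-public-key>
Endpoint = home.example.net:51820
AllowedIPs = 10.8.0.0/24, 192.168.0.0/24   ; tunnel subnet + home LAN
PersistentKeepalive = 25                   ; keeps the NAT mapping alive over LTE
```

With the home LAN listed in `AllowedIPs`, the rover reaches any back-end host by its local address, whether it is a physical box at home or an Azure VM joined to the same VPN.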
The back-end service is under construction now. The detailed workflow:
- Receives a command from the rover,
- Downloads images from both cameras of the stereo pair using their IPs,
- Asynchronously sends those images to the Control App for preview,
- Loads the RGB data into a 3D array containing the R, G and B values of each pixel,
- Performs stereoscopic 3D reconstruction of the scene; my current plan:
  - Convolve each image to get the sum of the neighboring pixels at every pixel:
    - Perform an FFT of each image,
    - Multiply the result by the FFT of the kernel,
    - Perform an inverse FFT to get the convolved image,
  - Compare sections of the convolved images to find the best fit and calculate depth from the relative shift of elements between the images,
- Sends the obtained depth map to the Control App for preview,
- Calculates two height maps from the 3D data obtained above:
  - Project each calculated point onto a plane divided into fixed-size cells to find the highest point in each cell,
  - Check whether points are “hovering”, meaning there is empty space beneath them,
  - Distinguish between floor and ceiling points, returning height maps of the terrain and of overhead obstacles, which allow calculating whether the device will fit beneath them,
- Sends both height maps back to the device.
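The convolution-by-FFT steps above fit in a few lines; here is a minimal NumPy sketch (Python as a stand-in for the actual C# implementation), using a 3×3 box kernel to produce the sum of neighboring pixels:

```python
import numpy as np

def fft_convolve(image, kernel):
    """Circular convolution via the FFT: transform the image and the
    zero-padded kernel, multiply the spectra, and transform back."""
    h, w = kernel.shape
    padded = np.zeros(image.shape)
    padded[:h, :w] = kernel
    # roll so the kernel is centred on each output pixel
    padded = np.roll(padded, (-(h // 2), -(w // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def neighbor_sum(image):
    """Sum of the 3x3 neighborhood of every pixel, as in the pipeline above."""
    return fft_convolve(image, np.ones((3, 3)))
```

For a tiny 3×3 kernel a direct sum would actually be faster; the FFT route pays off for larger kernels, and it lets the already-computed image spectra be reused across different kernels.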
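The height-map projection can be sketched in NumPy too. The cell size, grid extent and the floor/ceiling split are placeholder values, and the simple height threshold below stands in for the real “hovering” check, which will be more involved:

```python
import numpy as np

def height_maps(points, cell=0.2, extent=5.0, split=0.5):
    """Project 3-D points (x forward, y left, z up, in metres) onto a grid
    of fixed-size cells, keeping a floor and a ceiling height per cell."""
    n = int(extent / cell)
    floor = np.full((n, n), -np.inf)    # highest ground point per cell
    ceiling = np.full((n, n), np.inf)   # lowest overhead point per cell
    for x, y, z in points:
        i, j = int(x / cell), int((y + extent / 2) / cell)
        if not (0 <= i < n and 0 <= j < n):
            continue                    # point lies outside the mapped patch
        if z <= split:                  # low point: treat as terrain
            floor[i, j] = max(floor[i, j], z)
        else:                           # "hovering" point: overhead obstacle
            ceiling[i, j] = min(ceiling[i, j], z)
    return floor, ceiling
```

Keeping the maximum floor height and the minimum ceiling height per cell is the conservative choice: the rover plans against the worst case in every cell.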
At the same time, one of the images might also be sent to Azure Cognitive Services Computer Vision for object detection. After receiving the object data, knowing its position in the image and having the depth map, the distance to the object can easily be determined. This could also be used for tracking specific objects – but those are plans for the 4th iteration.
I’m still not completely decided on the stereo vision algorithm; at the moment I believe the solution above should be quite fast and – most importantly – entirely possible for me to program myself.
I have just finished programming the FFT and IFFT methods, which can process a 2048×2048 image in less than 250 ms on a 4-core, 8-thread i7-7700K CPU. That should allow a latency of around 1 s once network latency is taken into account (using the router and LTE modem pair mentioned above, I was getting ping times between 100 and 200 ms between a PC connected to my local router and a device connected to the mobile router; the other calculations will also take some time, although they should be much less compute-intensive than the FFT). Running this on a ~100 vCore machine should drastically lower those times, as the computations are performed in parallel: all the data is divided into as many parts as there are CPU threads, allowing heavy parallelization and minimizing time lost to, for example, memory cache misses, since each CPU works on one long, contiguous strip of memory.
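The strip-per-thread idea is roughly this (a NumPy sketch, not the custom back-end code): a 2-D FFT separates into row FFTs followed by column FFTs, and each worker gets one contiguous strip per phase:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fft2_strips(image, workers=4):
    """2-D FFT computed as row FFTs then column FFTs, each phase split
    into one contiguous strip per worker (cache-friendly partitioning)."""
    data = image.astype(complex)
    with ThreadPoolExecutor(workers) as pool:
        # phase 1: FFT along rows, one strip of rows per worker
        rows = np.array_split(data, workers, axis=0)
        data = np.vstack(list(pool.map(lambda s: np.fft.fft(s, axis=1), rows)))
        # phase 2: FFT along columns, one strip of columns per worker
        cols = np.array_split(data, workers, axis=1)
        data = np.hstack(list(pool.map(lambda s: np.fft.fft(s, axis=0), cols)))
    return data
```

Because the 2-D transform is separable, the two phases combined give exactly the full 2-D FFT, and the strips never overlap, so the workers need no synchronization within a phase.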
I have already implemented the first version of the basic move-by-click functionality, where after clicking on the gimbal camera image the corresponding real-world point is calculated and the device moves to it. It’s still buggy and under development, but it’s the basis for later functionality, where instead of just moving straight to the target the rover will calculate the best path.
The device will keep a terrain map using two arrays: terrain and overhead obstacles. Each cell will be a fixed size between 10 and 30 cm; I’ll decide after empirical testing. That way it will be able to keep information about a big patch of terrain around it and use it for pathfinding. New data received from the back-end service will update those maps, letting the rover “learn” the terrain as it moves around. It will also be possible to turn around, performing multiple scans to get a 360° map, or the cameras might eventually sit on a rotating platform too – I haven’t decided that yet.
Having a terrain map will allow multiple ways of control. The basic one will be the click control mentioned above, but the Control App will also allow simply clicking on the map to send the device to more distant positions.
Having the height maps, it can easily be calculated which parts are traversable and which are not. If the height difference between two cells is too high, the route is not traversable. If it’s traversable but still high, it gets a higher difficulty score. If an overhead obstacle is too close to the terrain, it is not traversable either.
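Those rules can be sketched as code; all thresholds below are made-up placeholders pending the empirical tests mentioned earlier:

```python
import numpy as np

def difficulty_grid(floor, ceiling, max_step=0.15, easy_step=0.05, clearance=0.40):
    """Turn floor/ceiling height maps into per-cell difficulty:
    1.0 = easy, 3.0 = traversable but steep, inf = not traversable."""
    h, w = floor.shape
    cost = np.ones((h, w))
    cost[(ceiling - floor) < clearance] = np.inf   # rover won't fit underneath
    for y in range(h):
        for x in range(w):
            if not np.isfinite(cost[y, x]):
                continue
            # examine the height step towards each 4-connected neighbour
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    step = abs(floor[y, x] - floor[ny, nx])
                    if step > max_step:
                        cost[y, x] = np.inf        # ledge too high to climb
                    elif step > easy_step:
                        cost[y, x] = max(cost[y, x], 3.0)
    return cost
```

A per-cell scalar like this is all the pathfinding stage needs; the tuning of the three thresholds against the real chassis is the interesting part.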
That way an array of difficulty values can be obtained very easily, and then, for example, the A* algorithm can be used to find the fastest way to the selected position. After that, the path can easily be translated into a series of move and turn commands. While moving, additional scanning will update the map, fill in blank areas that weren’t visible before, and check whether the planned path has become blocked (maybe an untraversable cell appeared on it?) or whether a better one exists. In that case: stop the device, calculate a new route and continue moving.
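A minimal A* over such a difficulty array – the standard textbook version, 4-connected with a Manhattan heuristic:

```python
import heapq
from itertools import count

def a_star(cost, start, goal):
    """Find the cheapest path through a 2-D difficulty grid.
    Cells with infinite cost are untraversable; moves are 4-connected."""
    h, w = len(cost), len(cost[0])
    INF = float("inf")
    heur = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # admissible
    tie = count()   # tie-breaker so the heap never has to compare positions
    frontier = [(heur(start), next(tie), 0.0, start, None)]
    best_g, parent = {start: 0.0}, {}
    while frontier:
        _, _, g, cur, prev = heapq.heappop(frontier)
        if cur in parent:
            continue                    # already expanded with a lower cost
        parent[cur] = prev
        if cur == goal:                 # walk the parent chain back to start
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dy, cur[1] + dx)
            if 0 <= nxt[0] < h and 0 <= nxt[1] < w and cost[nxt[0]][nxt[1]] != INF:
                ng = g + cost[nxt[0]][nxt[1]]
                if ng < best_g.get(nxt, INF):
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + heur(nxt), next(tie), ng, nxt, cur))
    return None                         # goal unreachable
```

Collapsing runs of cells that share a direction then yields exactly the series of move and turn commands the rover executes.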
Additionally, there are a lot of parameters controlling the rover’s behavior that can be configured – and there will only be more. They range from what to do when an obstacle is detected, down to small things like data-sending frequency or the polling frequency of the PID controllers.
Right now the most important ones are controlled through the Control App; the less important ones are just hard-coded, but changing them requires redeploying the rover’s software from a PC, which isn’t a very flexible solution.
This is where I see big potential for Azure IoT Hub. The main board will be connected to the hub, potentially together with other auxiliary devices in the future, using a plain internet connection rather than the VPN, to avoid adding unnecessary latency (the on-board VPN uses split tunneling, directing LAN traffic to my home network but public IPs straight to the internet; IoT Hub communication is natively encrypted anyway). There, every setting, even the smallest, will be easily controllable using, for example, the device twin.
I haven’t decided on the details of that solution yet, but I’ll probably just create a simple Settings webpage, or maybe add a Settings tab to the Control App, which will send commands to IoT Hub changing settings in the device twin, which in turn will change the settings on the rover.
This might not be the lowest-latency solution, but it will only be used for settings that aren’t needed in real time, while offering transparency (everything can easily be inspected, even in JSON form, in IoT Hub) and simplicity (the JSON can be modified directly, without any web- or app-based GUI).
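For illustration: a device twin’s desired properties are plain JSON, so a hypothetical settings document (all property names below are invented) might look like:

```json
{
  "properties": {
    "desired": {
      "obstacleBehavior": "stop",
      "telemetryIntervalMs": 500,
      "pidPollIntervalMs": 20
    }
  }
}
```

The device acknowledges changes through the twin’s reported properties, which is the standard pattern for confirming that a setting actually took effect on the rover.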
What’s next? I’m not sure yet! There are endless possibilities! By the end of the 3rd iteration I want to have a platform able to move over any terrain, one that can simply be told to get to some point and will navigate there autonomously, avoiding obstacles.
The 4th iteration? Probably expanding the autonomous capabilities. Maybe, as I mentioned before, object tracking, additional cameras to map 360° at once, maybe interaction with specific objects – for example, if you see a cat, don’t add it to the static map! Or, on the other hand – if you see a cat, follow it?
It’s hard to decide now, as it heavily depends on how the 3rd iteration works out. So… we’ll see in a few months!
And after that? The 5th iteration is even foggier right now, but I know one thing: I want a robotic arm! If the 4th iteration adds object awareness, it can be used for easier object interaction. Then an additional camera and lidar in the arm’s manipulator for easier grabbing… Oh, those plans are so far ahead that it’s better not to think too much about them now!
One thing is for sure: there is a lot of work to be done, but even more fun and learning opportunities!
Oh boy, I’ve been working on this project for almost a year now, and the amount of work only seems to grow. Fortunately it’s mostly programming work now: healthier (oh, how I hate soldering fumes!) and easier to do anywhere, without a workshop and a connected device.
My biggest issue was working blind – I’d start on one part, then another, and so on. Now at least I have a strict list of modules to produce, so I can systematize my work, finishing each module and connecting them together later. Each of them can now be done separately and still fit within one bigger, final project.
Is it the most optimal way of doing things? Are the algorithms I selected the fastest, the best performing? Probably not. But they are mine! And remembering that the main goal of this whole project is learning – what’s better, using other people’s solutions, or building my own from scratch? I could just use OpenCV, I could just use FFTW, etc. – but… where’s the fun in that?