Mockingbird
A downloadable game for Windows
There is currently a bug: training your AI will crash the game after some time. The training will complete and save, but the game will freeze. You can safely restart the game and select the third option to use the newly trained model.
Ultimately, I believe the machine learning technique I used is too data-hungry for one person to train in a reasonable time - the example I built on top of covers a simpler case, driving a car, and it uses over an hour's worth of driving data to train.
Flight Controls
W: Thrust
A-D: Roll
Move mouse: Pitch & Yaw
Esc: Save and return to main menu
W (after crashing): Respawn
Description
Mockingbird is a toy that lets you train AI pilots. There is no goal here other than to experiment and have fun.
The AI pilots need to watch you fly around the island so they can learn from you. You'll need to do a few laps of the island so that they have the data they need.
Once you've recorded some flights, it's time for the pilots to hit the books. You can watch as they train and learn to fly like you.
Once you're happy with the AI pilots' skills (or can't stand to watch them anymore), you can join them in flight. Try to avoid any midair collisions!
How to Play
First, click the "Record a Flight" option. This will spawn you in a plane and start recording your movements. Use WASD and the mouse to fly through the level. Remember that rolling and pitching are much more effective than yawing. If you do crash, no worries! Press W to respawn immediately.
We'll need a lot of data to train the AI. Fly around the island as much as you like.
Flights are automatically recorded - once you have finished flying, press Esc to return to the main menu, then select "Train AI on Recorded Flights". The AIs will get to work, slowly studying your recorded flights. Press Esc at any time to return to the main menu - training saves frequently. Training takes a while, so feel free to leave it running in the background.
Unfortunately there's a bug where live reloading of the training doesn't work, so the planes won't visibly improve - you'll have to leave it for 10 minutes without seeing anything interesting. Set a stopwatch on your phone and go get a drink of water ;)
Finally, you can join the AI pilots in flight by selecting the "Fly with AI" option. This is just for fun! Your flights are not recorded and the AI does not train during this mode. You need to have trained them before entering this mode.
"Failed to run script cli.py" error
This is a bug with the game. No progress will be lost.
It can also mean that files from previous steps are missing. If you have not recorded any flights then you will not be able to train the AI, and if you have not trained the AI then you will not be able to fly with your AI friends.
Restarting the Game
Sharing this game with a friend? Or just want a clean slate because you've recorded yourself crashing a few times too many?
In your file explorer, paste this path into the address bar:
%APPDATA%/../LocalLow/Carbide Function/
then delete the Mockingbird folder.
Special Thanks
Sam Devlin - shared a lot of machine learning info throughout the project. He pointed me toward Behavioural Cloning and provided many other tidbits of wisdom. This would not have been possible without him!
Tony Dong - made the awesome title art (you don't want to see my version!)
James Lockett, Josh P I - tested the Unity/Python solution, created the first babby planes
Other Credits
plane model: https://www.kenney.nl/assets/space-kit
landscape models: https://assetstore.unity.com/packages/3d/props/polygon-starter-pack-low-poly-3d-...
water: https://assetstore.unity.com/packages/2d/textures-materials/water/simple-water-s...
explosion animation: https://assetstore.unity.com/packages/vfx/particles/particle-pack-127325
Machine Learning
I used behavioural cloning to train the agents on the player's actions in different circumstances, live on the player's computer.
The agent's observation space contains 8 percepts, which are flattened to 25 features before they enter the model (see the sketch below). These were all chosen by guesswork; I didn't have time to experiment with which features were useful and which were ignored.
- world position of the plane
- local compass to the centre of the world (respects rotation of the plane)
- distance from the plane to the centre of the world
- plane's local velocity
- plane's local angular velocity
- plane's world vertical velocity
- plane's 3D forward vector
- plane's 3D up vector
Distance vectors are linearly scaled down to fit in a -1 to 1 range, to avoid creating wildly varying weights across the model, which would have caused the regularizer to damage the rest of the model (although I never had time to add any regularization terms - but it's good practice nonetheless). If the plane strays outside the intended play area then the inputs can get larger, but by that point something has already gone wrong.
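To make that concrete, here's a minimal sketch of the flattening and scaling. The percept names and scaling constants are placeholders of mine, and the exact packing that yields the stated 25 features isn't spelled out, so treat the layout as illustrative:

```python
import numpy as np

# Illustrative scaling constants - the real values depend on the island's
# size and the plane's flight envelope, and are my guesses.
PLAY_AREA_RADIUS = 500.0   # metres
MAX_SPEED = 60.0           # metres per second
MAX_ANGULAR_SPEED = 3.0    # radians per second

def flatten_observation(obs):
    """Flatten the percepts into one feature vector for the model.

    `obs` is assumed to be a dict of numpy arrays keyed by percept name.
    Distance-like percepts are linearly scaled towards [-1, 1]; direction
    vectors are already unit length, so they pass through unchanged.
    """
    return np.concatenate([
        obs["world_position"] / PLAY_AREA_RADIUS,           # 3 features
        obs["local_compass"],                               # 3 features
        [obs["distance_to_centre"] / PLAY_AREA_RADIUS],     # 1 feature
        obs["local_velocity"] / MAX_SPEED,                  # 3 features
        obs["local_angular_velocity"] / MAX_ANGULAR_SPEED,  # 3 features
        [obs["world_vertical_velocity"] / MAX_SPEED],       # 1 feature
        obs["forward"],                                     # 3 features
        obs["up"],                                          # 3 features
    ]).astype(np.float32)
```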
I used vectors for direction instead of rotation angles for the compass and the plane's orientation. This was done to help the agent learn - rotation angles have a large step in them between 359° and 0°, so agents must spend neurons and training time to learn that these should be treated similarly. Vectors at both 359° and 0° yaw have high Z components, so the values are already similar as they enter the model. This increases the number of features but since they're "more intuitive" to the model I believe this is a net win.
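Here's a quick numeric illustration of the wrap-around problem, using a Unity-style convention (Z is forward) that I'm assuming here:

```python
import numpy as np

def yaw_to_forward(yaw_degrees):
    """Yaw angle to an (x, z) forward unit vector, assuming Z is forward."""
    yaw = np.radians(yaw_degrees)
    return np.array([np.sin(yaw), np.cos(yaw)])

print(yaw_to_forward(359.0))  # [-0.0175  0.9998]
print(yaw_to_forward(0.0))    # [0. 1.]
# As raw angles, 359 and 0 look maximally different to the model;
# as vectors they are nearly identical, matching how the plane actually flies.
```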
I used a discrete action distribution to capture the plane's inputs. There are 54 actions the plane can take, made up of 4 orthogonal axes with 2 or 3 choices each (enumerated in the sketch after this list):
- Firing thrusters: off, on
- Pitch: -, 0, +
- Roll: -, 0, +
- Yaw: -, 0, +
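The full action table is just the Cartesian product of those axes. The index ordering and exact input values below are my assumptions, not necessarily the game's:

```python
import itertools

# The four orthogonal axes and their choices. Their Cartesian product
# is the full discrete action space: 2 * 3 * 3 * 3 = 54 actions.
THRUST = (0.0, 1.0)
PITCH = (-1.0, 0.0, +1.0)
ROLL = (-1.0, 0.0, +1.0)
YAW = (-1.0, 0.0, +1.0)

ACTIONS = list(itertools.product(THRUST, PITCH, ROLL, YAW))
assert len(ACTIONS) == 54

def decode_action(index):
    """Map a model output index back to (thrust, pitch, roll, yaw) inputs."""
    return ACTIONS[index]
```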
I chose a discrete action space instead of a continuous one because there are many cases in this game where turning one direction or the other is good, but going straight is not (e.g. flying directly at a wall). This is known as multimodality. Discrete, orthogonally-expanded action spaces capture multimodality as two or more high-weight actions. In a continuous action space you'd need to encourage your model to learn chaotic and coincident monomodal distributions (or handle it in some way I haven't even heard of).
I used a one-hot action array to train the model. Inference produces a 54-wide array of values between 0 and 1. The agent uses a modified softmax function to convert these values into an array of probabilities, where all values are positive and the total adds up to 1.
Traditionally, softmax is calculated as:
for each element x of the output array X:
softmax(x) = e^x / Σ_{y ∈ X} e^y
I replace e (Euler's number) with a configurable "obedience" value. I make this higher than e to increase the contrast between the model's preferred and disapproved actions (i.e. being more exploitative), while keeping some randomness between similarly preferred actions. Human pitch and yaw inputs are continuous (they can take values like -0.35), whereas the agent can only output exactly -1, 0, or +1. The randomness lets the agent dither when cornering and act more like the human expert.
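A minimal sketch of this modified softmax, with a placeholder obedience value (the game's actual value isn't given). Note that since b^x = e^(x·ln b), raising the base is equivalent to lowering the softmax temperature:

```python
import numpy as np

def obedient_softmax(logits, obedience=8.0):
    """Softmax with a configurable base in place of Euler's number.

    A base above e sharpens the contrast between preferred and disapproved
    actions. Since b**x == e**(x * ln b), this is equivalent to standard
    softmax with temperature T = 1 / ln(obedience).
    """
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()                  # shift for numerical stability
    weights = np.power(obedience, z)
    return weights / weights.sum()

def sample_action(logits, rng=None):
    """Draw an action index from the obedience-weighted distribution."""
    rng = rng or np.random.default_rng()
    probs = obedient_softmax(logits)
    return rng.choice(len(probs), p=probs)
```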
Engineering
I train on the player's CPU rather than the GPU: GPU training would require the player to have an Nvidia GPU and install the CUDA toolkit, which would put many players off. This makes training much slower, but it means everyone can play the game.
I used Python for the machine learning because it has many machine learning libraries and a large ML community. A friend encouraged me to use behavioural cloning as it's the simplest imitation learning approach, so I started from Alex Staravoitau's behavioural cloning repository. I ended up rewriting all the Python to match my needs. His repository had a great set of instructions and it all worked first time - his setup used Keras and TensorFlow, so I used those too.
I used PyInstaller to ship a self-contained Python package with the game. This captures all the packages I need to do live training on players' computers and doesn't rely on the player's machine being set up a certain way - it doesn't matter whether you have Python installed, or which packages you have downloaded. I had to use specific versions of some libraries, as PyInstaller sometimes requires special consideration from package authors. In particular, I had to use Pip's TensorFlow, which doesn't have as many optimisations enabled as Anaconda's TF or a locally built one. There might have been a way around this, but I was happy enough with the performance and needed to do other work.
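For reference, PyInstaller can be driven from a small build script. The flags below are assumptions based on a typical TensorFlow + PyInstaller setup, not the game's actual build configuration:

```python
# build.py - bundle the training entry point into a self-contained folder.
import PyInstaller.__main__

PyInstaller.__main__.run([
    "cli.py",       # the training script the game invokes
    "--onedir",     # ship a folder of dependencies rather than a single exe
    "--noconfirm",  # overwrite the previous build without prompting
])
```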
I ran Python through C#'s Process class. Starting the process caused 1-2 second hangs (possibly the virus checker?), so I moved all process management into a class that started and ran the process from a new thread. I shifted data and errors between the threads but kept it as simple as possible to avoid multithreading issues.
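The game's implementation is C# (System.Diagnostics.Process on a worker thread); here is the same pattern sketched in Python for concreteness, with a placeholder executable path rather than the game's actual code:

```python
import queue
import subprocess
import threading

class TrainerProcess:
    """Start the bundled trainer off the main thread so its slow startup
    (possibly the virus checker) can't hang the game loop."""

    def __init__(self, args):
        self.lines = queue.Queue()  # stdout lines, drained by the main thread
        self._thread = threading.Thread(target=self._run, args=(args,), daemon=True)
        self._thread.start()

    def _run(self, args):
        proc = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
        for line in proc.stdout:
            self.lines.put(line.rstrip())  # hand output to the main thread
        self.lines.put(None)               # sentinel: the trainer has exited

trainer = TrainerProcess(["trainer/cli.exe", "--train"])  # hypothetical path and flag
```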
Status | Prototype
Platforms | Windows
Author | fuseinabowl
Genre | Action