Over the past weekend I participated in my first ever hackathon here in Dusseldorf. The theme described on the event page was rather vague and had no clear direction, so we weren't sure what exactly to expect. Before arriving we had decided that we wanted to do something interesting and/or fun with the Panama Papers data that was released publicly.
Eventually we decided to create a simple website to 'Build Your Own Offshore Corporation', poking fun at the ridiculousness of the companies (if you can even call them that) mentioned in the papers. Using the actual data pulled from the papers, we let the user generate a name, locate an intermediary to help set up their corporation, and decide which tax jurisdiction the company should operate in. We used R Shiny to build a quick prototype. It isn't hosted online anywhere, but if you clone our GitHub repo you can run it locally.
The funniest part for us was seeing how silly the actual company names are. A couple of our group members put together the graphic above, using the most common words found in the names. Give it a try to see for yourself!
At the conclusion of the event we ended up taking home the prize for 'Best Pitch', which was quite a surprise to us as it seemed that our project was probably not what the judges were really looking for. So all in all, not a bad way to spend a couple of days. It was definitely an interesting experience for my first hackathon.
Finally I'd like to thank everybody in our group that participated! Without them this would not have been quite so successful or fun.
Objective Gametypes and KDA
2016-06-05 15:04:45
Something that often gets said regarding objective gametypes (capture the flag, strongholds, etc.) is that KDA (kills-deaths-assists) does not matter as long as you're getting the objectives. So I wanted to dig through some of my recent Halo 5 matches to see if there's any truth to this statement. Using a sample of recent objective games that I completed, I fit a logistic regression with Kills, Assists, and Deaths as the predictors and winning or losing as the dependent variable. Basically I want to know how much each of these variables influences the probability of winning or losing a game.
This was also a bit of an excuse to play around some more with the language Julia. I've been starting to use it more at work and am really enjoying what it offers. Although it's tough to compete with the whole R ecosystem, the RCall package makes the transition pretty painless. Head over to Julia-Lang if you want to know more.
First the non-Julia stuff. Since I want Halo 5 data I'll be using RCall to interface with my R package to get some data. Below is a simple wrapper function to get the match data I want using my h5api package.
Next, just using a simple loop I'll grab data from 250 of my most recent matches. CLWakaLaka is my gamertag, so you can either use mine again or try your own if you play Halo 5.
# Collect 10 pages of 25 matches each (250 total) into one DataFrame.
match_data = DataFrame()
for i in 0:9
    match_data = [
        match_data;
        getRecentMatches("CLWakaLaka", "arena", i*25, 25, api_key)
    ]
end
Next comes a little cleanup before the regression. Besides a win or a loss, a match can also end in a tie or a disconnect, so only definite win/loss matches are kept. Slayer games are filtered out as well: since kills are the objective in that game type, it goes without saying that KDA directly influences your likelihood of winning. Finally, the result is recoded as a simple 1-0 variable, with 1 meaning a win and 0 meaning a loss.
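# Keep only definite losses (1) and wins (3), and drop Slayer games.
match_data = match_data[((match_data[:Result] .== 1) | (match_data[:Result] .== 3)) & (match_data[:id] .!= slayer_id), :]
# Recode the result as 1 (win) or 0 (loss).
match_data[:Result] = map(match_data[:Result]) do x
    if x == 3
        return 1
    else
        return 0
    end
end
# Convert the result column to Float64 for the regression.
match_data[:Result] = convert(Array{Float64, 1}, match_data[:Result])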
Finally, using the GLM package, a logistic regression is performed on the data with the following results:
So what does this tell us? Well the logistic function looks like this:
1 / (1 + exp(-( intercept + b1*x1 + b2*x2 + etc. )))
Where bi's are the estimates above and xi's are the data points. For example, if I had a game with 10 kills, 4 assists, and 8 deaths, then my estimated probability of winning that game would be:
1 / (1 + exp(-( 0.0405 - 0.202*8 + 0.202*4 + 0.122*10 ))) = 0.61 or 61%
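The same arithmetic can be written as a small Julia function, which makes it easy to try other stat lines. This is only a sketch: the coefficients are the rounded values from the worked example above, and `coef`, `logistic`, and `win_probability` are names made up for illustration, not part of GLM or h5api.

```julia
# Rounded coefficients copied from the worked example above.
coef = (intercept = 0.0405, kills = 0.122, assists = 0.202, deaths = -0.202)

# Logistic (inverse-logit) function: maps any real number into (0, 1).
logistic(z) = 1 / (1 + exp(-z))

# Estimated win probability for one game (hypothetical helper).
function win_probability(kills, assists, deaths; c = coef)
    z = c.intercept + c.kills * kills + c.assists * assists + c.deaths * deaths
    return logistic(z)
end

p = win_probability(10, 4, 8)
println(round(p, digits = 2))   # prints 0.61
```

Changing a single argument, for example `win_probability(10, 4, 9)`, shows how much one extra death moves the estimate.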
From the estimates above: deaths negatively influence the probability of a win and kills and assists influence the probability of a win positively. All three variable estimates have a p-value of less than 0.05 which suggests they are significant factors in the overall outcome of a game (obviously). The intercept, however, is not significant which makes sense since we likely have no data for a 0/0/0 game.
Interestingly, the deaths and assists coefficients have roughly equal magnitude, while the coefficient for kills is somewhat smaller. This would suggest that the relative importance of these actions corresponds to Deaths = Assists > Kills, meaning that the most important factors towards getting the win are (in this order): not dying, always shooting/helping teammates, then getting kills.
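One way to read these magnitudes is as odds ratios: in a logistic regression, exp(coefficient) is the factor by which the odds of winning are multiplied for each additional kill, assist, or death, with the other two held fixed. A minimal sketch, using the rounded coefficients from the worked example (the variable names are made up for illustration):

```julia
# Rounded coefficients from the worked example in this post.
coef = (kills = 0.122, assists = 0.202, deaths = -0.202)

# exp(coefficient) gives the multiplicative change in the odds of winning
# per additional kill, assist, or death.
odds_ratio = map(exp, coef)

for (name, ratio) in pairs(odds_ratio)
    println(name, ": ", round(ratio, digits = 3))
end
```

A ratio above 1 raises the odds and a ratio below 1 lowers them. Because the assists and deaths coefficients are equal and opposite, one extra assist roughly cancels out one extra death.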
So there we go, it seems there is a little kernel of truth in the idea that KDA in an objective-based gametype is not everything... Although it certainly helps, and feels so good.
A Brand New Blog
2016-05-12 19:33:36
What's the first thing you think when you learn about server-side scripting and CGI for the first time? Obviously it's to take your blog off Blogger and write your own platform... in C. At least that was my first thought. Luckily there are plenty of online resources to get started. This introduction is particularly helpful and definitely worth reading if you're at all interested in the subject.
Anyways, I've now parted ways with Blogger and have moved my personal blog to an Amazon EC2 server running Ubuntu with Apache2 and my own CGI scripts.
To create my blog I broke it down into the basic elements that I wanted to include: a page to view posts, an info/contact page, a page to add/edit/delete posts, and a database to store everything. For the database I went with SQLite because of how portable the data is. Since it's all stored in a single file, it's easy to copy, back up, etc. Once that was set up, each page basically becomes a pretty front-end for database queries. And to make it pretty I decided to use Twitter's Bootstrap CSS styling, because it takes very little effort to create an appealing layout that handles screens of all sizes, which means that it looks nice on both mobile and desktop.
I'm fairly happy with how everything turned out, though there are some missing features that I would like to implement in the future, such as a post timeline and a way to link directly to single posts. I think having direct links is important for sharing posts.
All the code has been uploaded to my GitHub. Since a lot of the HTML was hard-coded into the source (instead of dynamically loaded from files), it serves mostly as a reference for how to do this type of project.
Why Assault in Halo 5 Needs to Change
2016-03-22 12:01:10
Assault creates boring, drawn-out games where even the smallest mistake by either side heavily swings the match in the other's favor. This is not an interesting game dynamic, and at the very least Assault needs to be tweaked or, in my opinion, removed entirely from Team Arena. This is just my anecdotal opinion based on my games so far, so let's grab some data from 343i's API to see if this really is the case.
I gathered a random sample of 1236 CTF and Assault games (472 Assault, 764 CTF) and analyzed their duration and victory conditions.
Game Duration:
At 12 minutes the standard game time runs out. If one team has more points than the other at that point, the game ends and the team with more points is the victor. If there is a tie when the timer runs out, overtime begins and the same check is done at the end of overtime. From the chart above it's quite clear that Assault games are much more likely than CTF games to end either due to the timer running out or in overtime. This is indicative of low-scoring games where neither team is able to reach the required 3 points to win.
Victory Conditions:
The red group in this chart indicates the percentage of games that ended due to a 3-cap (a team managing to score 3 points) before the initial timer runs out. The blue region shows the percentage of games that ended from a time-out (one team being ahead at 12 minutes) or a victory in overtime (one team being ahead after overtime or reaching 3 points in overtime). So nearly 25% more Assault games than CTF games end from the timer or in overtime.
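The two groups in the chart can be expressed as a simple rule on a game's duration and winning score. The sketch below is only an illustration of that rule as described in this post, not the code used for the analysis. `victory_condition`, its argument names, and the three sample games are all made up.

```julia
# Rules as described in this post: regulation lasts 12 minutes and the
# first team to 3 points wins outright.
# Hypothetical helper: label how a finished game ended.
function victory_condition(duration_seconds, winner_score;
                           regulation_seconds = 12 * 60, score_to_win = 3)
    if winner_score >= score_to_win && duration_seconds < regulation_seconds
        return :three_cap            # 3 points scored before the timer ran out
    else
        return :timer_or_overtime    # decided by the clock or in overtime
    end
end

# Invented sample games, purely for demonstration.
games = [(duration = 431, score = 3),
         (duration = 720, score = 1),
         (duration = 815, score = 2)]

labels = [victory_condition(g.duration, g.score) for g in games]
share = count(l -> l == :timer_or_overtime, labels) / length(labels)
println(labels)
println(share)
```

Computing `share` separately for Assault and CTF games gives the two percentages being compared.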
I wouldn't have a problem with the fact that Assault often produces long games if the objective were an interesting one. But the single objective in Assault, with 1 ball and 1 cap location, produces games where decision making means less than in other game types. Since there is never any question about what you should be doing, it takes decisions away from the player.
My other main issue is the variability. Under normal gameplay it's very difficult to cap the ball in Assault, as the abnormally long games shown here suggest. However, a single lucky kill (or unlucky death) can easily swing the game in your favor (or your opponent's). It's a mechanic that isn't satisfying because it feels like you're playing for that lucky kill instead of making smart decisions throughout the match.
Halo 5 February Season Ranks
2016-02-13 09:37:10
Inspired by a post on the Halo subreddit showing player rankings in different multiplayer playlists in Halo 5, I decided to do something similar. I took the Halo 5 API R package I wrote a few months ago and got to work collecting as much data as I could from users.
The public API has several limiting factors. The first is that there's no easy way to simply get a large list of users currently playing Halo. Microsoft doesn't make player activity numbers public, so providing easy access to this would probably go against their intentions. To get around this I looked at my recent game history and took the names of all my opponents, then looked at all of their recent game histories. I was quickly able to get a list of a little over 17,000 unique names that had played at least some form of matchmaking.
The second limiting factor is that Microsoft only allows 1 request per second. This isn't really a huge hurdle as I just left it running overnight to gather the data. It would be nicer if there was a faster option though.
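The collection process described above amounts to a breadth-first crawl with a pause between requests. Here is a minimal sketch of that idea in Julia. `collect_players` and `fetch_opponents` are hypothetical names: `fetch_opponents` stands in for whatever API call returns the names seen in one player's recent matches, and the fake history is invented so the example runs without network access.

```julia
# Breadth-first "snowball" collection of player names.
# `fetch_opponents` is a caller-supplied (hypothetical) function that returns
# the names found in one player's recent matches.
function collect_players(seed, fetch_opponents;
                         max_players = 17_000, delay_seconds = 1.0)
    seen = Set([seed])
    queue = [seed]
    # Stop expanding once the queue is empty or enough names are known.
    while !isempty(queue) && length(seen) < max_players
        player = popfirst!(queue)
        for name in fetch_opponents(player)
            if !(name in seen)
                push!(seen, name)
                push!(queue, name)
            end
        end
        sleep(delay_seconds)   # pause to stay under 1 request per second
    end
    return seen
end

# Invented match histories, purely for demonstration.
fake_history = Dict("A" => ["B", "C"], "B" => ["A", "D"],
                    "C" => String[], "D" => ["A"])
players = collect_players("A", name -> get(fake_history, name, String[]);
                          delay_seconds = 0)
println(length(players))
```

With a one second pause per player, crawling a few thousand histories takes hours, which is why leaving it running overnight is the practical option.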
Anyways here are some results from the Team Arena, Slayer, and SWAT playlists. Keep in mind that when this data was gathered it had only been about 10 days since the February season went live. I'll try to do something like this again at the end of the month to see if there are any major changes.
Team Arena:
From what I've heard of 343i's Halo ranking system, a near-normal curve like this is to be expected, with gold and platinum ranks containing the majority of players. There are some spikes at tier 1 of each rank. This is because in Halo 5 you cannot rank down out of a division until the next season. So if players are improperly placed into diamond, or go on a hot streak and make it into diamond, they'll just end up sitting at diamond 1 for most of the season. With how quick the seasons in Halo are (historically slightly over a month for pre-season and January), I don't think this is that bad of an issue, as it relieves a little of the 'ladder anxiety' of matchmaking: you know that you won't rank out of a division if you go on a losing streak.
Another interesting thing is the large percentage of players currently in onyx. I think this is likely due to my small-ish sample size of just over 7000 players for Team Arena, and because it's still early in the season. As the season goes on the more casual players will finish their placement matches and the relative amount of onyx players will probably decrease.
I've also included a boxplot showing the win rates of players in the different ranks. Nothing really surprising: as players' rank increases, so does their win rate. From my understanding, though, the matchmaking system should try to give players a win rate close to 50%, so again, perhaps this will even out as the season progresses.
Slayer:
Nothing too different in the slayer playlist other than it seems skewed towards higher ranks. I'm actually a little lost as to why this is. My best guess is that players who play Slayer might be more competitive than those that play Team Arena and combined with it being early in the season it's likely to produce these results. But again, I don't really know for sure. Maybe you have some suggestions?
SWAT:
Compared with the other two playlists, there is hardly anybody in SWAT above platinum. From what I've heard, 343i has done some tuning for ranks in SWAT, which is probably why it looks so different from the others.
Well I hope you enjoyed these. Comments and suggestions on what other kinds of data from Halo 5 might be interesting are definitely welcome. What I really want to do is make some heatmaps (still) but 343i seems like they do not want to make the necessary data to do this available. Maybe soon... hopefully.
About
This blog is intended as written documentation of my ongoing side-projects. Most everything here is likely to be programming or data related, although something else may creep in every now and then.
This blog itself is an example of a personal project of mine. It was written entirely from scratch in C with SQLite3. It's also hosted on a Raspberry Pi.