For over a year now, I have been using CyanogenMod on my Nexus phone. At first I just installed a bundle that brought in all the proprietary Google applications and libraries, but later I decided that I wanted more control over it, so I did a clean install with no proprietary stuff.
This was not so great at the beginning, because the base system lacks the geolocation helpers that let you get a position in seconds (using GSM antennas and WiFi APs). And so, every time I opened OsmAnd (a great maps application, with free software and free maps), I would have to wait minutes for the GPS radio to locate enough satellites.
Shortly after, I found out about the UnifiedNLP project, which provided a drop-in replacement for the location services, using pluggable location providers. This worked great, and you could choose to use the Apple or Mozilla on-line providers, or off-line databases that you could download or build yourself.
This worked well for most of my needs, and I was happy about it. I also had F-droid for installing free software applications, DAVdroid for contacts and calendar synchronisation, K9 for reading email, etc. I still needed some proprietary apps, but most of the stuff in my phone was Free Software.
But sadly, more and more application developers are buying into the vendor lock-in that is Google Play Services, which is a set of APIs that offer some useful functionality (including the location services), but that require non-free software that is not part of the AOSP (Android Open-Source Project). Mostly, this is because they make push notifications a requirement, or because they want you to be able to buy stuff from the application.
This is not limited to proprietary software. Most notably, the Signal project refuses to work without these libraries, or even to distribute the pre-compiled binaries on any platform that is not Google Play! (This is one of many reasons why I don't recommend Signal to anybody.)
And of course, many very useful services that people use every day require you to install proprietary applications, which care much less about your choices of non-standard platforms.
For the most part, I had been able to just get the package files for these applications1 from somewhere, and have the functionality I wanted.
Some apps would just work perfectly, others would complain about the lack of Play Services, but offer a degraded experience. You would not get notifications unless you re-opened the application, stuff like that. But they worked. Lately, some of the closed-source apps I sometimes use stopped working altogether.
So, tired of all this, I decided to give the MicroG project a try.
MicroG is a direct descendant of UnifiedNLP and the NOGAPPS project. I had known about it for a while, but the installation procedures always put me off.
LWN published an article about them recently, and so I decided to spend a good chunk of time making it work. This blog post is meant to make the process a bit less painful.
- No Gapps installed.
- Rooted phone (at least for the mechanism I describe here).
- Working ADB with root access to the phone.
- UnifiedNLP needs to be uninstalled (MicroG carries its own version of it).
The main problem with the installation is that MicroG needs to pretend to be the original Google bundle. It has to show the same name, but most importantly, it has to spoof its cryptographic signatures so other applications do not realise it is not the real thing.
OmniROM and MarshROM (other alternative firmwares for Android) provide support for signature spoofing out of the box. If you are running one of these, go ahead and install MicroG, it will be very easy! Sadly, the CyanogenMod developers refused to allow signature spoofing, citing security concerns2, so most users will have to go the hard way.
There are basically two options for enabling this feature: either patch some core libraries H4xx0r style, or use the "Xposed framework". Since I still don't really understand what this Xposed thing is, and it is one of those projects that distribute files on XDA forums, I decided to go the dirty way.
Patching the ROM
Note: this method requires a rooted phone, java, and adb.
The MicroG wiki links to three different projects for patching your ROM, but turns out that two of them would not work at all with CyanogenMod 11 and 13 (the two versions I tried), because the system is "odexed" (whatever that means, the Android ecosystem is really annoying).
I actually upgraded CM to version 13 just to try this, so a couple of hours went there, with no results.
The project that did work is Haystack by Lanchon, which seems to be the cleanest and best developed of the three. It is also the one with the most complex documentation.
In a nutshell, you need to download a bunch of files from the phone, apply a series of patches to them, and then re-upload them.
To obtain the files and place them in the maguro directory (maguro being the codename for my phone):

$ ./pull-fileset maguro
Now you need to apply some patches with the patch-fileset script. The patches are located in the patches directory:

$ ls patches
sigspoof-core  sigspoof-hook-1.5-2.3  sigspoof-hook-4.0  sigspoof-hook-4.1-6.0
sigspoof-hook-7.0  sigspoof-ui-global-4.1-6.0  sigspoof-ui-global-7.0
The patch-fileset script takes these parameters:

patch-fileset <patch-dir> <api-level> <input-dir> [ <output-dir> ]
Note that this requires knowing your Android version and its API level. If you don't specify an output directory, the script will append the patch name to the input directory name. You should also check the output of the script for any warnings or errors!
First, you need to apply the patch that hooks into your specific version of Android (6.0, API level 23 in my case):
$ ./patch-fileset patches/sigspoof-hook-4.1-6.0 23 maguro
>>> target directory: maguro__sigspoof-hook-4.1-6.0
[...]
*** patch-fileset: success
Now, you need to add the core spoofing module:
$ ./patch-fileset patches/sigspoof-core 23 maguro__sigspoof-hook-4.1-6.0
>>> target directory: maguro__sigspoof-hook-4.1-6.0__sigspoof-core
[...]
*** patch-fileset: success
And finally, add the UI elements that let you enable or disable the signature spoofing:
$ ./patch-fileset patches/sigspoof-ui-global-4.1-6.0 23 maguro__sigspoof-hook-4.1-6.0__sigspoof-core
>>> target directory: maguro__sigspoof-hook-4.1-6.0__sigspoof-core__sigspoof-ui-global-4.1-6.0
[...]
*** patch-fileset: success
Now you have a bundle ready to upload to your phone, and you do that with the push-fileset script:

$ ./push-fileset maguro__sigspoof-hook-4.1-6.0__sigspoof-core__sigspoof-ui-global-4.1-6.0
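Putting the sequence together, the whole Haystack workflow can be sketched as a small shell script. This is a dry run that only echoes the commands (remove the `echo`s to actually run them from the haystack checkout); the device codename and API level are the values for my maguro running CM13, so adjust them for your phone:

```shell
# Dry-run sketch of the Haystack patching sequence.
# DEVICE and API are the values for my phone (maguro, Android 6.0 / API 23).
DEVICE=maguro
API=23

echo "./pull-fileset $DEVICE"

# Each patch-fileset run appends the patch name to the directory name,
# so we track the staged directory in a variable.
STAGE="$DEVICE"
for p in sigspoof-hook-4.1-6.0 sigspoof-core sigspoof-ui-global-4.1-6.0; do
    echo "./patch-fileset patches/$p $API $STAGE"
    STAGE="${STAGE}__${p}"
done

echo "./push-fileset $STAGE"
```

The final directory name it computes is exactly the one passed to push-fileset above.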
After this, reboot your phone, go to Settings / Developer settings, and at the end you should find a checkbox for "Allow signature spoofing", which you should now enable.
Now that the difficult part is done, the rest of the installation is pretty easy. You can add the MicroG repository to F-Droid and install the rest of the modules from there. Check the installation guide for all the details.
Once all the parts are in place, and after a last reboot, you should find a MicroG settings icon that will check that everything is working correctly, and give you the choice of which components to enable.
So far, other applications believe this phone has nothing weird, I get quick GPS fixes, push notifications seem to work... Not bad at all for such a young project!
Hope this is useful. I would love to hear your feedback in the comments!
Which is a pretty horrible thing: having to download from fishy sites because Google refuses to let you use the marketplace without their proprietary application. I used to use a Chrome extension to trick Google Play into believing my phone was downloading an app, so I could get the APK file; but now that I have no devices running stock Android, that does not work any more. ↩
Actually, I am still a bit concerned about this, because it is not completely clear to me how protected this is against malicious actors (I would love to get feedback on this!). ↩
Many people reading this have already suffered through me talking to them about Prometheus: in personal conversation, or in the talks I gave at DebConf15 in Heidelberg, the Debian SunCamp in Lloret de Mar, BRMlab in Prague, and even in a talk on a different topic at the RABS in Cluj-Napoca.
Since their public announcement, I have been trying to support the project in the ways I could: by packaging it for Debian, and by spreading the word.
Last week the first ever Prometheus conference took place, so this time I did the opposite thing and spoke about Debian to Prometheus people: I gave a 5-minute lightning talk on Debian support for Prometheus.
What blew me away was the response I got: from this tiny non-talk I prepared in 30 minutes, many people stopped me later to thank me for packaging Prometheus, and for Debian in general. They told me they were using my packages, gave me feedback, and even some RFPs!
At the post-conference beers, I had quite a few interesting discussions about Debian, release processes, library versioning, and culture clashes with upstream. I was expecting some Debian-bashing (we are old-fashioned, slow, etc.), instead I had intelligent and enriching conversations.
To me, this reinforces once again my support for and commitment to community conferences, where nobody is a VIP and everybody has something to share. It also taught me the value of bringing together different groups, even when there seems to be little in common.
Many people might not be aware of it, but for a couple of years now we have had an excellent tool for tracking and recognising contributors to the Debian Project: Debian Contributors.
Debian is a big project, and many of the people working on it do not have great visibility, especially if they are not DDs or DMs. We are all volunteers, so it is very important that everybody gets credited for their work. No matter how small or unimportant they might think their work is, we need to recognise it!
One great feature of the system is that anybody can sign up to provide a new data source. If you have a way to create a list of people that is helping in your project, you can give them credit!
If you open the Contributors main page, you will get a list of all the groups with recent activity, and the people credited for their work. The data sources page gives information about each data source and who administers it.
For example, my Contributors page shows the many ways in which the system recognises me, all the way back to 2004! That includes commits to different projects, bug reports, and package uploads.
I have been maintaining a few of the data sources that track commits to Git and Subversion repositories:
- The Go packaging group (added just a couple of weeks ago).
- The Perl packaging group.
The last two are a bit problematic, as they group together all commits to the respective VCS repositories without distinguishing to which sub-projects the contributions were made.
The Go and Perl groups' contributions are already extracted from that big pile of data, but it would be much nicer if each substantial packaging team had their own data source. Sadly, my time is limited, so this is where you come into the picture!
If you are a member of a team and want to help with this effort, adopt a new data source. You could provide commit logs, but it is not limited to that: think of translators, event volunteers, BSP attendants, etc.
Do you fancy a hack-camp in a place like this?
As you might have heard by now, Ana (Guerrero) and I are organising a small Debian event this spring: the Debian SunCamp 2016.
It is going to be different to most other Debian events. Instead of a busy schedule of talks, SunCamp will focus on the hacking and socialising aspect.
We have tried to make this event the simplest event possible, both for organisers and attendees. There will be no schedule, except for the meal times at the hotel. But even these can be ignored: there is a lovely bar that serves snacks all day long, and plenty of restaurants and cafés around the village.
One of the things that makes the event simple is that we have negotiated a flat price for accommodation that includes use of all the facilities in the hotel, and optionally food. We will give you a booking code, and then you arrange your accommodation as you please; you can even stay longer if you feel like it!
The rooms are simple but pretty, and everything has been renovated very recently.
We are not preparing a talks programme, but we will provide the space and resources for talks if you feel inclined to prepare one.
You will have a huge meeting room, divided in 4 areas to reduce noise, where you can hack, have team discussions, or present talks.
Of course, some people will prefer to move their discussions to the outdoor area.
Or just skip the discussion, and have a good time with your Debian friends, playing pétanque, pool, air hockey, arcades, or board games.
Do you want to see more pictures? Check the full gallery
Debian SunCamp 2016
Hotel Anabel, Lloret de Mar, Province of Girona, Catalonia, Spain
May 26-29, 2016
Tempted already? Head to the wikipage and register now, it is only 7 weeks away!
Please reserve your room before the end of April. The hotel has reserved a number of rooms for us until that time. You can reserve a room after April, but we can't guarantee the hotel will still have free rooms.
If you take the Sarfa bus, they look like this:
If arriving at BCN Terminal 2 (as most low-cost carriers do), there are coach parking areas between the B and C buildings. At the far end in the picture below, you can see the buses parked on both sides of the street. Sarfa will be picking up people from the right side (the same side as Terminal building B).
You have to get off at the "Lloret de Mar" bus station.
From there, you walk less than ten minutes, until you reach the hotel. There is a street that goes below it, and the entrance is to the left:
When you enter, the lobby is very bright, with views of the garden and swimming pool.
The rooms are simple, but comfortable. They have been recently renovated.
There are many amenities in the hotel, especially in the courtyard, around the two outdoor swimming pools (there is also an indoor swimming pool in the spa area).
There are plenty of lounge chairs for sunbathing, or just chilling by the pool.
There is also a café/snack bar, with an outdoor terrace that has a movable roof and heating, in case they are needed.
You can enjoy playing pétanque, pool, air hockey, some arcades, and even board games.
The hotel seems well prepared for wheelchair access. Every staircase has a ramp or an elevator next to it.
The room for hacklabs and talks is huge, with three divisions and independent accesses from the garden.
Before high season, when it gets full of tourists, you will still be able to enjoy a quiet village, the great beaches, and some sightseeing.
So I was reading G+, and saw a post there by Bernd Zeimetz about some "marble machine", which turns out to be a very cool device that is programmed to play a single tune, and is just mesmerising to watch:
So, naturally, I clicked through to see if there was more music made with this machine. It turns out the machine has been in the making for a while, and the first complete song (the one embedded above) was released only a few days ago. It is obviously porn for nerds, and Wired had already posted an article about it.
So instead I found a band with the same name as the machine: Wintergatan, which sounds pretty great. It took me a while to realise that the guy who built the machine is one of the members of the band. They even have a page collecting all the videos about the machine.
After a while, following the suggestions from Youtube, I realised that two of the members of Wintergatan were previously in Detektivbyrån, which is another band I love, and about which I wrote a post on this very blog, 7.5 years ago!1 So the sad news is that Detektivbyrån disbanded; the good news is that this guy keeps making great music, now with insane machines.
I only discovered Detektivbyrån in the first place thanks to an article in the now sadly defunct Coilhouse Magazine.
I find this 8-year long loop that closes unexpectedly during a late-night idle browsing session pretty amusing.
I keep telling my friends that I was a hipster before it was cool to do so... ↩
Dear node.js/node-webkit people, what's the matter with you?
I wanted to try out some stuff that requires node-webkit, so I tried to use npm to download, build, and install it, like CPAN would do.

But then I saw that the nodewebkit package is just a stub that downloads a 37 MB file (over HTTP, without TLS) containing pre-compiled binaries. Are you guys out of your minds?
This is enough for me to never again get close to node.js and friends. I had already heard some awful stories, but this is just insane.
Update: This was a rage post, and not really saying anything substantial, but I want to quote here the first comment reply from Christian as I think it is much more interesting:
I see a recurring pattern here: developers create something that they think is going to simplify distributing things by quite a bit, because what other people have been doing is way too complicated[tm].
In the initial iteration it usually turns out to have three significant problems:
- a nightmare from a security perspective
- there's no concern for stability, because the people working on it are devs and not people who have certain reliability requirements
- bundling other software is often considered a good idea, because it makes life so much easier[tm]
Given a couple of years they start to catch up where the rest of the world (e.g. GNU/Linux distributions) is - at least to some extent. And then their solution is actually of a similar complexity compared to what is in use by other people, because they slowly realize that what they're trying to do is actually not that simple and that other people who have done things before were not stupid and the complexity is actually necessary... Just give node.js 4 or 5 more years or so, then you won't be so disappointed.
I've seen this pattern over and over again:
- the Ruby ecosystem was a catastrophe for quite some time (just ask e.g. the packagers of Ruby software in distributions 4 or 5 years ago), and only now would I actually consider using things written in Ruby for anything productive
- docker and vagrant were initially not designed with security in mind when it came to downloading images - only in the last one or two years have there actually been solutions available that even do the most basic cryptographic verification
- the entire node.js ecosystem mess you're currently describing
Then the next new fad comes along in the development world and everything starts over again. The only thing that's unclear is what the next hype following this pattern is going to be.
And I also want to quote myself with what I think are some things you could do to improve this situation:
- You want to make sure your build is reproducible (in the most basic sense of being re-buildable from scratch), that you are not building upon scraps of code that nobody knows where they came from, or which version they are. If possible, at the package level don't vendor dependencies, depend on the user having the other dependencies pre-installed. Building should be a mainly automatic task. Automation tools then can take care of that (cpan, pip, npm).
- By doing this you are playing well with distributions, your software becomes available to people that can not live on the bleeding edge, and need to trust the traceability of the source code, stability, patching of security issues, etc.
- If you must download a binary blob, for example what Debian non-free does for Adobe Flashplayer, then for the love of all that is sacred, use TLS and verify checksums!
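To illustrate that last point, here is a minimal sketch of what "verify checksums" means in practice. The URL is a placeholder and, to keep the example self-contained, a local file stands in for the download; in real life you would fetch over HTTPS (e.g. with curl -fsSL) and compare against a checksum published separately by the upstream project, ideally in a signed file:

```shell
set -e

# Stand-in for: curl -fsSL https://example.org/blob.tar.gz -o blob.bin
# (placeholder URL; we just create a local file for the demonstration)
printf 'pretend this is a binary blob\n' > blob.bin

# In real life the expected checksum comes from upstream, not from the
# downloaded file itself; here we compute it only to demonstrate the
# verification step.
expected="$(sha256sum blob.bin | awk '{print $1}')"

# sha256sum -c exits non-zero (and set -e aborts the script) if the
# file does not match the expected checksum.
echo "${expected}  blob.bin" | sha256sum -c -
```

The same pattern works with gpg --verify when upstream signs their checksum files, which is even better.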
This is the fifth -and last- part in a series of articles about SRE, based on the talk I gave in the Romanian Association for Better Software.
Other lessons learned
Apart from the guidelines for SRE I have described in previous posts, there were a few other practices I learned at Google that I believe could benefit most operations teams out there (and developers too!).
It's worth repeating: it is not sustainable to do things manually. When your manual work grows linearly with your usage, either your service is not growing enough, or you will need to hire more people than there are available in the market.
Involve Ops in systems design
Nobody knows the challenges of a real production environment better than Ops; getting early feedback from your operations team can save you a lot of trouble down the road. They might give you insights about scalability, capacity planning, redundancy, and the very real limitations of hardware.
Give Ops time and space to innovate
There is so much emphasis on limiting the amount of operational load put on SRE because the freed time is not used only for creating monitoring dashboards or writing documentation.
As I said in the beginning, SRE is an engineering team. And when engineers are not running around putting out fires all day long, they get creative. Many interesting innovations come from SREs, even when there is no requirement for them.
Write design documents
Design documents are not only useful for writing software. Fleshing out the details of any system on paper is a very useful way to find problems early in the process, communicate to your team exactly what you are building, and convince them why it is a great idea.
Review your code, version your changes
This is something ubiquitous at Google, and not exclusive to SRE: everything is committed into source control, be it big systems, small scripts, or configuration files. It might not seem worth it at the beginning, but having the complete history of your production environment is invaluable.
And with source control, another rule comes into play: no change is committed without first being reviewed by a peer. It can be frustrating to have to find a reviewer and wait for approval for every small change, and sometimes reviews will be suboptimal, but once you get used to it, the benefits greatly outweigh any annoyances.
Before touching, measure
Monitoring is not only for sending alerts; it is an essential tool of SRE. It allows you to gauge your availability, evaluate trends, forecast growth, and perform forensic analysis.
Also very importantly, it should give you the data needed to make decisions based on reality and not on guesses. Before optimising, find out where the bottleneck really is; do not decide which database instance to use until you have seen the utilisation over the past few months; etc.
And speaking of that... We really need to have a chat about your current monitoring system. But that's for another day.
I hope I did not bore you too much with these walls of text! Some people told me directly they were enjoying these posts, so soon I will be writing more about related topics. In particular I would like to write about modern monitoring and Prometheus.
I would love to hear your comments, questions, and criticisms.
This is the fourth part in a series of articles about SRE, based on the talk I gave in the Romanian Association for Better Software.
As a side note, I am writing this from Heathrow Airport, about to board my plane to EZE. Buenos Aires, here I come!
So, what does SRE actually do?
Enough about how to keep the holy wars under control and how to get work away from Ops. Let's talk about some of the things that SRE does.
Obviously, SRE runs your service, performing all the traditional SysAdmin duties needed for your cat pictures to reach their consumers: provisioning, configuration, resource allocation, etc.
They use specialised tools to monitor the service, and get alerted as soon as a problem is detected. They are also the ones waking up to fix your bugs in the middle of the night.
But that is not the whole story: reliability is not only impacted by new launches. Suddenly usage grows, hardware fails, networks lose packets, solar flares flip your bits... When working at scale, things are going to break more often than not.
You want these breakages to affect your availability as little as possible. There are three strategies you can apply to this end: minimise the impact of each failure, recover quickly, and avoid repetition.
And of course, the best strategy of them all: preventing outages from happening at all.
Minimizing the impact of failures
A single failure that takes the whole service down will severely affect your availability, so the first strategy as an SRE is to make sure your service is fragmented and distributed across what are called "failure domains". If a data center catches fire, you are going to have a bad day, but it is a lot better if only a fraction of your users depend on that one data center, while the others keep happily browsing cat pictures.
SREs spend a lot of their time planning and deploying systems that span the globe to maximise redundancy while keeping latencies at reasonable levels.
Many times, retrying after a failure is actually the best option. So another strategy is to automatically restart, in milliseconds, any piece of your system that fails. This way fewer users are affected, while a human has time to investigate and fix the real problem.
If a human needs to intervene, it is crucial that they get notified as fast as possible, and that they have quick access to all the information needed to solve the problem: detailed documentation of the production environment, meaningful and extensive monitoring, play-books1, etc. After all, at 2 AM you probably don't remember in which country the database shard lives, or which series of commands redirects all the traffic to a different location.
Implementing monitoring and automated alerts, writing documentation and play-books, and practising for disaster are other areas where SREs devote much effort.
Outages happen, pages ring, problems get fixed, but it should always be a chance for learning and improving. A page should require a human to think about a problem and find a solution. If the same problem keeps appearing, the human is not needed any more: the problem needs to be fixed at the root.
Another key aspect of dealing with incidents is to write post-mortems2. Every incident should have one, and these should be tools for learning, not finger-pointing. Post-mortems can be an excellent tool, if people are honest about their mistakes, they are not used to blame other people, and issues are followed up by bug reports and discussions.
Of course nobody can prevent hard drives from failing, but there are certain classes of outages that can be forecasted with careful analysis.
Usage spikes can bring a service down, but an SRE team will ensure that the systems are load-tested at higher-than-normal rates. They could also be prepared to quickly scale the service, provided the monitoring system will alert as soon as a certain threshold is reached.
Monitoring is actually the key part of this: by measuring relevant metrics for all the components of a system, and following their trends over time, SREs can avoid many outages entirely. Latencies growing out of acceptable bounds, disks filling up, and progressive degradation of components are all examples of typical problems automatically monitored (and alerted on) by an SRE team.
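As a toy illustration of the "disks filling up" class of checks, here is a trivial shell version of such a monitor. The 90% threshold is an arbitrary example, and it feeds fake df -P output so it is self-contained; a real SRE team would of course use a proper monitoring system rather than ad-hoc scripts like this:

```shell
# Fake `df -P` output, so the example does not depend on the host it runs on.
fake_df() {
    printf 'Filesystem 1024-blocks Used Available Capacity Mounted-on\n'
    printf '/dev/sda1 1000000 950000 50000 95%% /\n'
    printf '/dev/sdb1 1000000 100000 900000 10%% /data\n'
}

# Print an alert line for every filesystem above the usage threshold.
# $5 is the "Capacity" column (e.g. "95%"), $6 the mount point.
THRESHOLD=90
alerts="$(fake_df | awk -v t="$THRESHOLD" \
    'NR > 1 { gsub(/%/, "", $5); if ($5 + 0 > t) print "ALERT: " $6 " at " $5 "%" }')"
echo "$alerts"
```

Running it prints one alert, for the root filesystem at 95% usage; a real setup would watch the trend of this metric over time rather than a single snapshot.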
This is getting pretty long, so the next post will be the last one in this series, with some extra tips from SRE experience that I am sure can be applied in many places.
A play-book is a document containing critical information needed when dealing with a specific alert: possible causes, hints for troubleshooting, links to more documentation. Some monitoring systems will automatically add a play-book URL to every alert sent, and you should be doing it too. ↩
Similarly to their real-life counterparts, a post-mortem is the process of examining a dead body (the outage), gutting it out and trying to understand what caused its demise. ↩
This is the third part in a series of articles about SRE, based on the talk I gave in the Romanian Association for Better Software.
In this post, I talk about some strategies to avoid drowning SRE people in operational work.
SREs are not firefighters
As I said before, it is very important that there is trust and respect between Dev and Ops. If Ops is seen as just an army of firefighters that will gladly put out fires at any time of the night, there are fewer incentives to make good software. Conversely, if Dev is shielded from the realities of the production environment, Ops will tend to think of them as delusional and untrustworthy.
Common staffing pool for SRE and SWE
The first tool to combat this at Google -and possibly a difficult one to implement in small organisations- is to have a single headcount budget for Dev and Ops. That means that the more SREs you need to support your service, the fewer developers you have to write it.
Combined with this, Google offers SREs the possibility to move freely between teams, or even to transfer out to SWE. Because of this, a service that is painful to support will see its most senior SREs leaving, and will only be able to attract less experienced hires.
All this is a strong incentive to write good quality software, to work closely and to listen to the Ops people.
Share 5% of operational work with the SWE team
On the other hand, it is necessary that developers see first hand how the service works in production, understand the problems, and share the pain of things failing.
To this end, SWEs are expected to take on a small fraction of the operational work from SRE: handling tickets, being on-call, performing deployments, or managing capacity. This results in better communication among the teams and a common understanding of priorities.
Cap operational load at 50%
One very uncommon rule of SRE at Google is that SREs are not supposed to spend more than half of their time on "operational work".
SREs are supposed to spend their time on automation, monitoring, forecasting growth... Not on repeatedly fixing by hand issues that stem from badly designed systems.
Excess operational work overflows to SWE
If an SRE team is found to be spending too much time on operational work, that extra load is automatically reassigned to the development team, so SRE can keep doing their non-operational duties.
In extreme cases, a service might be deemed too unstable to maintain, and SRE support is completely withdrawn: the development team then has to carry pagers and do all the operational work themselves. It is a nuclear option, but the threat of it happening is a strong incentive to keep things sane.
The next post will be less about how to avoid unwanted work and more about the things that SRE actually do, and how these make things better.