For the last few weeks I have been thinking about an exchange with the ever-dynamic Francois Grey about citizen science, and what it would take to get to an actual significant discovery. This is in the context of my involvement with the long-running Cosmic Pi project, an attempt to produce open source cosmic ray detectors based on cutting-edge technology, so I will also do my best to share the lessons learned from this endeavour. While we haven’t formally terminated the project, unfortunately none of the current team members has the time needed to continue it, so it is currently on “pause”. Added to this, there are a lot of supply issues at the moment – so let’s just say it’s in stasis for the time being, hopefully to re-emerge at some point in the not-too-distant future.
Choosing your battle: Physics vs Biology vs Other things
How easy is it to discover a new force or fundamental particle, in comparison to a new type of fungus? I watched the excellent documentary “Fantastic Fungi” featuring Paul Stamets recently on Netflix. It hadn’t previously occurred to me that you could potentially discover a new type of mushroom (or other biological entity) in your back garden or local wilderness – but it seems to be quite plausible with a reasonable amount of effort. And for a few hundred dollars, you can probably even get the DNA of your new find sequenced!
However, if you wanted to discover the Higgs Boson on your own (or even with a few like-minded individuals), you would need very deep pockets and a ridiculous amount of time. Forbes estimated the cost of the discovery at $13.25 billion, plus the time of over 5,000 researchers, not including the efforts of all those working on the infrastructure to support the discovery (like me, since 2010).
These are probably the two extremes of the science spectrum, in terms of the validity of findings and general usefulness to the wider human species. There are also doubtless many other fields of endeavour and inquiry that fall between the two extremes, with a range of cost (money, time and resources) and reward (discovery, or significant advancement in human knowledge) trade-offs.
What does it take for Particle Physics?
The standard for a discovery in Particle Physics is 5 sigma. For those of you familiar with p-values, it’s the same principle – a statistical test to determine the likelihood that the observed result could be a fluke, rather than a real discovery. 5 sigma means 5 “standard deviations” on a traditional bell curve. It is interesting to note that lower levels of significance can still be worth publishing, with significance of 3 sigma and beyond considered “evidence” of something new, but insufficient for a discovery. The probability of a false result at 5 sigma significance is about 0.00006%; but of course it isn’t just a statistical test that determines a discovery – everything else also has to line up.
More practically, such a high level of confidence can only be reached with a large number of trials or observations. I’ve spent about a week thinking about ways to explain this concisely with some statistics examples, but to do the subject justice it really requires a full blog post on its own. Until I get round to writing it, I’d suggest you check out this article in Scientific American.
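In the meantime, the sigma-to-probability conversion itself is easy to play with. Here is a minimal sketch using only the Python standard library; the quoted 0.00006% corresponds to the two-sided tail of the normal distribution, while particle physics conventionally quotes the one-sided figure (about 1 in 3.5 million).

```python
# Sketch: converting a significance level in "sigma" into a p-value.
# Uses only the standard library; nothing here is experiment-specific.
from math import erfc, sqrt

def sigma_to_p(n_sigma: float, two_sided: bool = False) -> float:
    """Probability of a statistical fluctuation at least n_sigma
    standard deviations from the mean of a normal distribution."""
    p = 0.5 * erfc(n_sigma / sqrt(2.0))  # one-sided tail area
    return 2.0 * p if two_sided else p

# 3 sigma ("evidence") vs 5 sigma ("discovery"), one-sided tails:
print(f"3 sigma: p = {sigma_to_p(3):.2e}")  # ~1.35e-03, about 1 in 740
print(f"5 sigma: p = {sigma_to_p(5):.2e}")  # ~2.87e-07, about 1 in 3.5 million
```

Note how steep the curve is: going from 3 sigma to 5 sigma tightens the odds of a fluke by a factor of nearly five thousand, which is why so much more data is needed for a discovery than for mere evidence.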
I started working on the CosmicPi project a few years ago now (in 2014!) with some other young, enthusiastic engineers and physicists I knew working at CERN. We all did something with particle detectors and their supporting infrastructure as part of our day jobs, but each of us had only a very small slice of the overall knowledge required. We decided to build a muon detector, using the latest technology we could find. And we knew it would be difficult…
It took several years and a lot of help before we detected our first “muons”. It then took a couple more years to figure out that these weren’t actually muons, but electronic noise, and to redesign things to capture the real thing. I’ve lost count of the number of prototypes we built – it’s at least 10 different versions. In short, if you want to build distributed hardware for particle physics, you will need a device that can take in particles of whatever type you are interested in (I would strongly recommend muons) and emit some form of TCP/IP packet that codifies each detection, sent to a central server where someone can look at it in combination with all the other packets your devices are detecting.
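To make the “particle in, packet out” idea concrete, here is a hypothetical sketch of how a detector node could codify one detection and push it to a central server. The field names, wire format and endpoint are invented for illustration – this is not the actual Cosmic Pi protocol.

```python
# Hypothetical sketch: codify one detection and ship it upstream.
# Field names and the server endpoint are illustrative, not Cosmic Pi's.
import json
import socket
import time

def encode_event(detector_id: str, adc_value: int) -> bytes:
    """Serialise one detection as a newline-delimited JSON record."""
    event = {
        "detector": detector_id,
        "timestamp_ns": time.time_ns(),  # a real device would use GPS-disciplined time
        "adc": adc_value,                # pulse height from the analog front end
    }
    return (json.dumps(event) + "\n").encode("utf-8")

def send_event(payload: bytes, host: str = "example.org", port: int = 5000) -> None:
    """Push one event to the central server over plain TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)

# Usage (with a real server listening):
#   send_event(encode_event("station-42", 1023))
```

The key design point is the precise timestamp: individual muons are mundane, and it is only by correlating arrival times across many stations on the server side that anything interesting (like an extensive air shower) can be found.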
Consumer electronics
The more astute readers will have already guessed that a device which detects a particle, and gives out a digital signal as an output, could also be described as a “product”. It is a manufactured object of non-trivial complexity, with a moderate cost associated. We were aiming to build a device 10x cheaper than the competition, and we managed this in terms of cost – but not sale price, because a) we haven’t started selling it yet, and b) some margin is required when selling anything.
The trap (perhaps it isn’t a trap) is that to scale your detectors you will either need someone with a lot of money (a lot), or to do some form of crowdfunding – where you sell your products to customers, who will host them. We’re not just talking about a box with a flashing light on it, but actually a very complicated, sensitive piece of equipment – an overall level of difficulty that puts most Kickstarter campaigns to shame.
You also need to take the components and put them in a box. This is a very non-trivial activity: everything needs to fit in the box, and housings are either off the shelf or custom moulded ($$$ unless your manufacturing volume runs into the tens of thousands), so it’s a good idea to choose your case appropriately. If you want to go the full distance, you will also need a manufacturer to put the components together in the boxes (and test them!). But even after nearly a decade we still haven’t got this far yet.
Lots of moving parts
Building a cutting-edge particle detector is not easy. You will need a detector medium – we chose plastic scintillator, as it can be extremely cheap, but it is rather hard to get hold of commercially unless you are buying by the tonne. You will also need some electronics to read out the detector, which will include some specialist analog circuits, as this is what it takes to detect very small, fast-moving particles that are only going to generate a few photons of light in your chosen medium. These electronics have to be designed and prototyped iteratively. Before we had even finished our first generation of prototype, the main microcontroller we were using was already obsolete! So a redesign was required before we could move to the next stage of manufacture.
There are plenty of other options, such as getting recycled or spare detector chips from full-size physics experiments, or repurposing existing semiconductors which are sensitive to various forms of radiation. The former may run into availability issues and export constraints, while the latter path can massively reduce the amount of data collected by a particular detector. Ultimately data is what can lead to discoveries, so the more you capture the better.
Building a working detector is just the smallest Matryoshka doll. Around this you also need to build an infrastructure, both in the conventional sense (team, website, means to get your detector to the customer and their money into your bank account) and in the Physics sense. To use the oil analogy, raw data is just a resource; the value only comes when it is exploited with a means to process it. There are plenty of physics data analysis frameworks in existence, with varying degrees of complexity, but they all require significant pre-processing of the raw data and the addition of constructs that represent the physics process you are searching for. A very reductive way of viewing a modern Physics PhD is that it involves three years of writing and running data analysis scripts in order to generate some new insight from the vast array of high energy physics data collected by modern large-scale detectors.
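As a flavour of what “adding constructs” means in practice, here is an illustrative pre-processing step (not taken from any specific framework): finding coincident hits between two detectors, which is a typical first building block when looking for extensive air showers in distributed detector data.

```python
# Illustrative pre-processing construct: time coincidences between two
# detectors. Not from any specific analysis framework.
def coincidences(hits_a, hits_b, window_ns=1000):
    """Return pairs of timestamps (in ns) from two detectors that fall
    within `window_ns` of each other. Both inputs must be sorted."""
    pairs, j = [], 0
    for ta in hits_a:
        # Advance j past hits in B that are too early to ever match ta.
        while j < len(hits_b) and hits_b[j] < ta - window_ns:
            j += 1
        # Collect every hit in B inside the window around ta.
        k = j
        while k < len(hits_b) and hits_b[k] <= ta + window_ns:
            pairs.append((ta, hits_b[k]))
            k += 1
    return pairs

# Two detectors, one genuine coincidence 400 ns apart:
print(coincidences([1_000, 50_000], [1_400, 90_000]))  # [(1000, 1400)]
```

Because both input streams are kept sorted, the sweep runs in linear time, which matters once the central server is merging events from hundreds of stations.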
Full stack software
I find job adverts for ‘full stack’ developers rather funny, because typically they only really want a couple of layers of the stack at most, and certainly nothing that touches real hardware. The development stack for a particle detector goes all the way to the bottom. If you are building a new detector, you will need to read in an analog signal via some electronics, and somehow get it all the way up the software stack so it prints out on a screen or webpage. Practically, this means there is a need for both firmware (embedded software that runs on a microcontroller) and software, which can interface the microcontroller with a computer and share your data with the world. To build a ‘product’ appliance that can be operated without the benefit of a full PhD, you will also need to handle everything from the calibration of the device (ideally automatic!) to setting up a device-hosted Wi-Fi network and building a web interface, so that users can actually connect to your magic discovery box with their PC or phone.
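The unglamorous middle of that stack is the host-side code that turns raw firmware output into something the rest of the software can use. A minimal sketch, with an invented `EVT,<counter>,<adc>` wire format (a real device would also carry timing and calibration data):

```python
# Hedged sketch of the firmware/software boundary: parsing one line of
# serial output from the microcontroller into a structured event.
# The "EVT,<counter>,<adc>" format is invented for illustration.
def parse_line(line: str):
    """Turn one serial line into a dict, or return None for
    noise, partial lines, or anything else we don't recognise."""
    parts = line.strip().split(",")
    if len(parts) != 3 or parts[0] != "EVT":
        return None
    try:
        return {"counter": int(parts[1]), "adc": int(parts[2])}
    except ValueError:
        return None  # corrupted numbers happen on real serial links

print(parse_line("EVT,17,842"))  # {'counter': 17, 'adc': 842}
print(parse_line("garbage"))     # None
```

Defensive parsing like this is not optional: serial links glitch, firmware reboots mid-line, and an appliance that has to run unattended in someone’s home cannot afford to crash on a mangled byte.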
Who has done this before?
We wasted an inordinate amount of time discussing the totally irrelevant. Could we manufacture our own scintillators with embedded wavelength shifting optical fibres? Should our device have a battery inside? Would a solar panel on the top be enough to power it? This was due to inexperience, but also a learning and sharing of knowledge, which (inconveniently) is not a linear process.
What we needed was someone who had done this before to act as a mentor and guide. Someone with experience in electronics design, prototyping, manufacture. We connected with a lot of people but there are very few at the intersection of science and consumer electronics with all the relevant experience – and fewer still with sufficient free time for a project like this. There are plenty of science experts, but very few emerging experts in DIY electronics at scale, who are mostly self-educated via the mistakes of various crowd-funding campaigns they have just about survived. It’s still a rarefied skillset, even if you happen to be located at CERN.
A personal inspiration to me has been Bunnie Huang, and I can’t recommend his book The Hardware Hacker highly enough. We have been using it, recommending it to other teams we come across attempting a similar challenge, and generally trying to learn from his mistakes when we haven’t already made them ourselves. In retrospect, we could really have used a mentor to guide us on this journey. We are still looking, and in the meantime the next best thing is to share our experience with others. While we have been on our journey, open science hardware communities have started to emerge, the most notable being GOSH – the Gathering for Open Science Hardware. There is also the Journal of Open Hardware, which started while we’ve been working on Cosmic Pi, and maybe one day we’ll even get round to publishing an article in it about our detector!
The Profit Motive
What motivated our team? It was a lot of things: the fun of working together with like-minded people, learning new skills and trying new things, the potential for discovering something, and democratising access to the technology through the code and schematics we published online. The profit motive doesn’t really feature, and as a group we are missing a marketing department. Unfortunately (?), we are the type of people who would price our product based on what it cost to build, plus a markup we thought was reasonable. Typically in commercial electronics, if you aren’t making a 5-10x mark-up, you don’t stay in business for very long. In addition to sage advice from Bunnie, the EEVblog from Dave Jones is a resource beyond compare for those on this journey.
Design For Manufacture (DFM)
Our design has many weak points, which of course have been exposed by the ‘Great components shortage’ of 2021/2/3/n. If you open up two seemingly identical consumer electronics products manufactured a few months apart, there’s a fair chance you will find they have some different components and integrated circuits inside. This is because large volume manufacturers (and smart smaller volume ones), tend to design with at least one alternate part for each critical component. This allows them to continue production when something is out of stock. The alternative is to redesign the board on the fly, based on available parts – and of course you will probably want to test it again before making a lot! Or you can pay a ridiculous amount of money to someone who has some stock of the chips you need.
And then there is the more mundane, “traditional” side of DFM – making sure that your circuit board complies with design rules and good practices for the production process, and that you have sufficient margins on your passive components and design calculations to get a reasonable yield.
This is a hugely time-consuming activity. I have friends whose day jobs right now consist of redesigning existing products to work around the chip shortage. This type of operation is far beyond the resources we have as a bunch of individuals trying to build a cosmic ray detector. While it doesn’t take 100% of the effort all over again to produce a pair of design variants, even if another 20% is needed, that is a lot for a volunteer project.
Putting it all together
I’ve filled out a typical business model canvas for the Cosmic Pi project. You can download it for your own remixing via the link below. We haven’t even started down the commercial part of this adventure, so I’ll just leave this here for now.
Some Lessons Learned
I have learned many things on this journey about how to build a particle detector and the top-to-bottom architecture of a globally-scalable IoT class device. Most of my biggest learning points come from mistakes, but not all. Here are my top 5 lessons.
- Footprints for PCB components. The first fevered weekend of building a detector was spent painstakingly soldering tiny wires to inverted components that looked like dead spiders, all because I hadn’t verified the pad dimensions well enough on our very first prototype. Always double check your device footprints (and pinouts). Always.
- Humans. This project has been kind of a mash-up of science and open source, with a side helping of start-up. The most important part of the puzzle is the human element. As usual I roped in a few friends, made some new friends along the way, and we had some fallings out too! Trying to wrangle a team of very skilled, highly intelligent volunteers with divergent ideas into a project can be challenging. When conflict erupts, which it will, make sure that your friends know that any disagreements aren’t personal, and that you value your friendship independently from the project. If you see tempers rising around a particular issue, don’t wait for things to boil over before getting involved. And if you are wrong, or over the line on something, apologise as soon as you realise it. Things have been a lot of fun, but it hasn’t always been easy. I don’t think I’ve destroyed any friendships (so far).
- Ignorance. I know thing X. Thing X is obvious (to me)… but it turns out that some team members didn’t know thing X, and didn’t even know that they didn’t know it. They took on a challenge, and got into difficulties that affected the whole project because of their ignorance. We’ve all done it in different ways, with impacts that vary from expensive to time consuming. Of course, it is necessary to assume some level of common knowledge (and trust) when any two people are working together, but I find it is always worth taking the time to frame the task and go over the first principles at the start of any new collaboration.
- Interns are amazing. We have been fortunate enough to have a few interns working on the project, some of whom were even full time and funded. The progress they have been able to make on the project, working full time, as opposed to free evenings and weekends for the rest of the team, has been inspirational. We were able to have a good win-win relationship with all the students who worked on the project so far. The ones who were funded got paid and all of them learned valuable skills in programming, electronics and particle detector assembly, plus the lesson of how hard it all is to put together.
- Entropy is a problem, especially in software. Just because you have a set of awesome software for your device that’s tested on hardware platform Y.0.00.0, doesn’t mean it will work at all on hardware platform Y.0.00.1. Or even on your original platform with a version update to your OS or its major libraries. Software requires maintenance! The rules, settings, configuration requirements, and dependent libraries are all shifting. To minimise your exposure to entropy I recommend:
- Keep it simple. The less code you write, the less there is to maintain (and you should have fewer bugs too). The software problem hasn’t changed fundamentally since the 1970s; you should read The Mythical Man-Month by Fred Brooks Jr for wisdom and inspiration. It’s the best book I didn’t read at university.
- Put as much of the data processing into your embedded elements, i.e. firmware, as you can (within reason), as this will be stable across software changes. Keeping our data output format stable for versions 1.6 through 1.8 saved us a lot of time.
- Scripts not software. It’s much easier to maintain a hundred lines of Python than something compiled. If you can rely on software platforms (InfluxDB, Grafana) for the heavy lifting, that is ideal; if not, consider ‘module’ level systems such as SQLite and Python libraries. Writing your own Linux kernel drivers in C is always possible, but will require a lot of upkeep.
- When it comes to embedded binaries, make sure you keep a copy of the latest version you have compiled for distribution (and all previous versions too..). This is especially important if you are using development environments such as Arduino, where the actual .bin/.hex file can be abstracted away under the plug and play upload button.
- Git. Things which are put in a repository are great. Things which aren’t are usually lost over the years it takes a project to get to maturity.
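The stable-output-format lesson above can be sketched in a few lines. The idea is to put an explicit version field in every record, so readers check the version instead of guessing at the layout, and firmware and analysis code can evolve independently. The field names here are illustrative, not the actual Cosmic Pi format.

```python
# Sketch of a versioned, stable data output format. Field names are
# invented for illustration; the version check is the point.
import json

FORMAT_VERSION = 1

def format_event(seq: int, adc: int) -> str:
    """Emit one event in the current wire format."""
    return json.dumps({"v": FORMAT_VERSION, "seq": seq, "adc": adc})

def read_event(line: str) -> dict:
    """Parse one event, refusing layouts we don't understand."""
    event = json.loads(line)
    if event.get("v") != FORMAT_VERSION:
        raise ValueError(f"unsupported format version: {event.get('v')}")
    return event

print(read_event(format_event(7, 512)))  # {'v': 1, 'seq': 7, 'adc': 512}
```

Failing loudly on an unknown version is deliberate: a silent misread of a changed layout produces plausible-looking garbage data, which in a physics context is far worse than a crash.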
A conclusion, for now at least.
I hope I have shared insights into at least some of the ground we have covered with Cosmic Pi so far. We’ve come a long way, but, just like climbing a mountain, while we might have scaled the first and second rises, it’s still a long way to the summit. If you are full of enthusiasm and want to get involved please drop me a line, or if you would like to chat about your own open hardware science projects feel free to get in touch with me via Twitter, where you can find me as @pingu_98.