Thursday, April 7th
14h00 - 15h00 / Room: Zwarte Doos
Shaz Qadeer (Microsoft Research)
Programming devices and services in P
Abstract: P is a programming framework for the design, implementation, and validation of event-driven asynchronous systems. The P language incorporates deep modeling and specification techniques into asynchronous programming. It allows programmers to systematically test and debug their applications before deployment, thus preventing Heisenbugs that are extremely difficult to find and fix later.
P has had significant impact on Microsoft products. The USB drivers shipped by Microsoft (Windows 8.1 onwards) have been written in P; these drivers run on hundreds of millions of devices. The design of P has also been implemented independently by engineers in Microsoft Office and Azure for components written by their teams. In Microsoft Azure, there are ongoing projects that are using P to implement services. Finally, researchers at Microsoft and UC Berkeley are exploiting P to build a reliable software stack for autonomous robots.
My talk will provide an overview of the key ideas behind P and conclude with a discussion of a few open research problems.
Friday, April 8th
09h00 - 10h00 / Room: Zwarte Doos
Pierre Wolper (University of Liège)
Model Checking: What Have We Learned, What Will Machines Learn?
Abstract: Model Checking was introduced more than 30 years ago and, thanks to a steady stream of improvements and new approaches, has developed into a widely used and quite effective tool for verifying some classes of programs. Surveying and reflecting on these developments, this talk attempts to highlight the main lessons learned from the last three decades of research on the topic. Then, looking towards the future, it speculates on what the next decades could bring and on whether it would not be time for machines to do the learning, in order to provide developers with the effective verification assistant they are still waiting for.
Friday, April 8th
14h00 - 15h00 / Room: Zwarte Doos
Tim Willemse (Eindhoven University of Technology)
On Verification Challenges at the Large Hadron Collider
Abstract: The Large Hadron Collider (LHC) experiment at the European Organization for Nuclear Research (CERN) is built in a tunnel 27 kilometres in circumference and is designed to yield head-on collisions of two proton (ion) beams of 7 TeV each, i.e. 14 TeV in total. Next to three ‘small’ detectors for specialised research, the LHC hosts four large particle detectors: the general purpose detectors for the CMS and ATLAS experiments and the more specialised LHCb and ALICE experiments. The general purpose detectors study a wide range of particles and phenomena produced in the high-energy collisions in the LHC and yielded first evidence of the existence of a particle matching a Higgs boson in 2012. Typically, the experiments are made up of subdetectors, designed to stop, track or measure different particles emerging from the collisions. In 2015, the LHC achieved unprecedented 13 TeV collisions, surpassing its previous records of 8 TeV (2012) and 7 TeV (2010). The LHC experiments provide a gold mine of challenges for computer scientists, some of which I will address in this talk.
The control software. The Detector Control System takes care of control, configuration, readout and monitoring of the detector status, numerous hardware devices and various kinds of environmental variables. The architecture of the control software for all four big LHC experiments is based on the SMI++ framework; this framework views the real world as a collection of objects behaving as finite state machines (FSMs), which are organised hierarchically. A Domain-Specific Language called the State Manager Language (SML) is used to describe the FSMs (which are typically of low complexity) and the entire framework takes care of the interaction between these FSMs. To give an impression of the complexity involved: the control system for the CMS experiment at any time contains between 25,000 and 30,000 FSM nodes, and each FSM contains, on average, 5 logical states; this leads to a rather conservative estimate of a state space of somewhere between 10^25,000 and 10^30,000 states.
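The hierarchical FSM architecture described above can be sketched in a few lines of Python. This is an illustrative model only, not SMI++ or SML: the node names, state set (OFF/ON/MIXED) and commands are hypothetical, chosen to show the two mechanisms the framework relies on, namely commands propagating down the tree and parent states being derived as summaries of child states.

```python
# Hypothetical sketch of a hierarchical FSM tree in the SMI++ style:
# commands flow down to the leaf (device) nodes, and each parent
# derives its own state from the states of its children.

class FSM:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.state = "OFF"

    def command(self, cmd):
        # Commands propagate down the hierarchy to the leaves.
        for child in self.children:
            child.command(cmd)
        if not self.children:
            # Leaf node: model the device reacting to the command.
            self.state = {"switch_on": "ON", "switch_off": "OFF"}.get(cmd, self.state)
        else:
            self.state = self.derive_state()

    def derive_state(self):
        # Parent state summarises the child states, as in the
        # control trees of the Detector Control System.
        states = {c.state for c in self.children}
        if states == {"ON"}:
            return "ON"
        if states == {"OFF"}:
            return "OFF"
        return "MIXED"

# A tiny three-node constellation: one control node, two device nodes.
root = FSM("subdetector", [FSM("sensor_a"), FSM("sensor_b")])
root.command("switch_on")
print(root.state)  # ON
```

Even in this toy setting the combinatorics are visible: with 5 states per node and tens of thousands of nodes, the product state space reaches the astronomical sizes quoted above.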
Despite the modular architecture, the control system occasionally exhibited undesirable behaviours, which had the potential to ruin expensive and difficult experiments. Formal verification techniques were used in an effort to improve its quality. SML and its underlying communication mechanisms were formalised in the process algebraic language mCRL2. This formalisation facilitated the understanding, and enabled automated analyses, of the behaviour of small constellations of cooperating FSMs. Not entirely unexpectedly, the prohibitive state-space size of even modest constellations (those involving more than 12 FSMs) prevented scaling the analysis to the global control system. On the one hand, the resulting frustration with this state of affairs inspired the development of new theory and tools for the mCRL2 toolset. On the other hand, the analyses revealed that there were several interesting consistency requirements that could be checked locally, and, indeed, log files of the control system showed that violations of such local consistency requirements had caused prior freezes of the control system. We subsequently developed dedicated verification tools and integrated these in the FSM development environment. Using these tools, we found that in the CMS experiment, up to 20% of the non-trivial FSMs violated a local requirement.
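To illustrate what a "local" check looks like, here is a hedged sketch of one plausible per-FSM requirement: every command the node accepts should be handled in every state, so the node can never silently ignore an instruction. The requirement, the transition-table format and all names are hypothetical; the point is only that such a property can be checked on one FSM in isolation, avoiding the state-space explosion of the composed system.

```python
# Hypothetical local consistency check on a single FSM's transition
# table, given as a dict mapping (state, command) -> next state.

def unhandled(transitions):
    """Return the (state, command) pairs for which no transition
    is defined, i.e. the potential "silently ignored" commands."""
    states = {s for s, _ in transitions}
    commands = {c for _, c in transitions}
    return sorted((s, c) for s in states for c in commands
                  if (s, c) not in transitions)

# A deliberately incomplete table: "reset" is only handled in ERROR.
table = {
    ("OFF", "switch_on"): "ON",
    ("ON", "switch_off"): "OFF",
    ("ERROR", "reset"): "OFF",
}
print(unhandled(table))  # e.g. ("OFF", "reset") has no transition
```

A check of this shape runs in time linear in the table size per FSM, which is why it scales to tens of thousands of nodes where global model checking does not.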
Data acquisition. The Worldwide LHC Computing Grid, launched in 2002, provides a resource to store, distribute and analyse the mountain of data produced at 6 Gb per second (and soon expected to increase to 25 Gb per second) by the LHC experiments. Among the software frameworks that employ the computing grid is the Distributed Infrastructure with Remote Agent Control (DIRAC) software framework. Programmed in Python, DIRAC provides cooperating distributed services and a plethora of lightweight agents that deliver the workload to the grid resources.
Despite the effort invested in making DIRAC reliable, entities occasionally get into inconsistent states. We reverse-engineered several critical DIRAC subsystems and used mCRL2 for simulation, visualisation and model checking to find race conditions and livelocks. These were subsequently confirmed to occur in the real system, and the findings led to improvements in the implementation of DIRAC.
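The kind of race condition at stake can be made concrete with a deterministic toy interleaving. This sketch is not DIRAC code; the shared counter, the agent names and the read-modify-write protocol are hypothetical. It merely shows why model checking, which explores all interleavings, finds such lost updates that testing a single schedule misses.

```python
# Hypothetical lost-update race: two agents each read a shared value,
# compute, and write back. If both read before either writes, one
# update is lost and the system ends up in an inconsistent state.

status = {"jobs_done": 0}

def agent_step(snapshot):
    # Each agent's update, computed from its private snapshot.
    return snapshot + 1

# Benign interleaving: read-write, then read-write. Both updates land.
s = status["jobs_done"]
status["jobs_done"] = agent_step(s)
s = status["jobs_done"]
status["jobs_done"] = agent_step(s)
print(status["jobs_done"])  # 2

# Racy interleaving: both agents read before either writes.
status["jobs_done"] = 0
s1 = status["jobs_done"]   # agent 1 reads 0
s2 = status["jobs_done"]   # agent 2 also reads 0
status["jobs_done"] = agent_step(s1)
status["jobs_done"] = agent_step(s2)
print(status["jobs_done"])  # 1 -- one increment was lost
```

A model checker enumerates both schedules (and every other one) from the same model, which is how interleavings like the second can be flagged without ever being observed in a test run.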
Acknowledgements. Joint work with, a.o., Frank Glege, Robert Gomez-Reino Garrido, Yi-Ling Hwong, Sander Leemans, Gijs Kant, Jeroen Keiren and Daniela Remenska. This research was supported in part by NWO grant number 612.000.937 (VOCHS).