Teaching Exercises

From Dan Shearer CV

These are some exercises and tricks I have either created or been subjected to over the years, and I have mentored students through them on many occasions.

Security

  • Point Whonix at a server we control and try to de-anonymise a web page access using network capture and analysis. Compare with doing the same from a consumer operating system instead of Whonix.
  • Construct a single-purpose computer, as in an embedded application, such as Firefox/Chromium in kiosk mode on a laptop running Ubuntu Linux. Then destabilise the computer using every class of attack available: network, physical, software, side-channel and social engineering.
  • The goal of security researchers (regardless of hat colour) is often to get control of userspace. For many years there have been Linux Play Machines online with the root password published and full shell access to userspace for anonymous users from anywhere. Yet these machines are considered secure. What does this say about security generally? Build and attack such machines.

Complexity and Tech Robustness

  • Follow instructions to install a common 2024 web stack from its component parts on a fresh virtual machine: Vue.js+Node.js+Apache+language server+SQL database. Say "hello world", and do basic reliability testing. Introduce small but plausible changes in the stack components to check they make an observable difference.
  • Travel back to 1975 by booting IBM MVS 3.8 in Hercules. If instructions are followed exactly it doesn't take long to get a working system (Strong Hint! Follow the instructions, because your computing experience is probably irrelevant.) This takes about the same amount of developer time as Vue.js in the previous challenge to get to "hello world". Introduce small but plausible changes. Which stack is most likely to be working in ten years?
  • Follow instructions to connect the Vue.js "Hello" to MVS, using MVS as the database. This is a ridiculous stack. Compare the stack levels and their fragility to a typical distributed microservice architecture with 7 levels of language involved. Which is most likely to be working in one year? Never mind the lines of code, just think about the number of translation layers. Which stack is the most ridiculous?
  • Consider the modern computer and operating system of your choice printing "hello" from local storage. Using public information, estimate the number of lines of code in every element in the stack down to the CPU transistor level. Now apply common bug metrics to this result, and human factors engineering. How many people with which skills would be needed to fix any problem? Does it matter?
  • Consider the transistor-up stack; which components at each level publish source code? Compare AMD to RISC-V at the bottom; does this cover all the code running at what we can roughly call the "silicon level"? 30 billion transistors on a modern 2020-era chip equates to many millions of lines of RTL, which is generated from many fewer millions of lines of VHDL/Verilog, which in turn is often generated from even fewer lines of a high level design language. Can we deduce anything about the complexity relationships in 3D chip designs with trillions of transistors? Given that AI-assisted design tools are essential for billion-scale silicon, what can we expect for trillion-scale silicon in 2024 and later? Is it relevant that there is a single worldwide source of supply in The Netherlands for machines to make 3D chips, which is how trillion-scale silicon is likely to be implemented in its early days at least?
  • This Reddit "Ask Me Anything" with the SpaceX developers gives some details of the rocket flight software. Draw an architecture diagram of the relevant stacks. What can we decide about complexity and reliability? Are these good choices? Can we conclude this is reliable software? Are there failure modes other than "You will not be going to space today"?
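
For the lines-of-code exercise above, the arithmetic can be sketched as follows. This is a minimal sketch, not a method: every layer name, every line count and both defect densities are hypothetical placeholders, to be replaced with figures you have found and can cite.

```python
def estimate_defects(layers_loc, per_kloc_low, per_kloc_high):
    """Return (total lines, low defect estimate, high defect estimate).

    layers_loc maps a layer name to its estimated lines of code.
    per_kloc_low/high are defect densities in defects per 1000 lines.
    """
    total = sum(layers_loc.values())
    kloc = total / 1000
    return total, kloc * per_kloc_low, kloc * per_kloc_high


# HYPOTHETICAL placeholder numbers, purely to show the shape of the sum.
example_layers = {
    "application": 1_000_000,
    "libraries": 4_000_000,
    "kernel": 25_000_000,
    "firmware": 10_000_000,
}

total, low, high = estimate_defects(example_layers, 0.5, 5.0)
print(f"{total:,} lines -> between {low:,.0f} and {high:,.0f} latent defects")
```

The interesting part of the exercise is not the multiplication but how wide the range becomes once honest uncertainty is attached to each layer.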

Operating System Technology

  • Linux from Scratch takes a few hours to get a prompt running from the bare components (a full system takes much longer). Use checksumming to compare the binaries created by different students' Linux from Scratch builds. Why aren't they all the same? Debian partly solved this in October 2021 after 20 years while even NetBSD, a source distribution unlike Debian, still struggles. Does this kind of reproducibility matter?
  • The Linux kernel source is a little under 30 million lines of code. Compile the smallest useful kernel you can, and estimate the number of lines of code used. Is Linux bloated? Compile a kernel on Ubuntu and estimate the number of lines of code. How much of this is running at boot time? Is Ubuntu bloated?
  • Modify an operating system so that any time the user types "hocus pocus" in any context, a log message is sent to a log server over the internet. Are there any limitations on your implementation?
  • Modify an operating system to respond to a single network packet of a specified type. What would be good starting points for this?
  • A typical fresh Linux server install has between 1x10^5 and 2x10^5 files depending on distribution (my laptop, however, always has of the order of 10^6 files.) How many binary executable files are there in the smallest useful Linux deployment?
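
One possible starting point for both the checksumming exercise and the executable-counting exercise above is a small inventory script. This is a sketch under stated assumptions: it treats any regular file beginning with the ELF magic number as a "binary", which also catches shared libraries and object files, and the function names (`elf_inventory`, `differing`) are invented here for illustration.

```python
import hashlib
import os
import stat
from pathlib import Path

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(path: Path) -> bool:
    """True if the file starts with the ELF magic number."""
    try:
        with path.open("rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        return False  # unreadable files are simply skipped


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def elf_inventory(root: str) -> dict:
    """Map relative path -> SHA-256 for every regular ELF file under root."""
    base = Path(root)
    inventory = {}
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            p = Path(dirpath) / name
            try:
                st = p.lstat()  # lstat: do not follow symlinks
            except OSError:
                continue
            if stat.S_ISREG(st.st_mode) and is_elf(p):
                inventory[str(p.relative_to(base))] = sha256_of(p)
    return inventory


def differing(a: dict, b: dict) -> list:
    """Paths present in both inventories whose digests differ."""
    return sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
```

Running `elf_inventory` on two students' mounted build trees and passing the results to `differing` lists the binaries that are not bit-for-bit identical; `len(elf_inventory(root))` gives a first count of binaries in a deployment. Finding out why the digests differ is the actual exercise.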

Software Development

  • I claim that among the hardest software engineering tasks is reliable progress bar estimates. Prove me wrong by implementing a progress bar that meets user expectations and handles the changing environment within a computer and from the outside world. Hint: what are the user's expectations? What are progress bars imagined to be communicating?
  • Write a graphical internet web application using Node.js to display all the information it can deduce about its network connection (where it is geographically and when, what standards are supported, etc). Explain how you can be sure this application will still run reliably in ten years' time and what the limits on this are. Repeat using C or Rust.
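
As a deliberately naive baseline for the progress bar challenge above, here is a sketch of a time-remaining estimator that smooths the observed rate exponentially. The class name `EtaEstimator`, the `alpha` smoothing factor and the injectable `clock` are all choices made for this illustration; nothing here claims to meet user expectations, which is the point of the exercise.

```python
import time


class EtaEstimator:
    """Estimate remaining time from an exponentially smoothed work rate."""

    def __init__(self, total, alpha=0.3, clock=time.monotonic):
        if total <= 0:
            raise ValueError("total must be positive")
        self.total = total
        self.alpha = alpha   # weight given to the newest rate sample
        self.clock = clock   # injectable so the estimator can be tested
        self.done = 0.0
        self.rate = None     # smoothed units of work per second
        self.last = clock()

    def update(self, done):
        """Record that `done` units of work are now complete."""
        now = self.clock()
        dt = now - self.last
        dwork = done - self.done
        if dt > 0 and dwork >= 0:
            sample = dwork / dt
            if self.rate is None:
                self.rate = sample
            else:
                self.rate = self.alpha * sample + (1 - self.alpha) * self.rate
        self.done = done
        self.last = now

    def remaining_seconds(self):
        """Seconds left at the smoothed rate, or None if no rate is known."""
        if not self.rate:
            return None
        return max(self.total - self.done, 0.0) / self.rate
```

Try feeding it a workload whose speed changes abruptly (a cache warming up, a network stall, a laptop going to sleep) and watch how the estimate behaves. A larger `alpha` reacts faster but jitters more; a smaller one is calm but wrong for longer.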