It’s All About the Data

This is going to sound stupid, but…One thing I didn’t really think about enough prior to starting my current job is just how much working with data is involved. Yes, I know. Information systems. It’s kind of all about data. But when I imagined what this job would entail, I thought a lot about building things, about creating systems that support and spark teaching and learning. The reams and reams of data that go along for the – names, email addresses, course CRNs, logs, files, etc – just didn’t seem that sexy to me.

But after a few months of being a sysadmin, I’m thinking about data more and more. There are the obvious things: am I doing enough to keep data safe in the age of constant breaches? Are we thinking about giving users control of their data in any meaningful way in light of evolving views about digital privacy? These are big questions. But in the day to day, I’m just working with like, rows and rows and rows.

It turns out, people have questions they want answered about the digital systems that play such an important role in our schools. These are all questions I’ve answered in the last month:

Which instructors used the accessible Moodle theme we provided in their courses?

I accidentally deleted a quiz, is there any way to get the grades each student in my class got on it?

Can you tell me who at the University has not yet completed this mandatory training?

How many people logged into Moodle on the first day of classes?

Here is the CRN numbers of 50 courses – can you get me the course full name, id number, and the instructor of record?

So what have I learned?

Sometimes this data is readily available within an application itself. For example in the recent versions of Moodle each course has a very powerful completion tracking and activity tracking report built-in. It is also be possible to extend this functionality with plugins.

Reports or logs can also generally be exported into a .csv or spreadsheet format. I was already pretty proficient with Excel but I’ve upped my game in the last few months just due to the amount of time I’ve spent in spreadsheets. (I don’t think I’ve opened PowerPoint in that time, I’ve kind of flip-flopped in that regard. Less presenting polished data and more churning through raw data).

In addition to a sheets tool, a good plain-text editor is your friend when data needs to be manipulated to be useful. I’ve been using VS Code and find it really useful for manipulating text, especially the ability to find and replace with a regex expression. This is incredibly handy when you have a comma separated list but it just has to have each item on a new-line to import wherever it is you need it to go.

If built-in reports or exported info isn’t cutting it, I’ve been going straight to the source itself: typically the Moodle database accessed by PHPMyAdmin. This lets me run SQL queries in a pretty friendly GUI environment and export the results. I’ve found that simple SQL itself is relatively easy to write – it’s understanding the complex structure of a huge relational database, how the tables need to be joined, and thinking through the links between tables that takes a bit of time. But, I am getting faster at this process – I find sketching the query out and making kind of a map of the relevant tables and fields helpful – and this is what has allowed me to do things like recover the grades of a deleted quiz or return a list of instructors who have used a certain theme.

Sketching out a SQL query
Sketching out a SQL query

Photo by Markus Spiske on Unsplash

Patch Notes: The Case of the Mysterious Crashing Server

I’ve been kicking a few ideas around, thinking about what topics or experiences I wanted to write about to chronicle my journey into systems administration/architecture. This week I had the fortunate misfortune to to come across the perfect situation for the first entry in this blog series when I received an email from one of the math professors at UP about our WeBWork server.

A bit of background on WebWork: it’s an open source program for delivering math homework via the web. We’ve run it at UP for a number of years, and recently migrated to AWS , using a single instance running Ubuntu.

Back to our emailing professor. She had been using the system over the holiday break and noticed intermittent lock-ups. She could use the site normally for about half an hour before things went sideways and it became unresponsive; if she waited a few hours and came back it would be working again, but only for about 30 minutes at a time.

After getting the email I checked the server and yeah, it was totally unresponsive via the web and SSH. Time for a hard reboot via the AWS console. After rebooting everything seemed fine but before long the prof let me know she was still experiencing the same issues. This time I was able to SSH in before things went totally haywire – but trying to execute any commands returned an error:

-bash: fork: Cannot allocate memory

Aha! A clue. I rebooted again and downloaded htop:

sudo apt-get install htop

htop is a CLI program for monitoring system usage – including memory. I hadn’t used it before, so I took advantage of the Lynda.com account I have through work and watched a good htop tutorial video to get a feel for it. When WebWork was in use, I could see via htop that there were some processes owned by the Apache server (www-data) that were eating up all of the available memory, of which we only had a paltry 4GB (I suspect that the original sysadmin who built the server intended to use an auto-scaling group that would spin-up additional resources on-demand, but was never able to get this working correctly).

A bit of research turned up lots of forum posts that discussed WebWork’s nasty habit of eating up system memory and failing to give it back, which over time can use up all the available RAM and result in crashes – some useful threads:

Through these posts, I learned about the Apache Max RequestWorkers and MaxConnectionsPerChild parameters. These control how many processes the server can spawn before shutting the oldest/most bloated ones down and can be tweaked to keep WebWork from running too many memory-obliterating tasks at once. It’s a balancing act, though: allow too few simultaneous requests, and unnecessary lag is introduced as Apache is forced to create new processes constantly while memory sits unutilized. A quick visit to the appropriate Apache config file at /etc/apache2/mods-available/mpm_prefork.conf confirmed that the server was still at the default setting of 150 Request Workers and unlimited child connections per requests (0 = unlimited in this case).


StartServers             5 
MinSpareServers          5 
MaxSpareServers          10 
MaxRequestWorkers        150 
MaxConnectionsPerChild   0

A bit more research turned up the install guide for WebWork on Ubuntu and “a rough rule of thumb” of 5 MaxRequestWorkers per 1 GB of memory and a MaxConnectionsPerChild value of 50.

This gave me the formula to determine optimal Apache settings but I wanted to increase the system RAM as 4GB still seemed likely to be insufficient for any heavy use. This was easy to do in AWS as the WebWork instance was a simple single Elastic Block Store (EBS) backed EC2 Amazon Machine Image (AMI).

In the EC2 AWS console:

  • Take a snapshot of the root volume attached to the instance (just in case)
  • Instance state -> Stop
  • Actions -> Change instance type (in my case I changed from t2.medium with 4GB RAM to a t2.large with 8GB RAM)
  • Instance state -> restart

I logged back in and tuned the Apache server for 8GB of RAM with the following settings in the mpm_prefork.conf file (8GB X 5 = 40):

StartServers             5 
MinSpareServers          3 
MaxSpareServers          5 
MaxRequestWorkers        40 
MaxConnectionsPerChild   50

This was followed with a quick restart of Apache to apply the changes:

sudo apachectl restart

I headed to the site and logged onto a test course, trying out some searches in the Library Browser and found that things were improved: I could still see in htop that processes were eating up memory, but they would quickly be killed off and the memory returned to the system. I may have to tweak things when students start logging in and hitting the server with a lot of simultaneous, small requests, but for now so far so good.

Photo by Paxson Woelber on Unsplash

Introducing Patch Notes

2018 was a wild year for me. I had a kid, moved into a new, more technical role at work, and while I haven’t quite yet finished grad school, I’m now close enough that I feel the senioritis kicking in! All that is to say I’m looking forward to what I can achieve in 2019 as I finish school and continue my journey into dad-hood, but it’s the still-new job – Technology Solutions Architect at the University of Portland – that’s the subject of this post.

Moving from the world of edtech, which involved supporting, training, and consulting with faculty on technology tools, to now building, deploying, and administering those tools has been a major transition. As I think about what I know, and what I don’t know, I’ve been reflecting on what helped me to be successful in previous roles and what I can bring to my new one. Something that has stuck out is that I’ve made a habit of teaching, tutoring, writing and making video content that engages with my field. This serves multiple purposes:

  1. I firmly believe that teaching something, whether that’s by presenting on a topic or creating a how-to guide is one of the best ways to build and retain a deep understanding.
  2. It helps me to show what I know and documents my growth in my new field.
  3. Hopefully my content can help others who are in or aspire to learn more about the type of work I’m doing. I’m still in Higher Ed, after all!

So, enter this blog. I’m calling the WordPress category for these posts  “Patch Notes” – it’s my way to reflect on and document my professional growth. Possible topics include:

Moodle, WordPress, Linux, databases, cloud computing, open source, education technology, higher ed, Office 365, and more.

Photo by Franck V. on Unsplash