Collaboration to catch hackers

Alan Fern (right), professor of computer science, and his Ph.D. student, Amran Siddiqui (left), are advancing methods of detecting cyberattacks known as advanced persistent threats.


Cyberattacks are getting more frequent, bigger, and more destructive. New research at Oregon State University aims to stop hackers by combining the muscle power of artificial intelligence with the brains of cybersecurity experts. The project is led by Galois Inc. and includes collaborators at University of Edinburgh and Synaptiq.

Season 4

[MUSIC: The Ether Bunny, by Eyes Closed Audio, used with permission under a Creative Commons Attribution license.]

NARRATOR: From the College of Engineering at Oregon State University, this is "Engineering Out Loud."

RACHEL ROBERTSON: And this is Rachel Robertson. Today on "Engineering Out Loud" we will be talking about the pirates of the 21st century – hackers. Cyberattacks are getting more frequent, bigger, and more destructive. The first half of 2017 has been particularly bad -- 1.9 billion records of personal information were stolen, which is an increase of 164 percent over the last half of 2016.

We begin with that huge cyberattack that hit tens of thousands of computers all over the world…

Clip from Cyber Security Experts Say Malware Used In Friday's Attack Is Especially Malicious

Cybersecurity experts say that this piece of malware had an especially malicious quality

Clip from Cyber Security Experts Say Malware Used In Friday's Attack Is Especially Malicious

Equifax, the big credit reporting agency, reports that it was hacked in May. Whoever did the hacking seems to have gained access …

Clip from Credit Reporting Agency Equifax Reveals Massive Hack

A Russian intelligence agency launched a cyberattack last year against a company that helps run American voting systems…

Clip from Report: Russia Launched Cyberattack On Voting Vendor Ahead Of Election

ALAN FERN: Cybersecurity is a big deal.

ROBERTSON: That is Alan Fern, professor of computer science at Oregon State and an expert in artificial intelligence.

FERN: It's an enormously challenging problem and it is one of the places where surprisingly you know AI has not had a huge impact there yet, largely because it's such a hard problem.

ROBERTSON: It’s a hard problem that Fern and his graduate students are working to solve with a new approach that combines the muscle power of artificial intelligence with the brains of cybersecurity experts.

But let’s back up a bit and find out how this project got started. And as you might guess based on the other podcasts in this season -- this project involves a collaboration.

RYAN WRIGHT: I'm Ryan Wright. We're at Galois Inc. here in Portland, Oregon.

ROBERTSON: Can you describe what Galois is?

WRIGHT: We're a research lab. We focus on computer science and especially on some of the hardest problems in computer science. Typically around formal verification and program correctness. So our mission is ensuring trust in critical systems.

ROBERTSON: This particular project is funded by a government agency called DARPA which stands for Defense Advanced Research Projects Agency.

WRIGHT: The U.S. government is funding advanced research to help make computers safer and those benefits are much broader than one particular application in the government. They're useful to the general public at large. And that's really what's motivating research on this project here at Galois as well because that helps to ensure trust in critical systems. Computers are critical systems they run our lives now. And if someone can quietly get in and do something that is malicious to disrupt that or do something you don't want them to do they're disrupting a critical system. So we want to, we want to help people trust these critical systems.

[MUSIC: “The Confrontation,” Podington Bear, used with permission of a Creative Commons Attribution-NonCommercial License]

ROBERTSON: To increase their chances of solving this problem, DARPA is funding several teams across the country for a program called Transparent Computing. The team that Galois has assembled includes people from Oregon State, University of Edinburgh and a company called Synaptiq.

WRIGHT: So we partner with other universities and sometimes other companies and everyone contributes toward their expertise. And collectively we pull all that expertise together and try to solve a DARPA hard problem. DARPA likes to talk about their work as being DARPA hard. So it's in a sense it's a Hail Mary pass thrown all the way down field to try to not make an evolutionary advancement but to make a revolutionary advancement and change the game.

[MUSIC: “Aim is True,” Podington Bear, used with permission of a Creative Commons Attribution-NonCommercial License]

ROBERTSON: So, let’s talk about what kind of hacks they are looking for. Alan starts by explaining what they are not.

FERN: So these are different than what your home security software is going to detect. You know, your Norton security software basically has teams of people behind it. Whenever a specific vulnerability is found, they'll code up a detector for that vulnerability and send it out to everybody's computer.

But what a lot of government entities are really worried about, and really any entity should be worried about is the day zero attacks that have not been seen, and in particular attacks that are often called advanced persistent threats.

So these are software attacks that, basically, they sneak into your system and hang out for a while trying to do things sort of low and slow, and eventually they have a goal of often exfiltrating data or, you know, doing some sort of damage. But they're scary because you just have no signatures that were known previously, and they hang out for a while. There's no good way to detect them, you know, given the sea of other activity that happens on the computer. So this program is really aiming at trying to detect those advanced persistent threats.

ROBERTSON: OK. And so you're basically looking for something that's never been seen before.

FERN: Basically yes. And that's why it's so hard, because when it comes down to it most everything that happens has never been seen before. So the really hard thing is determining which anomalous things are actually worth bringing to the attention of a security analyst, and that's really the major challenge of this program.

ROBERTSON: So, let’s talk more about what this sea of activity is that Alan is talking about and why it’s so hard to detect an advanced persistent threat.

FERN: The operating system in a computer is generating millions and billions of events in a very short period of time and these events are individual processes being spawned, processes spawn other processes, processes read files, write to files, write to the network, read from the network. All of these things are little events and our raw data is a stream of these events that are coming in. And there are billions of them. And so what we have to do is be able to process this event stream in near real time, extract information about that event stream that's going to be useful for detecting strange things that hopefully correlate with bad activity.

ROBERTSON: So, what does bad activity look like? Here’s Ryan to explain.

WRIGHT: Our running joke is that Google Chrome behaves exactly like an APT, like an advanced persistent threat because it does a lot of work in the background.

It updates itself, it communicates out to the network, it installs little sub packages, and installs updaters and system processes that watch for things and it gets into a lot of different corners of the system. And so distinguishing when you actually have an attack versus having a program that is just very functional and doing a lot of interesting things is a difficult problem.

ROBERTSON: So, as Ryan mentioned earlier, there are a lot of people working on this team who have expertise in different aspects of the problem. For example, one part of the team focuses on collecting all the information from the system as quickly as possible without slowing down the computer. What Alan and his graduate students focus on is anomaly detection – or as he said it, finding the strange things in a sea of data.

FERN: You can almost think of this as a needle in the haystack problem. You've got billions of these events coming in and a very small portion of them are going to correspond to an anomalous attack and you know the adversary is actually going to try to blend in as much as they can as well and that is another thing that makes this very difficult.

ROBERTSON: Since finding a needle in a haystack is practically impossible, the researchers have developed ways to break down the problem. It starts with developing what they call views, or as I like to think about it – windows into the data. Views are a way to organize the data into manageable pieces that can then be examined more closely. So, a view could be one type of file activity, for example, where files are being written to, or whether files are being accessed from the network.
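As a rough illustration of what a "view" could look like in code, here is a minimal Python sketch. The event fields, process names, and the `build_view` helper are hypothetical and chosen only for illustration; the episode does not describe the project's actual data format or tooling.

```python
from collections import defaultdict

# Hypothetical event records: (process, action, target).
# These names and values are illustrative assumptions, not real project data.
events = [
    ("chrome", "write_file", "/tmp/cache.bin"),
    ("chrome", "net_read", "203.0.113.5:443"),
    ("backup", "read_file", "/home/a/report.doc"),
    ("backup", "net_write", "198.51.100.7:22"),
    ("chrome", "write_file", "/tmp/update.pkg"),
]

def build_view(events, actions):
    """Count, per process, how many events belong to the given set of actions."""
    counts = defaultdict(int)
    for process, action, _target in events:
        if action in actions:
            counts[process] += 1
    return dict(counts)

# One view for file writes, another for network activity.
file_write_view = build_view(events, {"write_file"})
network_view = build_view(events, {"net_read", "net_write"})
```

Each view reduces the raw stream to a small table about one kind of activity, which is easier to examine than the full event stream.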

FERN: This is not a science, really, coming up with these views. It's actually labor intensive, it's a lot of hard work, and it requires a lot of system expertise to do this.

ROBERTSON: The next step is to run anomaly detection algorithms on the views that will hopefully find any strange things going on in there. The results are combined to get a ranking of any suspicious activities. Here’s when humans enter the picture. Each of those suspicious instances needs to be examined by an analyst to determine whether it is benign or an attack.
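The scoring-and-ranking step can be sketched in a few lines of Python. The episode does not say which anomaly detection algorithms the team uses, so this example swaps in a simple z-score (distance from the mean in standard deviations) as a stand-in, and averages the scores across views. The process names and numbers are made up for illustration.

```python
from statistics import mean, pstdev

def zscores(view):
    """Score each entity by how far its value sits from the view's mean."""
    values = list(view.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {name: 0.0 for name in view}
    return {name: abs(value - mu) / sigma for name, value in view.items()}

def rank(views):
    """Average the per-view scores and sort entities, most anomalous first."""
    per_view = [zscores(view) for view in views]
    entities = set().union(*(scores.keys() for scores in per_view))
    combined = {e: mean(s.get(e, 0.0) for s in per_view) for e in entities}
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical views: files written and network kilobytes sent per process.
file_writes = {"chrome": 40, "editor": 38, "backup": 42, "svc_x": 400}
net_kb = {"chrome": 900, "editor": 850, "backup": 800, "svc_x": 950}

ranking = rank([file_writes, net_kb])
```

Here `svc_x` lands at the top of the ranking because it stands out in both views. A real system would need detectors that cope with far messier data than this.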

FERN: And so one of the innovative things about our system is that when the analyst looks at an anomaly that we show them he or she will tell us, ‘This was actually a benign thing. Yeah, it’s a little unusual but it's benign.’ And our anomaly detector will take that into account to adjust its notion of basically what is anomalous for benign reasons and then re-rank all the data. And by doing this we've shown that we can accelerate the time it takes to find an anomaly.
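One way to picture this feedback loop is as a weighted score that gets adjusted each time the analyst labels an item benign. The update rule below is a hypothetical sketch, not the team's method: the episode does not describe how the detector incorporates feedback. The item names and per-view scores are invented for illustration.

```python
def score(features, weights):
    """Weighted sum of an item's per-view anomaly scores."""
    return sum(w * x for w, x in zip(weights, features))

def rerank(items, weights):
    """Sort item names by weighted score, most suspicious first."""
    return sorted(items, key=lambda name: score(items[name], weights), reverse=True)

def benign_feedback(weights, features, rate=0.5):
    """Lower each weight in proportion to how strongly the benign item showed that feature."""
    return [max(0.0, w - rate * x) for w, x in zip(weights, features)]

# Hypothetical per-view anomaly scores for three processes.
items = {
    "chrome_updater": [0.9, 0.3],
    "svc_x": [0.3, 0.8],
    "editor": [0.1, 0.1],
}
weights = [1.0, 1.0]

before = rerank(items, weights)

# The analyst inspects the top item and reports that it is benign.
weights = benign_feedback(weights, items[before[0]])
after = rerank(items, weights)
```

In this toy run, `chrome_updater` is ranked first until the analyst marks it benign; the detector then trusts the first view less, and `svc_x` moves to the top.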

ROBERTSON: How much faster is an empirical question that they are still working on with real datasets, but with benchmark datasets they saw performance double – meaning that true anomalies were detected twice as frequently with feedback compared to without.

What’s unusual about this research is that it is leading up to a test.

WRIGHT: We call it an adversarial engagement. So it's a rehearsal of the end game that we're trying to be able to solve. So every few months, or periodically, all the teams who are working on this DARPA program get together and every team plays their respective part. So the teams responsible for collecting data on a computer run their software and start collecting data on that computer. So they produce data live on machines that are running. Meanwhile, there's a red team who comes in and attacks each of those computers, just quietly while no one's looking. And while they're running and doing their own thing, the red team sneaks in and performs some very sophisticated attacks. During that process data is streaming out to us, where our job is to analyze all the benign and potentially malicious data, see if we can sort out those two, and detect, ‘Yeah, at this moment in time the red team performed this attack. And here's the file that they exfiltrated and copied off to their computers.’

[MUSIC: “The Confrontation,” Podington Bear, used with permission of a Creative Commons Attribution-NonCommercial License]

FERN: To some degree, as a researcher, it's a terrifying type of way to be evaluated. DARPA is good about this. They don't sort of rank teams and, you know, kick you out if you don't do well enough. It's really an effort to try to get the teams to really be ready to engage a real situation and to learn from it. But it's a very different way of doing your research, and you have to think much more practically in this realm when you know this engagement is coming up.

ROBERTSON: Alan’s graduate student, Amran Siddiqui, was at the last engagement where for three days he and two others watched the data streaming in and evaluated any anomaly detected by their system.

AMRAN SIDDIQUI: It was actually very, I’d say, a very laborious process, going through their data, looking into the graphs, then confirming that, okay, it was an attack or not. It was a very difficult process, but we are happy that eventually we found a few of them at the end.

ROBERTSON: The idea is that over the four years of this project, the detection system will get better and better during these engagements. Fern has found that although the engagements can take time away from more typical research activities there is a lot to be gained.

FERN: There's really nothing like working with real data from a new problem that nobody's really touched to really identify the new problems that are going to be academically interesting. We have some other research directions that we're exploring that directly were inspired by certain failures of the previous system.

ROBERTSON: So, the idea of using expert feedback to improve the anomaly detection system will be tested for the first time in the next engagement. In the meantime, Amran has been using the datasets from the previous engagement to evaluate the feedback approach.

So what do you hope in the end will come out of this project?

FERN: I'm hoping that we have a system that's really effective. Ideally by engagement four I'd like our system to catch attacks very effectively with a small amount of user feedback. So that’s the ideal thing. What I'm hoping that will set us up for is, you know, the next step I see is applying really serious artificial intelligence to this. Right now our AI system doesn't really have the ability to reason about the activities in the computer the way that a cybersecurity analyst, a human, would. And partly that's because of just the amount of background knowledge that you would have to build into the system, and then the algorithms to reason about that knowledge are just beyond what is practical in a real-world system right now. But I see this work as sort of setting us up for that next step. There's going to be some limitations that we're just going to face until we can get there, and this will sort of, hopefully, show us what you can do with statistical-style anomaly detection techniques.

[MUSIC: “Aim is True,” Podington Bear, used with permission of a Creative Commons Attribution-NonCommercial License]

ROBERTSON: That’s it my friends, I hope you enjoyed learning something new about the ways in which artificial intelligence can be used to protect us from hackers, and how partnerships with companies like Galois help Oregon State researchers get out there in the world of real data to make the internet a safer place.

This episode was produced and hosted by me, Rachel Robertson. Audio editing and moral support by Brian Blythe. Our intro music is “The Ether Bunny” by Eyes Closed Audio on SoundCloud and used with permission of a Creative Commons attribution license. Other music and effects in this episode were also used with appropriate licenses. You can find the links on our website. Additional sounds were created by Duncan Robertson. It’s possible that I talked him into this.  For more episodes, visit or subscribe by searching “Engineering Out Loud” on your favorite podcast app.

