The age of Big Data and cloud computing has created greater demand for advanced cryptography. To store data on the cloud safely, it must be encrypted; to use the data, it must be decrypted, at which point it becomes vulnerable to attacks. Professors Mike Rosulek and Attila Yavuz are finding new ways to perform operations on encrypted data without leaking critical information. Professor Glencora Borradaile teaches workshops and a new class on personal computer security to empower people to protect their own data. With community organizer Michele Charrete, she explains why people should be concerned about security and how to use tools for encrypting email and browsing the web anonymously.
Transcript
[MUSIC: Eyes Closed Audio, The Ether Bunny, used with permission under a Creative Commons license.]
NARRATOR: From the College of Engineering at Oregon State University, this is Engineering Out Loud.
MOVIE CLIP (The Imitation Game): What did you do during the war? I worked in a radio factory. What did you really do during the war? Are you paying attention?
RACHEL ROBERTSON: Well, I don’t know about you, but I’m paying attention. Who knew computer science could be so interesting that they’d make an entire movie about it? That clip is from The Imitation Game. It’s about how cryptography played a pivotal role in World War II. More than 70 years later, as our dependence on the internet and cloud computing escalates, cryptography is no less important. I’m Rachel Robertson from the College of Engineering at Oregon State University, and today on Engineering Out Loud we will be talking to some researchers at Oregon State who are passionate about helping us protect our data. The hero in The Imitation Game was Alan Turing, a British mathematician who is known as the founder of computer science. He was pretty passionate about his work too.
MOVIE CLIP (The Imitation Game): It’s beautiful. It’s the greatest encryption device in history. The Germans use it for all major communications. Everyone thinks Enigma is unbreakable. Let me try and we’ll know for sure.
ROBERTSON: Did you see that movie?
MIKE ROSULEK: I did, yes.
ROBERTSON: Uh-huh, and so in the movie, as you know, they make cryptography seem really exciting and mysterious, you know it’s an exciting …
ROSULEK: Which it is.
ROBERTSON (laughs): Well, that was my question: Is cryptography really as mysterious and exciting as it’s depicted in The Imitation Game?
ROSULEK: I don’t know if it’s mysterious. I say that as someone who… I think I understand cryptography, so maybe it is not so mysterious to me. The stuff that I do… it’s not me fighting Nazis or anything quite that exciting. But it is exciting and it’s interesting, and we can do things that are really surprising, things that people don’t think are possible, and that’s the part of cryptography that is exciting for me. Not that we are out there fighting a bad guy, but that we can do amazing things.
ROBERTSON: That was Mike Rosulek who sat down to talk to me about his research.
ROSULEK: I’m an assistant professor in computer science at Oregon State and my research is in cryptography and security.
ROBERTSON: If you have ever solved a cryptogram puzzle, you already know one method of encryption, called a substitution cipher, in which each letter is replaced with another letter or number. Another type of encryption is a permutation cipher (also called a transposition cipher), in which the positions of the letters are rearranged.
ROSULEK: Pretty much all of cryptography is built on those small, low-level operations of scrambling the data. And I guess scrambling is not so hard; the hard part is scrambling so that it can be unscrambled in the end, and that’s what people spend their research doing.
ROBERTSON: Rosulek will be back later in the podcast. But first we will talk to another cryptography researcher.
YAVUZ: My name is Attila Yavuz.
ROBERTSON: So, you can see why I had him introduce himself. Attila is from Turkey. In fact, he grew up very close to Troy, which is fitting, since the Trojan horse is an iconic symbol for computer security.
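The two classical ciphers Rachel describes, and Mike's point that the hard part is making the scrambling reversible, can be sketched in a few lines of Python. This is a minimal illustration, assuming uppercase English text and a fixed-seed random.Random generator so the demo repeats; that generator is not cryptographically secure, and all function names here are hypothetical.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def make_substitution_key(rng: random.Random) -> dict:
    """Substitution cipher key: each letter maps to a letter of a shuffled alphabet."""
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))

def substitute(text: str, key: dict) -> str:
    """Replace each letter using the key; spaces and punctuation pass through."""
    return "".join(key.get(ch, ch) for ch in text)

def invert(key: dict) -> dict:
    """Undoing a substitution just means reading the key backwards."""
    return {cipher: plain for plain, cipher in key.items()}

def make_permutation_key(block: int, rng: random.Random) -> list:
    """Permutation cipher key: a shuffled order of positions within a block."""
    order = list(range(block))
    rng.shuffle(order)
    return order

def permute(text: str, order: list) -> str:
    """Rearrange letters inside each block; pads the final block with 'X'."""
    block = len(order)
    padded = text + "X" * (-len(text) % block)
    return "".join(
        "".join(padded[start + i] for i in order)
        for start in range(0, len(padded), block)
    )

def unpermute(text: str, order: list) -> str:
    """Apply the inverse ordering to put every letter back where it was."""
    inverse = [0] * len(order)
    for new_pos, old_pos in enumerate(order):
        inverse[old_pos] = new_pos
    return permute(text, inverse)

rng = random.Random(1942)  # fixed seed for a repeatable demo only
sub_key = make_substitution_key(rng)
cryptogram = substitute("ATTACK AT DAWN", sub_key)
print(cryptogram, "->", substitute(cryptogram, invert(sub_key)))

order = make_permutation_key(4, rng)
scrambled = permute("ATTACKATDAWN", order)
print(scrambled, "->", unpermute(scrambled, order))
```

Both ciphers are easy to break today (letter frequencies survive substitution, and a permutation keeps every letter), which is why modern ciphers layer many rounds of such operations.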
YAVUZ: I work on applied cryptography and network security.
ROBERTSON: And so how does what you do, how is it an example of data science or engineering?
YAVUZ: Sure. For instance, one of the important paradigms emerging today is the Internet of Things. Consider your pacemaker, smart home devices, millions of sensors embedded everywhere, smart airports. All these systems continuously collect data, 24 hours a day, seven days a week, to improve their processes and to improve our lives. But once these data are collected for analysis purposes, they have to be stored somewhere, and this is what gives opportunities to the adversaries; let’s call them bad guys. They want to reach these data for marketing purposes, or there could be much more evil purposes. And therefore it is our responsibility to protect these data in such a way that, first, they will not fall into bad hands, and second, while we protect them, we do not disrupt the properties of data science.
ROBERTSON: So, the last part you said, ‘while we protect, we do not want to disrupt…’ What do you mean by that?
YAVUZ: So, for instance, we could use traditional cryptographic techniques such as encryption to encrypt our data and store it on the servers. We collect data, use traditional encryption, and store it. This would provide a reasonable amount of security, but once you do that, what if you want to access these data to analyze them, to run machine learning and data mining techniques on them, which are very important for data science? We could not do this with traditional encryption techniques. Now, one of the important aspects of my job is to fill this research gap, so that we achieve privacy, security, and data analytics at the same time. How can this purpose be achieved?
ROBERTSON: How do you do that? Can you explain a little bit about your research?
YAVUZ: Sure. For instance, one concrete approach that we published just last year is to use searchable encryption. As the name suggests, it lets you search encrypted data, and sometimes people see this as a dilemma: once the data is encrypted, in theory, it should leak no information. And if no information is leaked, how could you search on it? So the objective of my research, which is actually performed jointly with the Robert Bosch company, is that while we encrypt data, we create a system that leaks a very small amount of information, enough to enable search functionality, but no other information. But in this case, the performance of our search, the accuracy of our search, and the information that is leaked pose an important trade-off. So the art and science of searchable encryption is to identify the correct trade-off for the correct application and satisfy the needs of government, industry, and academia at the same time.
ROBERTSON: Hmm… so how do you do that? It’s kind of hard to understand, right?
YAVUZ: Sure. Let me give you a simpler example. Consider a simple array which has five elements. Let’s say it says: Rachel: file one; Attila: file two; and we have five of those. What I will do is mask “Rachel” with random numbers; I will replace “Rachel” with just random numbers, and encrypt the corresponding email with traditional encryption. Now neither the random numbers nor the encrypted file leaks any information to an adversary. Right?
ROBERTSON: Right.
YAVUZ: And we have five of those. I will put them in the cloud, and they will stay there. The adversary can see them, the cloud can see them, but they will not make sense of them. Let’s say tomorrow I want to look up Rachel. I know which random number corresponds to Rachel, right? I will send this to the cloud, and the cloud will search the five items that we put there. One of them has an encrypted file corresponding to that random number; that is the information that is associated with you. Now, the cloud doesn’t know we are looking for Rachel, because it’s just a random number, and the corresponding file is encrypted, so it doesn’t know the content of the email. It will send the encrypted file to us, and we will decrypt it and know what email you got. No information is leaked during this process.
ROBERTSON: Okay, so basically you are encrypting the search as well as…
YAVUZ: …the files. Search functionality and files are encrypted simultaneously.
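Attila's five-entry example can be modeled as a toy in Python. Assumptions, all hypothetical: the client keeps a private table mapping each name to a random token, the cloud stores only (token, ciphertext) pairs, and files are encrypted with a homemade HMAC-SHA256 keystream because Python's standard library has no block cipher. A real system would use a vetted cipher such as AES-GCM and would usually derive tokens with a keyed pseudorandom function instead of storing a table. The names Client and server_search are illustrative only.

```python
import hashlib
import hmac
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh per file so equal files look different
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    ks = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))

class Client:
    """Holds the secrets: the file key and the name -> random-token table."""
    def __init__(self):
        self.file_key = secrets.token_bytes(32)
        self.tokens = {}

    def upload(self, records: dict) -> list:
        """Return what the cloud stores: (random token, encrypted file) pairs."""
        stored = []
        for name, contents in records.items():
            token = secrets.token_bytes(16)  # stands in for the name, e.g. "Rachel"
            self.tokens[name] = token
            stored.append((token, encrypt(self.file_key, contents.encode())))
        return stored

    def token_for(self, name: str) -> bytes:
        return self.tokens[name]

    def read(self, blob: bytes) -> str:
        return decrypt(self.file_key, blob).decode()

def server_search(stored: list, token: bytes):
    """The cloud scans its entries; it sees only random bytes and ciphertexts."""
    for stored_token, blob in stored:
        if hmac.compare_digest(stored_token, token):
            return blob
    return None

client = Client()
cloud = client.upload({"Rachel": "file one", "Attila": "file two", "Mike": "file three",
                       "Cora": "file four", "Michele": "file five"})
blob = server_search(cloud, client.token_for("Rachel"))
print(client.read(blob))  # prints "file one"
```

Even in this toy, the cloud learns which entry matched and whether the same token is sent twice. That is the small, deliberate leakage that makes search possible at all.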
ROBERTSON: So, are they encrypted in different ways?
YAVUZ: Yes.
ROBERTSON: Okay.
YAVUZ: You do the same operation again and again with different keys, and at the end of the day the results won’t make sense to somebody who doesn’t possess the key.
ROBERTSON: Okay. So, it’s actually like you are scrambling it several different times.
YAVUZ: Yes, and in several different ways. And actually, how would you know that what you encrypt is secure? Here is a simple example. I will give you two files. One of them is totally random data that I produce. For the other file, I will use my encryption technique. I will give both of them to you and ask which one is random data and which one is encrypted data. Now, if my encryption algorithm is good enough, the probability that you could tell the real encryption from the random data, beyond just guessing, should be negligible. Basically, encryption turns data into random-looking data, so that the data appears to have maximal entropy. That’s how we analyze the security of encryption systems.
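Attila's two-file test is the idea behind indistinguishability definitions. The Python simulation below is a simplified version, assuming a fixed message and one naive distinguisher (both hypothetical): a Caesar shift leaves ciphertext looking like text, so a "is every byte printable?" test separates it from random bytes almost every time, while a keystream cipher gives that test no edge beyond guessing. Formal definitions such as IND-CPA consider every efficient distinguisher and let it choose messages; this only shows the shape of the game.

```python
import hashlib
import hmac
import secrets

MESSAGE = b"meet me at the old radio factory at dawn"

def caesar_encrypt(shift: int, msg: bytes) -> bytes:
    """Weak cipher: shift lowercase letters, leave everything else alone."""
    return bytes((c - 97 + shift) % 26 + 97 if 97 <= c <= 122 else c for c in msg)

def stream_encrypt(key: bytes, msg: bytes) -> bytes:
    """Toy keystream cipher (HMAC-SHA256 in counter mode), fresh key per message."""
    ks = b"".join(hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
                  for i in range(len(msg) // 32 + 1))
    return bytes(m ^ k for m, k in zip(msg, ks))

def looks_like_text(sample: bytes) -> int:
    """Naive distinguisher: guess 1 ("encrypted") if every byte is printable ASCII."""
    return 1 if all(32 <= c < 127 for c in sample) else 0

def advantage(encrypt, new_key, distinguisher, trials: int = 2000) -> float:
    """Estimate |2 * Pr[distinguisher guesses right] - 1| over random coin flips."""
    wins = 0
    for _ in range(trials):
        ciphertext = encrypt(new_key(), MESSAGE)
        b = secrets.randbelow(2)  # 1: hand over the ciphertext, 0: random bytes
        sample = ciphertext if b == 1 else secrets.token_bytes(len(ciphertext))
        wins += distinguisher(sample) == b
    return abs(2 * wins / trials - 1)

weak = advantage(caesar_encrypt, lambda: secrets.randbelow(25) + 1, looks_like_text)
strong = advantage(stream_encrypt, lambda: secrets.token_bytes(32), looks_like_text)
print(f"Caesar advantage ~ {weak:.2f}, keystream advantage ~ {strong:.2f}")
```

A near-zero advantage against one naive test proves nothing by itself; a security claim requires the advantage to be negligible for every efficient distinguisher.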
ROBERTSON: Oh, okay. So if it's indistinguishable from being random then you've done your job.
YAVUZ: Yes, and to achieve search purposes we have to leak a little bit more information than that; otherwise we could never find who Rachel is or who Attila is. But not much. And that is where the art of searchable encryption starts: how to formally quantify this leakage, what the performance requirements are, what the needs of the people and the application are. All of that goes into cryptographic research.
ROBERTSON: So, now we will return to Rosulek. Although Attila and Rosulek both work in the area of cryptography, their research areas are different, so I asked Rosulek to explain how they differ.
ROSULEK: Okay, well, searchable encryption is about performing a specific task as fast as you can, and that task is a search on encrypted data: just a keyword search, or a lookup. My work is about more general computation. So, imagine taking any computation that you care about, for example a machine learning classification algorithm, and you want to run that on encrypted data. My research is about how to do that in full generality while still being secure and as efficient as possible.
ROBERTSON: To explain the kinds of computations his research aims to improve, Mike tells of a game his family plays called the “long-lost-relative game.”
ROSULEK: So when my mom meets someone for the first time, she finds out where they live and she says, “Oh, I know someone from there; do you know this person?” “No.” “Do you know this person?” And eventually, without fail, no matter who the new person is, she always finds someone who is a common acquaintance of both her and the new person.
ROBERTSON: And so how is this related to cryptography?
ROSULEK: So, I like to think of this game as a computation that takes place between two people. Each person has a list of the acquaintances that they know, and the goal of the game is to find someone who is on both lists, a person that both people know in common. You can do this just by systematically going through your list saying, “Do you know this person? Do you know this person? Do you know this person?” But I guess that’s not something you do with someone you’ve just met for the first time. So as a cryptographer, I think about whether there are ways to do this without revealing your entire list of acquaintances to the other person. And, in fact, using the tools that come out of my research, this is something you can do. You can identify what two different lists have in common without revealing those lists. And this has applications in business, at least. You can imagine two companies that want to get together and figure out who their common customers are, or maybe just how many customers they have in common. Using tools from cryptography, they can do this without revealing the contents of their customer lists.
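One classic way to play the long-lost-relative game privately is Diffie-Hellman-style private set intersection, sketched below in Python. It is a swapped-in textbook technique, not necessarily the one from Mike's research, and it assumes both players follow the protocol honestly. Each player raises hashed names to a secret exponent; because (h^a)^b = (h^b)^a mod p, shared names match after both exponents are applied, while other names are meant to stay hidden. The modulus 2^127 - 1 and the names Party and intersect are illustrative assumptions; a real deployment would use a standardized elliptic-curve or prime-order group with a proper hash-to-group function.

```python
import hashlib
import secrets

P = 2**127 - 1  # a known (Mersenne) prime; toy parameter, not a secure group

def hash_to_group(item: str) -> int:
    """Hash a name to a number mod P (simplified hash-to-group)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

class Party:
    def __init__(self, items):
        self.items = list(items)
        self.secret = secrets.randbelow(P - 3) + 2  # private exponent, never sent

    def blind(self) -> list:
        """Apply my exponent to my own hashed items."""
        return [pow(hash_to_group(x), self.secret, P) for x in self.items]

    def reblind(self, values: list) -> list:
        """Apply my exponent to values the other party already blinded."""
        return [pow(v, self.secret, P) for v in values]

def intersect(alice: Party, bob: Party) -> set:
    """Run both sides locally; only blinded values would cross the network."""
    a_once = alice.blind()                     # Alice -> Bob
    a_twice = bob.reblind(a_once)              # Bob -> Alice, same order as Alice's list
    b_twice = set(alice.reblind(bob.blind()))  # Bob -> Alice, then Alice reblinds
    return {item for item, v in zip(alice.items, a_twice) if v in b_twice}

mom = Party(["Ann Lee", "Bo Diaz", "Cy Park"])          # hypothetical names
stranger = Party(["Dee Fox", "Cy Park", "Eve Wu"])
print(intersect(mom, stranger))  # {'Cy Park'}
```

Only Alice learns which of her names are shared; Bob sees values blinded with Alice's secret exponent, which he cannot match to names.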
ROBERTSON: One method that Mike has been working on to achieve secure computation is a cryptographic method called garbled circuits. I admit, I did not get what it was at first.
ROBERTSON (recorded): When I heard garbled circuit, I’m thinking, ‘Okay, it’s like a physical thing.’ But it’s not.
ROSULEK: Yeah, yeah. ‘Circuit’ doesn’t refer to an electrical engineer’s circuit; it refers to a mathematician’s circuit, which is just a series of operations on bits. So that’s where it gets the name, but the word ‘garbled’ is pretty cool, I think you have to admit.
ROBERTSON: What’s even cooler is what Mike and his colleagues have been able to do with garbled circuits. In two recent publications they demonstrated that, compared with other methods, their new algorithms were 33% faster and had 33% less communication overhead for the computations.
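To show what "garbling" one operation on bits means, here is a minimal Python sketch of a single garbled AND gate in the original Yao style, which is much less efficient than the optimized constructions Mike describes. Assumptions: SHA-256 serves as the encryption function, the correct row is recognized by 16 trailing zero bytes, and the evaluator is simply handed a decoding table. In a full protocol the evaluator would obtain the label for their own input through oblivious transfer. All function names are hypothetical.

```python
import hashlib
import random
import secrets

LABEL_BYTES = 16  # size of each random wire label

def H(label_a: bytes, label_b: bytes) -> bytes:
    """Derive a 32-byte one-time pad from a pair of input labels."""
    return hashlib.sha256(label_a + label_b).digest()

def xor(x: bytes, y: bytes) -> bytes:
    return bytes(i ^ j for i, j in zip(x, y))

def new_wire() -> list:
    """Two random labels per wire: index 0 stands for bit 0, index 1 for bit 1."""
    return [secrets.token_bytes(LABEL_BYTES) for _ in range(2)]

def garble_and_gate():
    """Garbler: build a shuffled 4-row table that hides the AND truth table."""
    wire_a, wire_b, wire_out = new_wire(), new_wire(), new_wire()
    table = []
    for a in (0, 1):
        for b in (0, 1):
            # Encrypt the correct output label (plus 16 zero bytes as a check value)
            # under the pair of input labels for this row.
            row_plaintext = wire_out[a & b] + bytes(LABEL_BYTES)
            table.append(xor(H(wire_a[a], wire_b[b]), row_plaintext))
    random.SystemRandom().shuffle(table)  # row order must not reveal the bits
    decode = {wire_out[0]: 0, wire_out[1]: 1}
    return wire_a, wire_b, table, decode

def evaluate(table: list, label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator: with overwhelming probability only the matching row decrypts
    to a label followed by zero bytes."""
    pad = H(label_a, label_b)
    for row in table:
        candidate = xor(pad, row)
        if candidate[LABEL_BYTES:] == bytes(LABEL_BYTES):
            return candidate[:LABEL_BYTES]
    raise ValueError("no row decrypted; labels do not belong to this gate")

wire_a, wire_b, table, decode = garble_and_gate()
for a in (0, 1):
    for b in (0, 1):
        # The evaluator holds one opaque label per wire, not the bits themselves.
        out_label = evaluate(table, wire_a[a], wire_b[b])
        print(a, "AND", b, "=", decode[out_label])
```

A whole circuit is garbled gate by gate, with each gate's output labels feeding the next gate's inputs. Making each table smaller is exactly where speed and communication savings come from.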
ROSULEK: So, not only is our new garbled circuit construction the fastest and the smallest, but we proved that ours is actually optimal, which means you won’t be able to come up with anything faster or smaller than ours unless you come up with some really novel approach that no one has thought of before.
ROBERTSON: Why are the advances you are making important?
ROSULEK: Justify yourself, Mike!
ROBERTSON: No pressure.
ROSULEK: Well, look at the big picture. This idea that you could compute on encrypted data has been around since the 1980s, at least as a theoretical idea. And no one thought to do anything with it because it seemed like the costs would be astronomical. But then in the early 2000s, people decided, ‘Well, let’s implement this and see whether it’s actually fast or not.’ And then people started getting interested in making these tools fast and getting them to apply to the real world. And we’ve made these tools thousands and thousands of times faster over the past 20 years, which is really exciting, to the point where it is now practical. You can use it for many different things. I can’t say that there are thousands of applications of it in the real world, but there are applications in the real world, and people are using these tools to do sensitive computations. And I think the more we speed things up, the easier it will be for people to just choose the secure way of doing things, rather than choosing an insecure way just because it’s a little bit faster. I think if we close that gap, it will make the cost of security less of a barrier, which is good for everyone whose data is out there and being used for these things.
ROBERTSON: So, although the work of Attila and Mike may not be as dramatic as the story of Alan Turing, their research in cryptography is helping to protect our data. So, I think they can be considered the heroes of this story. In Part 2 of the podcast, we will be talking about what we as individuals can do to protect our data and why we should be doing it. Get ready for another movie clip, because that seems to be a thing I do.
MOVIE CLIP (Snowden): How is this all possible? Think of it as a Google search, except instead of searching only what people make public, we are also looking at everything they don’t: emails, chats, SMS, whatever. Yeah, but which people? The whole kingdom, Snow White. The NSA is really tracking every cell phone in the world.
ROBERTSON: Any guesses on what that movie is? It’s based on a true story, and the reference to Snow White was a clue. The movie is Snowden. It’s about the events that led Edward Snowden to leak documents that revealed global surveillance programs the United States government was involved with. In Part 2 of this episode on computer security, we’ll be switching focus to how we can protect our data. I’ll be talking to Glencora Borradaile, an associate professor of computer science in the College of Engineering at Oregon State. Cora was motivated by the Snowden revelations to teach people about the importance of computer security.
BORRADAILE: All of a sudden, there was news story after news story, based on Snowden’s leaked documents, about all these different capabilities that the NSA had built up over the last decade, which basically seemed to mean that anything you send online, if you are not explicitly protecting it using cryptography, is available to the NSA. And this was a worldwide thing; there were partnerships between the US, the UK, Canada, New Zealand, and Australia. So that kind of woke me up and got me thinking, ‘Well, what can you do about this?’
ROBERTSON: What she decided to do is use her knowledge of computer science to help educate people about how the internet works and how they can keep their personal data private. It has nothing to do with her computer science research in graph theory; she was motivated by other reasons. Cora is also involved in social movements such as the environmental movement to halt the use of fossil fuels, and she realized that global surveillance could have an impact on social change.
BORRADAILE: There are a lot of people I talk to who weren’t willing to talk about certain things over the phone or by email because they felt nervous about talking about their beliefs. It had very tangible chilling effects on what they were willing to talk about, and that worried me, and I wanted to do something about it and not get stressed out about it.
ROBERTSON: Cora's partner in life is also her partner in this mission to educate people about securing their personal data.
CHARRETE: I’m Michele Charrete.
ROBERTSON: Michele is a molecular biologist turned community organizer.
CHARRETE: I do most of my organizing through a group called Rising Tide which works on what is called climate justice.
ROBERTSON: Together they have been teaching workshops on personal security in which they show people how to encrypt their email and Cora has also developed a course at Oregon State called Communication Security and Social Movements. They make the point that it’s not just people involved in social movements who should be concerned about personal security.
CHARRETE: You know, the nature of the internet is that you have all of this data, and all of this data can be captured and recorded and stored basically indefinitely. One thing people don’t always remember is that everything you send across the internet unencrypted is actually being actively recorded and stored. Data storage gets cheaper every year, so there is basically no penalty to keeping all of this email forever.
BORRADAILE: But there is also… I would hope to encourage people to think of the sense of solidarity that they could have with social movements that they do care about, right? Like journalists exposing wrongs worldwide, or dissidents, not necessarily in the United States but worldwide, who rely on these tools to do their work. They rely on email encryption, they rely on anonymous web browsing, but these tools become even more robust if more people are using them, right? If lots of people are using email encryption, then it’s harder to figure out who the people are that are using email encryption for political means. So if you have, like, thousands of people using email encryption and only 10 of them are using it for political means, then… um…
CHARRETE: …mass surveillance becomes impossible, because you can’t break all the encryption all the time.
ROBERTSON: Okay, so, now to the nitty-gritty. How do we do it? First Cora explains how unencrypted email can be captured.
BORRADAILE: If I want to send an email from my Gmail account to your Yahoo account, the message gets sent from my computer, through the internet, to Google’s Gmail servers; then from the Gmail servers, through the internet, to the Yahoo servers; then from the Yahoo servers, through the internet, to you. If you are reading your email in kind of a standard way, that email message will be encrypted between me and Gmail and between Yahoo and you, but between Gmail and Yahoo it will be completely in the open, sent like a postcard. Now, if you encrypt your email, you would use public-key cryptography, which means that you basically publish online a way for me to send you an encrypted email, and I could look that up. I would encrypt my email to you using this technique, and after I’ve encrypted, or locked, that email, no one else can read it; only you will be able to unlock that email and read it. So now when it gets sent through the network, anybody who is watching the cables between Gmail and Yahoo would just see an encrypted email passing through. But for the email to get from me to you, there has to be some information in the email that says it’s coming from me and going to you. That information is still public, so someone watching the internet connections between Gmail and Yahoo would still know that I’m sending an email to you, but they wouldn’t know the contents of the email.
ROBERTSON: To learn how to set up public-key encryption, you can go to ssd.eff.org, or search for “surveillance self-defense.” This will be in the show notes if you don’t want to write it down right now. In addition to public-key encryption, Cora and Michele suggest anonymous web browsing. What I learned from them is that the private browsing function in browsers like Chrome and Internet Explorer is not strictly private. It keeps the sites you visit from being recorded on your computer, but the information is still visible elsewhere on the internet; for example, your internet provider, such as Comcast, can still see which sites you have visited.
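The lock-and-unlock idea Cora describes can be sketched with textbook RSA in Python. This is a toy under loud assumptions: the two primes are well-known Mersenne primes chosen only so the example runs (which makes the key trivially breakable), there is no padding, and the email addresses are made up. Real email encryption, such as OpenPGP as implemented by GnuPG, uses large random keys, padding, and hybrid encryption. The sketch also mirrors her last point: the body is locked, but the From and To fields stay readable so the message can be delivered. It needs Python 3.8+ for pow with a negative exponent.

```python
# Toy textbook RSA. The primes are famous Mersenne primes picked only because
# they are known to be prime; they make this key trivially breakable.
P = 2**89 - 1
Q = 2**107 - 1
E = 65537  # common public exponent

def make_keypair(p: int = P, q: int = Q, e: int = E):
    """Return (public key, private key); only the public key is published."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, Python 3.8+
    return (n, e), (n, d)

def lock(public_key, message: bytes) -> int:
    """Anyone with the published key can lock (encrypt) a short message."""
    n, e = public_key
    m = int.from_bytes(message, "big")
    if m >= n:
        raise ValueError("message too long for this toy key")
    return pow(m, e, n)

def unlock(private_key, ciphertext: int) -> bytes:
    """Only the holder of the private exponent can unlock (decrypt) it."""
    n, d = private_key
    m = pow(ciphertext, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")

recipient_public, recipient_private = make_keypair()

# Routing fields stay readable so mail servers can deliver the message;
# only the body is locked. The addresses are made up for illustration.
email = {
    "From": "sender@example.com",
    "To": "recipient@example.org",
    "Body": lock(recipient_public, b"meet at noon"),
}
print(email["From"], "->", email["To"], ":", email["Body"])
print(unlock(recipient_private, email["Body"]).decode())
```

Anyone watching the connection between the two mail providers sees the sender, the recipient, and a large number; only the private key turns that number back into the message.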
CHARRETE: Anonymous web browsing can be achieved basically by downloading the Tor Browser and using that. And it takes care of everything. And, it’s very, very simple.
ROBERTSON: Tor is spelled T-O-R. As simple as it is, most people do not use it, thinking that it’s only for people who need to hide something. Charrete admits he did not see the importance of it at first.
CHARRETE: And what really convinced me, actually, was a friend of mine who worked with the Clinton Foundation in Uganda, and he basically said, you know what, when I read the New York Times, when I read news or whatever, I just use the Tor Browser, because I know journalists back in Uganda who relied on this to stay safe from the government. And after he gave me that reason, I was like, yeah, okay, I’ll go ahead and use it, because even if I don’t need to protect myself, I know for sure there are corrupt governments around the world and journalists risking their lives. If I can protect those people, I know for sure that is a good thing.
ROBERTSON: It is indeed a good thing. I hope this episode shed some light on the somewhat esoteric topic of cryptography and why it’s important. Thanks for joining us on Engineering Out Loud, produced by the College of Engineering at Oregon State University. This episode was produced by me, Rachel Robertson, with additional editing by Mitch Lea. Our intro music is The Ether Bunny by Eyes Closed Audio on SoundCloud and used with permission via a Creative Commons 3.0 license. For more episodes, visit engineeringoutloud.oregonstate.edu.