Cloud Security and Automated Incident Response

Fixing stuff isn’t as interesting as breaking stuff, and that holds true even in the cloud. Perhaps it’s hardwired into human nature, but a headline that reads “Global Corp Hacked; Calamity and Pestilence Ensue” plays way better than “Admin Updates S3 Bucket; Remediates Security Risk.” The fact is, when it comes to securing your company’s data, you don’t want to be a headline. If you build and operate with automated, codified incident response best practices, you’ll keep most issues from getting out of control and fix them before damage is done. It ain’t sexy, but trust me, they’re not paying you to be sexy.

The keys to incident response are speed and process. It’s also important to recognize that one of two things can happen. In the first, you identify a problem and fix it before any damage occurs; this is obviously the desired outcome. In the second, a bad actor has found a hole, and because you haven’t applied any automation to risk mitigation, you’re just following him around trying to pinpoint where the damage is and where it has been. In both cases, it is critical to remediate, posthaste. Hackers put serious energy into their work, but theirs is a numbers game. If they see servers going offline or new password prompts on assets they’ve already accessed, they’ll likely conclude you’re no longer an easy mark and move on. Damage may have been done, but you will have essentially put out the fire.

Speed is critical, but it has to be paired with a well-tested plan for response and remediation. Automation supports both. It gives each contributor a prescription for how to apply his or her skills, along with an integrated, prioritized checklist that can be carried out in an orderly, scripted way. Certainly, your crack staff needs the freedom to improvise as needed, but adhering to a disciplined framework not only saves time; it is also the most reliable way to get in front of a security issue.

There are two ways of responding to security risks in your AWS environment: one uses automation, while the other is functional and handles each risk ad hoc. Both aim for the same goal, but unless you’ve automated your response and trained your team on a specific protocol, you’ll struggle to limit the damage a hacker can do. To make the case for automated incident response, let’s contrast two scenarios: one with a functional, non-automated response, and one that is automated.

Functional, non-automated remediation
Just to be clear, this scenario usually starts after you’ve been hacked. Again…AFTER you’ve been hacked. So after you freak out, the scrambling begins. It also presumes you are not using a continuous cloud monitoring solution to get visibility into your cloud, because if you were, you would most likely avoid this scenario altogether.

So, imagine that while you’re sleeping, a hacker is scraping GitHub for API keys, and lo and behold, yours are discoverable. Hackers are constantly poking at GitHub and other repositories, and with your keys in hand, this bad actor now has access to your internal systems. He got them in the middle of the night, and since you have no way of being alerted to risk issues, you don’t find out until after your second cup of coffee the next morning.
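To make the “discoverable keys” point concrete: AWS access key IDs follow a recognizable format, which is exactly what makes automated scraping so cheap. The Python sketch below scans text for strings shaped like access key IDs (the `AKIA` prefix for long-term keys or `ASIA` for temporary ones, followed by 16 uppercase letters or digits). The function name `find_access_key_ids` is hypothetical, and a real secret scanner would also look for secret access keys and other credential types. Run against your own commits before they are pushed, the same idea lets you catch a leak before anyone else does.

```python
import re

# AWS access key IDs: "AKIA" (long-term keys) or "ASIA" (temporary STS keys),
# followed by 16 uppercase letters or digits. Secret keys are NOT matched here.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line number, candidate key ID) pairs found in a blob of text.

    Hypothetical helper for illustration only.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group()))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = "region = us-east-1\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    print(find_access_key_ids(sample))  # [(2, 'AKIAIOSFODNN7EXAMPLE')]
```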

At this point, you likely don’t know where the damage is or how deep it goes, so you perform a “scream test”: you sequester your own servers, one by one, until someone screams. Hopefully that gets you closer to the infected source, and if you find it, you sequester it, snapshot it, and keep it offline. The good news is you’ve found where the breach is. The bad news is that you now have to discover what damage has been done and whether more is being done.

Because you don’t have a monitoring tool to give you visibility into your cloud security situation, your only choice is to go into private-investigator mode and look for clues. You have to ask around and check Sumo Logic, CloudTrail, or whatever other logging or management tool might offer some insight. Meanwhile, you log a ticket, wait for an answer, snapshot different buckets and try them in different environments, hold a lot of team meetings to assign tasks and get status updates, and all the while hope no more damage is being done. Oh, and we haven’t even gotten into the meetings with legal, partners, and customers, the GitHub code audits and penetration testing, and, you know, the fact that the media may have caught wind because your site is crawling or you’ve had to shut down your service. It’s a nightmare of loose ends, and ultimately a costly one.
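Much of that detective work comes down to one question: what did the leaked key actually do? As a minimal sketch, assuming boto3 and a key ID you already suspect, CloudTrail’s `lookup_events` API can filter event history by `AccessKeyId`. The helper name `actions_by_access_key` is hypothetical, and event history only covers management events from roughly the last 90 days in the client’s region, so treat the result as a starting point rather than a full damage report.

```python
from collections import Counter
from datetime import datetime


def actions_by_access_key(cloudtrail, access_key_id: str,
                          start: datetime, end: datetime) -> Counter:
    """Count CloudTrail events (by API name) recorded for one access key ID.

    `cloudtrail` is assumed to be a boto3 client: boto3.client("cloudtrail").
    Hypothetical helper for illustration only.
    """
    # lookup_events is paginated; the paginator follows NextToken for us.
    paginator = cloudtrail.get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[{"AttributeKey": "AccessKeyId",
                           "AttributeValue": access_key_id}],
        StartTime=start,
        EndTime=end,
    )
    counts = Counter()
    for page in pages:
        for event in page.get("Events", []):
            counts[event["EventName"]] += 1
    return counts
```

For example, `actions_by_access_key(boto3.client("cloudtrail"), "AKIAIOSFODNN7EXAMPLE", start, end).most_common(10)` would list the ten most frequent API calls made with that key in the chosen window.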

There’s a better way.

Automated remediation response
The alternative begins with near-real-time, always-on assessment of your cloud’s security state, made possible by your investment in continuous cloud monitoring. So, right off the bat, you have visibility and are alerted immediately to issues.
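Continuous monitoring largely comes down to running checks like the following on a schedule and alerting on whatever they return. This Python sketch, assuming a boto3 S3 client, flags buckets whose bucket-level Block Public Access settings are not all enabled, which is one common kind of S3 misconfiguration. The function name is hypothetical, and a production monitor would also inspect account-level settings, bucket policies, and ACLs.

```python
REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def find_public_access_gaps(s3) -> dict[str, list[str]]:
    """Map bucket name -> Block Public Access settings that are not enabled.

    `s3` is assumed to be a boto3 client: boto3.client("s3"). Only bucket-level
    settings are checked; account-level Block Public Access is a separate API.
    Hypothetical helper for illustration only.
    """
    findings = {}
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            config = s3.get_public_access_block(Bucket=name)[
                "PublicAccessBlockConfiguration"]
        except Exception as err:
            # botocore's ClientError carries the service error code in .response.
            # A bucket with no configuration is reported with every setting missing.
            code = getattr(err, "response", {}).get("Error", {}).get("Code")
            if code != "NoSuchPublicAccessBlockConfiguration":
                raise
            config = {}
        missing = [s for s in REQUIRED_SETTINGS if not config.get(s, False)]
        if missing:
            findings[name] = missing
    return findings
```

Each non-empty result is what would be forwarded to the alerting channels mentioned next.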

If there’s a misconfigured S3 bucket, you’ll know about it immediately through automated alerts delivered via PagerDuty, Slack, HipChat, or Splunk. If the issue is confirmed to be a real problem, an AWS Lambda function is triggered immediately and kicks into “snap & destroy” mode. Because your solution has already identified the affected asset, that asset is taken out of commission and the issue is isolated, all in maybe five minutes.
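Here is a minimal sketch of what such a Lambda function might look like in Python with boto3. The event shape, the quarantine security group, and the function name `remediate` are all assumptions, and “snap & destroy” is interpreted conservatively. For an S3 finding, the function turns on Block Public Access. For an EC2 finding, it snapshots the instance’s EBS volumes, swaps its security groups for a quarantine group, and stops the instance. Replacing `stop_instances` with `terminate_instances` would give a literal “destroy”, at the cost of no longer being able to inspect the instance later. The AWS clients are injectable so the logic can be exercised with stubs.

```python
def _client(service: str):
    # Imported lazily so the logic can be tested without boto3 installed.
    import boto3
    return boto3.client(service)


def remediate(event: dict, s3=None, ec2=None) -> dict:
    """Apply a conservative 'snap & destroy' response to one finding.

    The event shape is hypothetical, for example:
      {"resource_type": "s3_bucket", "bucket": "example-bucket"}
      {"resource_type": "ec2_instance", "instance_id": "i-0123456789abcdef0",
       "quarantine_sg": "sg-0123456789abcdef0"}
    """
    kind = event["resource_type"]

    if kind == "s3_bucket":
        if s3 is None:
            s3 = _client("s3")
        s3.put_public_access_block(
            Bucket=event["bucket"],
            PublicAccessBlockConfiguration={
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        )
        return {"action": "blocked_public_access", "bucket": event["bucket"]}

    if kind == "ec2_instance":
        if ec2 is None:
            ec2 = _client("ec2")
        instance_id = event["instance_id"]
        # "Snap": preserve every attached EBS volume as evidence.
        snapshot_ids = []
        reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
        for reservation in reservations:
            for instance in reservation["Instances"]:
                for mapping in instance.get("BlockDeviceMappings", []):
                    if "Ebs" not in mapping:
                        continue
                    snap = ec2.create_snapshot(
                        VolumeId=mapping["Ebs"]["VolumeId"],
                        Description=f"IR evidence for {instance_id}",
                    )
                    snapshot_ids.append(snap["SnapshotId"])
        # "Destroy", conservatively: cut network access, then stop the instance.
        ec2.modify_instance_attribute(InstanceId=instance_id,
                                      Groups=[event["quarantine_sg"]])
        ec2.stop_instances(InstanceIds=[instance_id])
        return {"action": "quarantined", "instance_id": instance_id,
                "snapshots": snapshot_ids}

    raise ValueError(f"unsupported resource_type: {kind!r}")


def lambda_handler(event, context):
    """Standard Python Lambda entry point; delegates to remediate()."""
    return remediate(event)
```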

To be sure, this works whether or not an attack has been attempted. You are working to secure your environment, not just responding after the fact.

The right way
There are thousands of wrong ways to fix a problem, but usually one best way to solve it. Pursuing the wrong ones, especially when your business is on the line, can be a killer in terms of lost revenue and credibility. Organizations that rigorously adhere to security best practices and use automated incident response, all on top of a continuous monitoring solution, generally have far better outcomes, and they sleep better, too.