A large part of Carve’s customer base is software development organizations, for whom the source code represents the “crown jewels,” the intellectual property whose compromise would be a nightmare for the business. Protecting the code is paramount to these organizations’ security.
Over the past year, the number of attacks targeting code hosted on the Internet (e.g. GitHub) has been on the rise. These attacks have taken different shapes, from “watering hole” attacks such as the attack on Codecov to the GitHub OAuth token compromise via Heroku and Travis CI that hit the news recently.
Oftentimes, these attacks go undetected for a long time. For example, it appears that the Codecov compromise went undetected for two months. Once they are detected, however, being able to respond properly to these types of incidents is extremely important. A top priority when responding to unauthorized source code access is understanding who accessed which areas of the code, and when they committed changes.
Now, how would you know who accessed what from where (and if that’s legitimate) if you’re staring at two months’ worth of logs?
To start answering this question, we must identify all accesses to the code that we know to be “good.” By excluding them from the superset of all events, we reduce the amount of suspicious activity that needs to be investigated. To accomplish this, we’d want to enumerate as many as possible of the locations from which code would normally be accessed (below I’m assuming an Internet-based code hosting solution such as GitHub), and ensure that we have the proper logging mechanisms covering those locations…
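To make the "exclude the known-good" idea concrete, here is a minimal Python sketch. It assumes the access events have already been exported as dictionaries with `actor` and `source_ip` keys; those field names, the `KNOWN_GOOD_NETWORKS` list, and both helper functions are hypothetical placeholders for illustration, not the schema of GitHub's audit log or any other product. The network ranges are reserved documentation ranges standing in for real office or CI egress addresses.

```python
import ipaddress

# Hypothetical known-good source networks. These are reserved documentation
# ranges used as placeholders; substitute your own office, VPN, and CI egress ranges.
KNOWN_GOOD_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # placeholder: office egress
    ipaddress.ip_network("198.51.100.0/24"),  # placeholder: CI runners
]


def is_known_good(event):
    """Return True if the event's source IP falls inside a known-good network.

    Events with a missing or unparseable `source_ip` are treated as NOT
    known-good, so they stay in the set to be investigated.
    """
    try:
        ip = ipaddress.ip_address(event["source_ip"])
    except (KeyError, ValueError):
        return False
    return any(ip in network for network in KNOWN_GOOD_NETWORKS)


def suspicious_events(events):
    """Keep only the events that cannot be explained by a known-good location."""
    return [event for event in events if not is_known_good(event)]


# Made-up sample events, purely for illustration.
sample_events = [
    {"actor": "alice", "source_ip": "192.0.2.15"},
    {"actor": "ci-bot", "source_ip": "198.51.100.7"},
    {"actor": "alice", "source_ip": "203.0.113.50"},
    {"actor": "unknown"},  # no source IP recorded
]

for event in suspicious_events(sample_events):
    print(event)
```

With the sample data above, only the last two events remain: the access from an address outside the listed ranges and the event with no recorded source IP. Treating unparseable events as suspicious is a deliberate choice in this sketch, since silently dropping them would shrink the investigation for the wrong reason.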