Security breaches and service interruptions have been abundant in the news over the past year. As the leader of a growing technology organization, you need to take steps early to ensure that your customer data is safe and service continuity is maintained. Leakage of sensitive data or a site outage due to an attack can quickly become an overwhelming issue. In this post, I’ll outline a set of activities that make up a basic security program. This is geared towards a small to mid-sized team (fewer than 30 engineers) supporting an application delivered over the Internet.
If you store or process credit card information through your application, you may already be familiar with PCI standards. The PCI Security Standards Council is a body of companies from the payments industry. They maintain a standard, called the PCI Data Security Standard (PCI DSS), which details the security-related requirements you must meet to be compliant when processing credit card data. At Zoosk, we maintained PCI Level 2 compliance. I found the PCI DSS to be a useful reference for structuring a security program. Whether or not you have to be PCI compliant, the PCI DSS details many industry best practices for securing sensitive data. In the event you are breached, if you followed PCI DSS, it is fair to claim that you implemented reasonable security practices. I recommend reading through the spec. Where you see references to cardholder data, simply replace that with what you consider to be your set of sensitive customer data.
If you aren’t sure what customer data would be considered sensitive, it’s best to be inclusive. The formal definition of personally identifiable information is determined by each state, as part of its security breach notification laws. California provides a pretty encompassing definition:
(e) “Personal information” means any information that identifies, relates to, describes, or is capable of being associated with, a particular individual, including, but not limited to, his or her name, signature, social security number, physical characteristics or description, address, telephone number, passport number, driver’s license or state identification card number, insurance policy number, education, employment, employment history, bank account number, credit card number, debit card number, or any other financial information, medical information, or health insurance information. “Personal information” does not include publicly available information that is lawfully made available to the general public from federal, state, or local government records.
With that introduction, here is a list of considerations for structuring your security program, loosely modeled on the requirements outlined in the PCI DSS.
First, while not delineated directly in the PCI spec, it is important that you establish clear ownership for the security program. As the senior technology leader, you are ultimately responsible, but you will not spend the majority of your time focused on security. In a fast-growing start-up, where the priority is on shipping product, trying to distribute ownership of security or instructing all your engineers to use “good security practices” will not work. I recommend designating an individual on the team as the owner of the security program. Ideally, this is a full-time job, in which case you are hiring a security lead. If your team is small or budget-constrained, then you should assign this role to a software or devops engineer, and ensure that they are able to dedicate significant time to this effort.
I personally think that hiring a dedicated security professional onto your team is the best guarantee that security matters are addressed. A team size of 10-20 engineers represents a good point to transition to a full-time security lead. Of course, if your product is a SaaS application in the payment processing space or stores sensitive data for other companies, then this should be your first hire.
Secure the Network
The most basic element of your security posture is to secure the perimeter of the network that hosts your production systems. This means examining all access points from the Internet to your production environment. Create a detailed list of all the IPs, ports, protocols and services which are expected to be exposed to the Internet. This exercise should also include outbound traffic. Detail the IPs, ports and protocols that your applications use to connect to outside services. This is important, as unusual outbound traffic is often an indicator of a compromised server (data exfiltration, a call back to a botnet, etc.).
Once you have a detailed list of allowed traffic patterns, apply it to the device that controls access into and out of your network. Usually, this is a dedicated firewall or router that allows authorized inbound/outbound traffic and blocks everything else. If you are hosting through a PaaS or IaaS vendor, you will want to investigate what capabilities are available to accomplish the same function.
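To make the "allow what is listed, block everything else" idea concrete, here is a minimal Python sketch of a traffic allowlist with a default-deny check. It is not a firewall configuration. The `Rule` fields, the `ALLOWED` list and the addresses in it (taken from reserved documentation ranges) are all my own assumptions for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    direction: str  # "inbound" or "outbound"
    network: str    # remote CIDR block, e.g. "0.0.0.0/0" for anywhere
    port: int
    protocol: str   # "tcp" or "udp"


# Hypothetical allowlist; addresses come from documentation ranges, not real hosts.
ALLOWED = [
    Rule("inbound", "0.0.0.0/0", 443, "tcp"),          # public HTTPS
    Rule("inbound", "203.0.113.0/24", 22, "tcp"),      # SSH from one office range only
    Rule("outbound", "198.51.100.10/32", 443, "tcp"),  # one third-party API endpoint
]


def is_allowed(direction, remote_ip, port, protocol, rules=ALLOWED):
    """Return True only if some rule explicitly permits the flow (default deny)."""
    addr = ip_address(remote_ip)
    return any(
        rule.direction == direction
        and rule.port == port
        and rule.protocol == protocol
        and addr in ip_network(rule.network)
        for rule in rules
    )
```

Keeping the list in a reviewable form like this also gives you something to compare against the rules actually loaded on the firewall, and it makes an unexpected outbound flow easy to spot because it simply matches nothing.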
PCI requires that your network topology be documented, along with this list of allowed traffic patterns. Changes to firewall rules must be formally approved. Also, you should harden your actual network equipment. Limit access to the equipment, shut down unnecessary services and maintain strong passwords. No matter how much you restrict traffic to your production network, if your firewall or border router is compromised, then nothing else matters.
Once you have mapped out all of your production systems and limited network access to them, determine who should have access to your production systems. “Who” is usually defined as a set of people, but can also be automated processes. Access can represent all forms of connections, including at the operating system level (e.g., SSH) and the services level (Apache, MySQL, LDAP). It is helpful to organize your people by role (sysadmin, developer, release engineer, etc.). Then, you can generalize access controls by role.
The goal of access controls is to limit access to systems/services to only those individuals who absolutely need it. While this may sound counter to having an open and agile culture among your engineers, it dramatically reduces the risk surface area in case of compromised credentials. For example, a data analyst who only needs access to a reporting server should not have a credential that could be used to access the database. It’s not that you don’t trust your people. The problem is that anyone can accidentally fall prey to a phishing scheme or type their credentials into a compromised machine.
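A role-to-service mapping can be as simple as the Python sketch below. The role names echo the ones mentioned above, but the specific grants, the user names and the `can_access` helper are hypothetical. In practice this mapping would live in your directory or configuration management system.

```python
# Hypothetical roles and grants, for illustration only.
ROLE_ACCESS = {
    "sysadmin": {"ssh", "mysql", "ldap", "apache"},
    "developer": {"ssh", "apache"},
    "release_engineer": {"ssh"},
    "data_analyst": {"reporting"},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {
    "alice": {"sysadmin"},
    "bob": {"data_analyst"},
}


def can_access(user, service):
    """Grant access only if one of the user's roles lists the service."""
    roles = USER_ROLES.get(user, set())
    return any(service in ROLE_ACCESS.get(role, set()) for role in roles)
```

With this layout, `can_access("bob", "mysql")` is false: the analyst's credential reaches the reporting server and nothing else, and an unknown user gets no access at all.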
An access control process also includes ensuring that your users have strong passwords, which are changed frequently. Consider mandating password vaults like 1Password for individuals or a shared repository like OneLogin. Finally, define a process for granting access to new employees and shutting down credentials for departing employees. At past companies, I have been surprised to find active credentials for employees who had left long before.
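The offboarding check is easy to automate. The sketch below assumes you can export two things: a mapping of active account names to their owners, and the current employee roster. Both inputs and the `stale_credentials` name are my own assumptions, and real data would come from your identity provider and HR system.

```python
def stale_credentials(active_accounts, current_employees):
    """Return active accounts whose owner is no longer on the roster.

    active_accounts: dict mapping account name -> owner's full name
    current_employees: iterable of full names of current staff
    """
    roster = {name.lower() for name in current_employees}
    return sorted(
        account
        for account, owner in active_accounts.items()
        if owner.lower() not in roster
    )
```

Running a check like this on a schedule, and reviewing whatever it returns, catches the accounts that a manual offboarding checklist misses.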
Storage of Sensitive Data
Personally identifiable customer data should be stored in a manner that offers more security than other types of data. For credit card data, PCI requires the following:
- The data must be rendered unreadable wherever it is stored, for example through strong encryption or a one-way hash.
- The key used to encrypt the data should be stored on a separate device from the database. Restrict access to the encryption key to the fewest people possible. Change the encryption key periodically.
- Minimize the amount of sensitive data kept. This primarily refers to data retention. Maintain the fewest number of back-ups necessary for operational continuity and delete the rest. There have been cases of sensitive customer data sourced from a forgotten database back-up.
- Isolate the database containing the sensitive customer data onto a separate network with a firewall access point.
Even if you don’t process credit cards, you should seriously consider taking similar approaches with the storage of your sensitive customer data.
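As one illustration of keeping stored data unreadable while holding the key elsewhere, here is a minimal Python sketch of a keyed one-way digest using the standard library's HMAC-SHA256. This is only one option, and it is my assumption rather than anything PCI prescribes in this form. It suits cases where you only need to match a value later; if you must recover the original, you need reversible encryption from a vetted cryptography library instead. The `protect` and `matches` names are hypothetical, and the key is passed in as a parameter on the assumption that it is loaded from a store separate from the database.

```python
import hashlib
import hmac


def protect(value: str, key: bytes) -> str:
    """Return a keyed one-way digest (HMAC-SHA256) of a sensitive value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def matches(value: str, stored_digest: str, key: bytes) -> bool:
    """Check a candidate value against a stored digest in constant time."""
    return hmac.compare_digest(protect(value, key), stored_digest)
```

Only the digest goes into the database. Without the key, someone holding a stolen database dump cannot simply hash guesses and compare them, which is the reason for storing the key on a separate device.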
Monitoring and Testing
Access to the database storing your sensitive data should be monitored and logged. This includes both user and service access. For each request, you should log the query that was run. This log data can be a very useful indicator of suspicious activity. For example, if a normal user pattern is to only access one record at a time, then a select query for all records in a table would be suspicious. Also, monitor and log operating system or file system changes on the servers hosting your database, as well as the application servers providing a path to it. Installation of new files, for example, can indicate an active compromise of the server. Network activity should also be monitored. As mentioned previously, unusual outgoing traffic is often the best indicator of a potential security event. Most system compromises will involve a call back to a server on the Internet to retrieve rootkit code, or to exfiltrate data. Monitoring should be automated, with logic to distinguish suspicious activity from normal activity. Security events should be manually reviewed as soon as possible.
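The "one record at a time" example translates into a very simple rule. The Python sketch below flags read queries that have no filter or that return more rows than the normal pattern. The threshold, the `is_suspicious` name and the idea that your log records a row count per query are all assumptions, and a crude check like this will produce false positives that need tuning.

```python
import re

MAX_ROWS = 1  # assumed normal pattern: one record per request


def is_suspicious(query: str, rows_returned: int, max_rows: int = MAX_ROWS) -> bool:
    """Flag SELECTs that lack a WHERE clause or return more rows than normal."""
    normalized = query.strip().lower()
    if not normalized.startswith("select"):
        return False  # this sketch only looks at reads
    unfiltered = re.search(r"\bwhere\b", normalized) is None
    return unfiltered or rows_returned > max_rows
```

For example, `is_suspicious("SELECT * FROM customers", 50000)` is true, while a single-row lookup by id is not flagged.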
Your perimeter should be tested periodically. This takes the form of penetration testing. Basic automated scans can be run against your externally facing systems, looking for known vulnerabilities. Usually, these automated scans perform a basic set of tests for service misconfigurations or open ports. Many vendors, like McAfee, offer this service. These automated scans can be supplemented by true penetration testing, in which a white hat attempts to break into your systems. The white hat will initially run an automated scan across your IP space, followed by targeted attacks on any perceived vulnerabilities. I have used Rapid7 for this type of penetration testing in the past with great results. The output of penetration testing will be a list of possible exploits that you need to have your team address.
You should have a process in place for tracking and acting upon vulnerabilities that are announced for the software packages that you are using. Heartbleed is a well-known example of a publicized vulnerability. When a vulnerability like Heartbleed is announced, your team should be able to quickly address it through system patches. Beyond closely monitoring security newsfeeds, you can automate this vulnerability detection with a vendor tool. Products like AlienVault USM will scan all your production systems from the inside, looking for known vulnerabilities in the software versions you are running. This differs somewhat from penetration testing, in that the focus for vulnerability management is from the inside of your network.
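At its core, this kind of internal scan compares an inventory of installed versions against a list of advisories. The Python sketch below shows that comparison with a single hand-written advisory entry for Heartbleed (CVE-2014-0160), covering the OpenSSL 1.0.1 releases through 1.0.1f. The host names, the inventory format and the `find_vulnerable` helper are hypothetical, and a real tool would consume a maintained vulnerability feed rather than a hard-coded list.

```python
# One hand-written advisory for illustration; a real list would come from a feed.
ADVISORIES = [
    {
        "id": "CVE-2014-0160",  # Heartbleed
        "package": "openssl",
        "affected": {"1.0.1", "1.0.1a", "1.0.1b", "1.0.1c",
                     "1.0.1d", "1.0.1e", "1.0.1f"},
    },
]


def find_vulnerable(inventory, advisories=ADVISORIES):
    """Return (host, package, version, advisory id) for every affected install.

    inventory: dict mapping host name -> {package name: installed version}
    """
    findings = []
    for host, packages in inventory.items():
        for package, version in packages.items():
            for advisory in advisories:
                if package == advisory["package"] and version in advisory["affected"]:
                    findings.append((host, package, version, advisory["id"]))
    return sorted(findings)
```

Given an inventory where one host runs OpenSSL 1.0.1e and another runs the patched 1.0.1g, only the first host is reported, which gives your team a concrete patch list.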
If a security event is reported, the incident response process kicks off. Incident response encompasses the activities taken to assess and address the security issue. Incident response involves a designated team of individuals, who follow a formal sequence of steps. Those steps should be documented in advance in the Incident Response Plan. An Incident Response Plan can include contact information for key department heads, protocols for communications and methods for collecting evidence.
- Distributed Denial of Service (DDoS). A DDoS attack is a condition where large amounts of network traffic are directed at your production systems in an effort to overwhelm them and take down your services. Mitigating a DDoS attack involves analyzing all incoming network traffic and filtering out the attack traffic. This can only be accomplished at scale by one of the DDoS mitigation services, like Akamai or Imperva. In these cases, you will route your network traffic through their scrubbing center. The clean traffic will then be sent back to your systems for processing. This return path is usually through a private network link. Set-up of this configuration requires several days. If your company is hit by a DDoS attack, the last thing you want is to start the process of engaging with a DDoS vendor while your site is down. Do not assume that your data center bandwidth provider or cloud vendor will handle this. Often, they will simply blackhole your IP space (route to null) to get the traffic to stop.
- Keep Perspective. After you set up your security program, you will likely surface potential security events frequently. In my experience, a high percentage of security events (over 90%) will be false positives. However, they are all very scary. So, it’s important to do the investigation work, but maintain calm until you have a full understanding of the event.