Over the years I have helped many organizations implement logging solutions. For better or for worse, a security incident of some sort is the event that tends to drive change in an organization; often it is an external attack, or perhaps an internal HR-related matter that would benefit from the sort of historical evidence that logs can supply. Many organizations have limited or no logging infrastructure in place prior to the incident, and it is only when the requirement for historical evidence or real-time information arises that senior management realizes there is a big piece missing from their security jigsaw puzzle. For some, this task can seem daunting. “We have so many systems. Where do we start?” is a common question. The task can appear insurmountable when you are faced with tens, hundreds, or thousands of systems, but roll-out plans can always be broken down into bite-sized chunks.
One of the best places to start is with your critical systems. Most sysadmins will intuitively know the key systems, services and network architecture in an organization. Even when the administrator has no need to understand the data that passes through such systems, they will invariably have a dynamically updating mental map of the parts of the IT infrastructure that will result in frustrated users, and lost business, if they go down. Some basic examples are:
- Servers. Physical or virtual systems that host mission-critical applications or services on Windows, Linux or another server-level operating system.
- Desktops or endpoints where the user interacts with the backend applications. Typically these are Windows-based systems, but there is a growing community of macOS and Linux users in businesses.
- Databases. Databases often contain the bulk of an organization’s sensitive data, and are usually the first priority to track “who, what and when” from a data-access perspective. The integrity of the data also needs to be assured for management and regulatory authorities.
- Applications. Applications are the interface layer between users and data, and often generate useful logs that assist in analysis and forensic investigation. These can be web server logs, proxy logs, AV logs, or even logs generated by custom applications. For organizations that favor a ‘bring your own device’ strategy, or use ultra-thin-clients such as laptops, the application may be the closest logging source to the client.
- Network. Firewalls, routers, switches, Wireless APs, IDS and IPS systems – each of these can generate vast amounts of data that can be of use individually, or can be correlated with other system log data.
Each of these sources can generate mountains of data. Although vendors tend to make each individual log entry reasonably human-readable, when you are faced with trying to read an encyclopaedia every second of the day, every day of the year, having some level of centralized analysis engine is of great benefit. These systems are known as ‘SIEM’ servers (“Security Information and Event Management”). In our case, we have a product called “Snare Central”, and its capabilities address goals such as:
- Store the logs away from the system that generates them, in order to reduce the likelihood of tampering or deletion. In many instances, if a system has been hacked, the intruder will clean up the local system logs in an effort to hide the activity. If the logs are stored securely away from the system that generated them, then more forensic data will be available for review.
- Keep the logs secure so only staff that have a need to know have access to the logs. Many logs can contain sensitive information or can reveal usage patterns on systems. Credit card numbers, for example, are often leaked to local system logs by overenthusiastic applications.
- Perform regular reporting and analytics on the logs to analyse usage patterns, threats and compliance activity.
- Help with incident management. One of the key benefits of a centralized logging solution is that, after a security breach, it can show the “who, what, when and how” of the incident.
- A SIEM can run on a virtualized system, on dedicated hardware, or in the cloud. The choice depends on factors such as the capacity required for log storage, the expected Events Per Second (EPS) rate, and the organization’s security posture on where the logs can live, given the nature and sensitivity of the data.
- Most compliance standards require that logs be kept for a reasonable period of time. The mean time to detection for a security breach is measured in months, so having logs that reach back that far contributes to successful incident analysis. PCI DSS requires that logs be kept for at least 1 year, with 3 months readily available. Other regulations can require longer storage periods, and some companies and government agencies have their own data retention requirements. A SIEM therefore needs to have a data retention capability commensurate with the log retention needs of the organization. In the end, this usually comes down to planning an optimal disk allocation based on a combination of the number of logging systems, the expected activity volumes, and the retention period required. It’s rare to get this right on the first attempt, but as long as you have the ability to grow the system to use more storage space, you are generally covered.
- It allows the security teams to get actionable intelligence on what is going on in the organization’s IT infrastructure.
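The disk allocation planning mentioned in the retention goal above can be roughed out with simple arithmetic. The following is a minimal sketch, not a sizing tool: the function name, the 500 EPS rate, the 500-byte average event size, and the 25% headroom are all illustrative assumptions rather than figures from any product or standard.

```python
SECONDS_PER_DAY = 86_400


def estimate_storage_gb(events_per_second, avg_event_bytes, retention_days,
                        compression_ratio=1.0, headroom=0.25):
    """Rough disk estimate, in GB (10**9 bytes), for retaining logs.

    All inputs are planning assumptions, not measured values.
    """
    raw_bytes = events_per_second * avg_event_bytes * SECONDS_PER_DAY * retention_days
    stored_bytes = raw_bytes / compression_ratio   # use >1.0 if the store compresses data
    return stored_bytes * (1 + headroom) / 1e9     # headroom covers bursts and growth


# Hypothetical environment: 500 EPS sustained, ~500 bytes per event,
# no compression assumed.
one_year = estimate_storage_gb(500, 500, 365)
three_months = estimate_storage_gb(500, 500, 90)
print(f"1 year: {one_year:,.0f} GB, 90 days: {three_months:,.0f} GB")
```

Under these assumed inputs, a year of retention comes to a little under 10 TB. Real event sizes and rates vary widely between sources, so it is worth re-running the estimate with measured values once the first systems are reporting.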
Once these strategic goals are understood, a rollout plan will usually lead to changes to your infrastructure components so that they generate log traffic and direct it to your SIEM solution. Network and infrastructure equipment such as firewalls, authentication gateways, or switches will generally implement the ‘syslog’ protocol. Turning on syslog logging and pointing the device at the SIEM collection service is usually a very quick and easy way to start collecting logs. In the case of the Snare Central Server, no configuration changes are required at the collection end – data will start rolling in, and it will be stored, categorised, and made available for reporting.
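Custom applications can be pointed at a syslog collector in much the same way as network devices. The sketch below uses the `SysLogHandler` from Python's standard `logging.handlers` module to send messages over UDP. The function name, the logger name, the choice of the `auth` facility, and the default port of 514 (the conventional syslog UDP port) are assumptions for illustration; the collector address must be replaced with that of your own SIEM.

```python
import logging
import logging.handlers


def build_syslog_logger(collector_host, collector_port=514, name="app-audit"):
    """Return a logger that forwards records to a syslog collector over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(collector_host, collector_port),
        facility=logging.handlers.SysLogHandler.LOG_AUTH,
    )
    # Prefix each message with the logger name so the source is easy to spot.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


# Example usage (the address is a placeholder for your collector):
#   audit = build_syslog_logger("192.0.2.10")
#   audit.info("user=alice action=login result=success")
```

UDP syslog is fire-and-forget, so a message sent while the collector is unreachable is silently lost. Whether that is acceptable depends on how critical the source is.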
Servers and desktops will usually entail collecting the logs via one of two methods:
- Installing an agent, or
- Activating agentless collection.
The pros and cons of agent-based vs agentless solutions are covered in another whitepaper. For ease of use and scalability, my preferred and recommended method for most organizations is the agent-based solution. The process of installing Snare agents is usually quick and painless, and they provide a sane default configuration that will meet the needs of many small to medium environments out of the box. Once Snare agents are installed, they can be configured to send logs to your SIEM server, and you are up and running. For large environments, deploying an agent will usually involve one of the following:
- Configure and deploy the agent using an MSI with a template configuration. Sometimes the security or admin staff will want to review which logs are being collected, and adjust the standard install to suit their filtering objectives. Destination server information can be changed, and other options can be set to enable USB tracking, monitor file activity, watch for registry changes, exclude noisy events, and so on. These extra settings are often driven by compliance needs for security standards or regulations such as PCI DSS, ISO 27001, SOX, or HIPAA, where specific logs need to be collected and reviewed on a regular basis.
- Use tools such as Microsoft GPO, SCCM, or IBM BigFix, which can handle the remote authentication and installation of the software. Most companies have something in place to push out applications and updates, so any of these can be leveraged.
So we now have senior management support for logging, an infrastructure that is capable of sending us data, and a central server to collect, store, analyse and archive our log data. We still need to know: what sorts of logs should we collect on a modern network? The answer to this question varies from customer to customer, often substantially; but there are some basic guiding rules of what to collect:
- Login/logoff events – know when users are using the system, and from which source. Why is a user logging in from Singapore when they are in New York? Why are they logging in during the middle of the night?
- All administrative activity – to track all system changes performed by administrators. Was this approved or authorized activity, or should it be considered a security incident and subject to follow-on analysis? All administrators have the ability to override technical controls, either at the operating system level or in a database. If an administrator’s credentials are compromised, then the account can usually perform high-level modifications to network and system infrastructure, including stealing data or changing database contents, all of which can affect the operation of the organization significantly.
- Account changes – password resets, group membership changes. Why was a user granted domain admin, not long after their password was reset? Was this a breach?
- Track commands that are run on systems. Are these whitelisted applications, or are staff using unauthorized applications? Maybe it is malware that the AV does not know about, exploiting the system or running PowerShell commands on Windows. Was it linked to a web link someone clicked on in a phishing email, which resulted in a payload being run on the system that bypassed the AV controls? Was it something like a Rubber Ducky, a USB device emulating a keyboard that created its own PowerShell script to perform some malicious activity? Tracking commands can highlight many potential problems.
- File auditing and activity monitoring. Are users performing authorized changes to files, or accessing files they should not be? File auditing can highlight problems with access controls to sensitive data not being set correctly, and can help detect the abuse of access privileges.
- Software installation or removal. Are tools or other software being used on the network without permission? This could be the result of a backdoor in an application, or perhaps a malicious staff member installing their own hacking tools to perform unauthorised activity.
- Tracking removable media. Staff copying sensitive data to removable media such as USB drives or DVDs, and then removing it from the organization’s network, can lead to data loss. Removable media can also be a source of virus or worm infection, with the potential for the kind of widespread damage that WannaCry and its variants caused at many companies.
- Networking logs. The value of firewalls, and of tracking what goes in and out of a network, is sometimes not well understood. Firewall logs can highlight applications and users attempting to access sites or services that could pose a risk to the organization. Many firewall reviews uncover unauthorized applications installed on systems when they attempt to “phone home” to get updates or exfiltrate data from the organization. Seeing which ports are being used on switches can alert you to unauthorized devices being connected to the network. I have had customers discover, through switch port logs, that cleaners were accessing the network.
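Several of the questions in this list amount to correlating two events in time. As one illustration, the account-change question ("why was a user granted domain admin not long after their password was reset?") can be sketched as below. The event dictionaries, their field names, the one-hour window, and the sample data are all hypothetical; a real SIEM would have its own schema and rule language.

```python
from datetime import datetime, timedelta


def admin_grant_after_reset(events, window=timedelta(hours=1),
                            privileged_group="Domain Admins"):
    """Find accounts added to a privileged group soon after a password reset.

    Returns a list of (user, reset_time, grant_time) tuples.
    """
    last_reset = {}   # user -> time of that user's most recent password reset
    findings = []
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["type"] == "password_reset":
            last_reset[ev["user"]] = ev["time"]
        elif ev["type"] == "group_add" and ev.get("group") == privileged_group:
            reset_time = last_reset.get(ev["user"])
            if reset_time is not None and ev["time"] - reset_time <= window:
                findings.append((ev["user"], reset_time, ev["time"]))
    return findings


# Made-up sample events for illustration only.
sample_events = [
    {"time": datetime(2024, 1, 10, 2, 0), "type": "password_reset", "user": "alice"},
    {"time": datetime(2024, 1, 10, 2, 20), "type": "group_add", "user": "alice",
     "group": "Domain Admins"},
    {"time": datetime(2024, 1, 10, 9, 0), "type": "password_reset", "user": "bob"},
    {"time": datetime(2024, 1, 10, 15, 0), "type": "group_add", "user": "bob",
     "group": "Domain Admins"},
]

for user, reset_time, grant_time in admin_grant_after_reset(sample_events):
    print(f"Review: {user} reset at {reset_time}, granted admin at {grant_time}")
```

A match is a prompt for review rather than proof of a breach: a legitimate onboarding process could produce the same pattern, which is why the approval trail matters as much as the log entry.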
So overall, this sort of monitoring allows security teams to receive actionable intelligence on threats and incidents from the organization’s IT infrastructure, which leads to improvements in security controls and operations through threat mitigation. Over time, this should improve the environment by reducing the number and frequency of these threat activities affecting the business.