In Cybersecurity, to Err Is Human, But to Err for No Reason Is a Shame

Author: Ranjit Bhaskar
Date Published: 19 October 2023

To paraphrase Sun Tzu in The Art of War, making as few mistakes as possible raises the odds of conquering an enemy that is already defeated, even before the physical battle commences.

Designing systems and interfaces with human factors engineering in mind helps people make fewer mistakes. In the current world of overworked, overextended cybersecurity teams, fewer mistakes made by employees, customers and partners might make the difference between holding off the enemy and succumbing to a sustained attack. Consider some examples.

Document Intake Processes

I was involved in assessing a 3,000% increase in malware alerts on an organization’s network over the course of a few months. It turned out the organization had redesigned its customer document intake process, listing an email address on the corporate website instead of requiring customers to log in to a customer portal to upload documents.

Lowering the barrier to sending documents deep into the corporate network caused drive-by submissions of malware-laden documents by automated bots and bad actors to surge. Most malware operators are opportunistic: they do not script out a portal registration and the subsequent approval verification. An email address on a corporate website inviting customers to send in documents, by contrast, is an enticing target for automated bots looking for easy internet targets.

Data Sanitization

An organization I worked with was investigating sensitive customer data loss alerts that appeared regularly at several locations on its network. The investigation soon pinpointed a likely source: the organization’s vast team of developers, quality assurance personnel and business analysts, all of whom had legitimate reasons for obtaining copies of production databases for testing purposes. The documented methods for sanitizing customers’ sensitive and personal data were unreliable because the large teams applied them inadequately or not at all. The organization switched to an automated scrubbing framework in which pristine production backups were secured, isolated and encrypted at rest, while the backups made available to everyone were already sanitized and anonymized.
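As a rough illustration of the automated scrubbing idea, the sketch below replaces sensitive fields in each record before a copy is handed to test teams. The field names, the salt handling and the use of a truncated salted SHA-256 digest are all assumptions made for the example, not the organization’s actual framework. Note that salted hashing is pseudonymization rather than full anonymization, so a real framework would likely need stronger techniques for some data.

```python
import hashlib

# Hypothetical column names; a real schema would define its own list.
SENSITIVE_FIELDS = {"name", "email", "ssn"}


def pseudonymize(value: str, salt: str) -> str:
    """Replace a sensitive value with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def sanitize_row(row: dict, salt: str) -> dict:
    """Return a copy of the row with every sensitive field replaced."""
    return {
        key: pseudonymize(str(value), salt) if key in SENSITIVE_FIELDS else value
        for key, value in row.items()
    }


def sanitize_backup(rows: list, salt: str) -> list:
    """Sanitize every row; the pristine rows are left untouched."""
    return [sanitize_row(row, salt) for row in rows]


if __name__ == "__main__":
    production = [{"id": 1, "name": "Jane Doe", "email": "jane@example.com", "plan": "gold"}]
    print(sanitize_backup(production, salt="rotate-me"))
```

The point of the design is that the scrubbing runs once, automatically, before anyone receives a copy, so no individual has to remember to apply it.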

Phishing Alerts

An organization with a sophisticated cyberawareness program noticed a gradual increase in the number of its 4,000-plus employees clicking on obvious phishing emails that purported to come from the organization’s help desk. A young engineer tasked with finding the root cause noticed that the yellow banner at the top of each such email, declaring it to be an external email, was being ignored. Suspecting that people had grown used to the banner and no longer saw it as a warning, the engineer started changing the color of the banner periodically and thereby successfully nudged the click percentages down.
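The periodic color change could be as simple as the following sketch, which picks a banner color from a fixed palette based on the current date. The palette, the 14-day rotation period and the reference date are hypothetical values chosen for illustration; the source does not say how often or to which colors the engineer switched.

```python
from datetime import date

# Hypothetical palette and rotation period; neither comes from the case study.
BANNER_COLORS = ["yellow", "orange", "red", "blue"]
ROTATION_DAYS = 14


def banner_color(today: date, epoch: date = date(2023, 1, 1)) -> str:
    """Pick the external-email banner color for the given day.

    The color changes every ROTATION_DAYS days, cycling through the palette,
    so readers do not grow accustomed to a single warning color.
    """
    period = (today - epoch).days // ROTATION_DAYS
    return BANNER_COLORS[period % len(BANNER_COLORS)]


if __name__ == "__main__":
    print(banner_color(date.today()))
```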

Credentialing

A security operations team tasked with monitoring risky logins from unusual locations and password spraying attacks was confused and inundated by the extraordinarily large number of bad credential errors plaguing an organization’s core customer systems after a recent upgrade. The organization had assigned two sets of credentials: a new set for the newer system and the existing set for the system it was eventually supposed to replace. The links to both systems landed the user on a single sign-on login page that looked identical, no matter the system. The team found that users who followed the links were momentarily confused about which set of credentials to use on the identical sign-on page. A tiny adjustment to the login page, showing additional context about the link the users had followed and the type of system they were trying to log in to, reduced credential errors by 60%, freeing up valuable bandwidth for the security monitoring and events teams.
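One way to picture the login-page adjustment is the sketch below, which reads a hint about the originating system from the sign-on URL and returns a context message to display. The `target` query parameter, the example URL and the message wording are all hypothetical; the source does not describe how the page identified which link a user had followed.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical messages keyed by a hypothetical "target" query parameter.
LOGIN_HINTS = {
    "legacy": "You are signing in to the existing system. Use your existing credentials.",
    "new": "You are signing in to the new system. Use your new credentials.",
}
DEFAULT_HINT = "Check which system you are signing in to before entering credentials."


def login_hint(url: str) -> str:
    """Return the context message to show on the shared sign-on page."""
    params = parse_qs(urlparse(url).query)
    target = params.get("target", [""])[0]
    return LOGIN_HINTS.get(target, DEFAULT_HINT)


if __name__ == "__main__":
    print(login_hint("https://sso.example.com/login?target=new"))
```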

Logging

Six different application development teams worked together, churning out gigabytes of application logs every day in their production environment. The problem was the relentless growth of the logs and the inherent risk in retaining large amounts of potentially sensitive processing data. The organization was hesitant to enforce a single retention period because some logs needed to be kept longer for regulatory reasons. Logging best practices, peer reviews, and quality assurance checklists and automation helped to some degree, but the organization discovered that the inconsistencies in logging were caused by overly complex logging rules in its development practices. The organization simplified the rules by letting developers choose between just two top-level logging categories in the centralized logging platform: a 30-day temporary processing logging folder and a 5-year logging folder that met the regulatory retention standards outlined in organizational policies. The organization then attached retention bots to both folders. The result was a massive reduction in disk space utilization and data exposure, as well as cost savings.
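The two-category rule lends itself to a very small retention check, sketched below. The category names, the tuple layout and the approximation of five years as 5 × 365 days are assumptions for illustration; the source does not describe how the retention bots were implemented.

```python
from datetime import datetime, timedelta

# Two top-level categories, mirroring the 30-day and 5-year folders.
# The names are hypothetical, and 5 years is approximated as 5 * 365 days.
RETENTION = {
    "temporary": timedelta(days=30),
    "regulatory": timedelta(days=5 * 365),
}


def is_expired(category: str, created: datetime, now: datetime) -> bool:
    """True when a log is older than the retention period of its category."""
    return now - created > RETENTION[category]


def expired_logs(logs, now: datetime) -> list:
    """Return the paths a retention bot would delete.

    `logs` is an iterable of (path, category, created) tuples.
    """
    return [path for path, category, created in logs if is_expired(category, created, now)]


if __name__ == "__main__":
    now = datetime(2023, 10, 19)
    logs = [
        ("tmp/app-a.log", "temporary", datetime(2023, 9, 1)),
        ("reg/app-b.log", "regulatory", datetime(2020, 1, 1)),
    ]
    print(expired_logs(logs, now))
```

Because a developer only has to pick one of two folders, the retention decision is made once at write time and never depends on someone remembering a more elaborate rule later.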

Onboarding

Investigating an error-prone onboarding/offboarding process at an organization, I learned it had no fewer than 26 different user repositories for various applications. Help desk and user administration personnel had learned that no matter how much training, automation or checklists they put in place, offboarding remained leaky: administrators are human, and properly removing a departing user from 26 different user repositories was too cumbersome to do consistently without error.

The organization decided to shrink the number of user repositories to just two: an Active Directory system for internal users and a Lightweight Directory Access Protocol (LDAP) directory for external users. All the applications were then put behind a Security Assertion Markup Language (SAML)-based identity provider with single sign-on capabilities. End users loved not having to remember 26 credentials, but the major benefit was that offboarding errors were reduced or eliminated because the administrators only needed to deal with two user repositories instead of 26.

Give Employees and Customers a Break

It is time to give beleaguered customers and employees a break. Organizations must put more thought into designing systems and processes with human factors engineering in mind so people are less likely to make mistakes and cause cybersecurity incidents. It is important to understand the psychology of why humans make mistakes and incorporate that knowledge into even the most mundane interface. Other industries, especially aviation, have had considerable success with simple changes to the shape and relative location of control levers that keep people who rely on muscle memory from making mistakes. Instead of reacting after an incident or relying on training alone, organizations should study which components of their interfaces are leading people to make mistakes in the first place.

Editor’s note: For further insights on this topic, read Ranjit Bhaskar’s recent Journal article, “The Role of Human Factors Engineering in Cybersecurity,” ISACA Journal, volume 4, 2023.
