Data Classification 101

The foundation of data-centric protection is knowing what data needs what level of protection.

At the South Carolina Department of Probation, Parole and Pardon Services, officials who classify data for the agency must carefully weigh any personally identifiable information about victims, criminals and employees.

Now, David O’Berry, director of IT systems at the agency in Columbia, S.C., is working to make those classifications more fine-grained. “Defining secret and top secret isn’t the hardest part of classifying data,” he says. “The hard part is classifying that data as a living target.”

Thorough classification is the most difficult part of achieving data-centric security controls, according to reports from the National Association of State Chief Information Officers. O’Berry and other experts offer their advice for those embarking on data classification projects.

Don’t reinvent:

Look to units that have already classified business data, such as the disaster recovery/business continuity department, recommends Mark Rutledge, the former CIO of Kentucky who’s now director of government strategies at McAfee.

“In my former role in Kentucky, it was understood that single mothers couldn’t go without their welfare checks if something happened to the data center, so off-location fail-over for those services was arranged,” Rutledge explains. “Disaster recovery plans show you the important services, then you roll that back to their critical data sets.”

O’Berry also suggests the records management division, which should be up to date on data classifications and rules for regulated data.

Treat data as a living thing:

Data classification represents the expression of business value placed on the data — value that changes during different stages of the data’s lifecycle, says Jenean Paschalidis, senior risk analyst for SAIC.

“Take a financial transaction, which has value for the person inputting the transaction, has more value when it’s being analyzed, and then no value when it’s stored,” Paschalidis says. “But even in storage, data can become of high value again — for example, in an investigation or if [personally identifiable information] leaked and had to be reported.”
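One way to picture this is as a lookup from lifecycle stage to a value rating, with certain events able to override the rating of data that is otherwise dormant. The sketch below is purely illustrative: the stage names, the three-level scale, the event names and the `current_value` function are all assumptions made for the example, not part of any scheme described by Paschalidis or used by an agency or vendor.

```python
from enum import IntEnum


class Value(IntEnum):
    """Hypothetical three-level business-value scale."""
    NONE = 0
    MODERATE = 1
    HIGH = 2


# Assumed baseline value per lifecycle stage, following the
# financial-transaction example: some value at input, more during
# analysis, none once stored.
BASELINE = {
    "input": Value.MODERATE,
    "analysis": Value.HIGH,
    "storage": Value.NONE,
}

# Assumed events that make data high value again, whatever its stage.
ESCALATING_EVENTS = {"investigation", "pii_leak"}


def current_value(stage, events=()):
    """Return the value rating for a record at the given lifecycle stage."""
    if ESCALATING_EVENTS.intersection(events):
        return Value.HIGH
    return BASELINE[stage]


# Stored data normally rates NONE, but a reported leak raises it to HIGH.
print(current_value("storage").name)
print(current_value("storage", ["pii_leak"]).name)
```

The point of the design is that the rating is computed when it is needed rather than stamped on the record once, so a change in circumstances changes the answer.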

Involve the business:

Data also needs to be put in the context of business imperatives, which Paschalidis translates as data affecting productivity, finances and reputation. This means bringing business units in and giving them ownership of the classification process.

“Data classification is not an IT issue, it’s an issue of getting the business to express and clarify their processes so IT can fulfill them,” adds Paschalidis.

For O’Berry, partnering with the business units has another benefit: the organization can tie classification to the context of the user, starting with the groups users belong to. He explains, “If legal accesses HR data, that’s a [data] loss no matter how you classify the data.”
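A group-based check of this kind might look like the minimal sketch below. The data set names, group names and the `is_data_loss` function are hypothetical, chosen only to mirror the legal and HR example; a real deployment would pull group membership from a directory service and enforce the rule in a policy engine.

```python
# Hypothetical mapping of data sets to the groups allowed to use them.
ALLOWED_GROUPS = {
    "hr_records": {"hr"},
    "case_files": {"legal", "parole_officers"},
}


def is_data_loss(user_group, data_set):
    """Flag access by a group outside the data set's allowed groups.

    The classification level is deliberately not an input: in this
    sketch, out-of-group access counts as a loss however the data is
    classified. Unknown data sets have no allowed groups, so any
    access to them is flagged.
    """
    return user_group not in ALLOWED_GROUPS.get(data_set, set())


print(is_data_loss("legal", "hr_records"))  # legal reading HR data
print(is_data_loss("hr", "hr_records"))     # HR reading its own data
```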

Continue fine-tuning:

O’Berry plans to implement McAfee’s Host Data Loss Prevention on endpoints by the end of the year to monitor usage for ongoing tuning of classifications. “Say we see new types of agency business trying to go out through web mail. We can create a new classification and policy around that,” he says.

As he implements these changes, O’Berry is careful to explain them to end users in business-loss terms that don’t sound condescending. When educating users, he adds, it’s important that they feel enabled, not as if they were doing something wrong.