You know what data loss prevention (DLP) is, but you may have some lingering questions about how to prevent data loss. Data loss prevention software is not a silver bullet in the constant battle for data security. It is, however, an important arrow in your technology department’s quiver. Data loss prevention software helps your team automate many of the daily tasks required to keep sensitive district data secure.
Data loss prevention tools have evolved over the past several years as districts adopted cloud computing. While many traditional DLP methods, such as backing up your data and using strong passwords, remain important, securing data in the cloud presents new challenges. School districts that use Google Workspace or Microsoft Office 365 can no longer rely on perimeter network defenses to secure data stored, accessed, and shared in the cloud.
When you are just getting started, data categorization will likely be more manual than software-driven. How much of it stays manual depends on the amount of data you’re working with and how complicated your infrastructure is. Most data loss prevention software available today uses some level of machine learning to process and categorize common data types.
For example, many solutions can identify and classify credit card numbers stored in a spreadsheet or an email. Some even use optical character recognition to detect images of credit cards. Most data loss prevention solutions include this level of data classification out of the box, both because spreadsheets, emails, and images are common data types and because compliance regulations govern how districts must store and protect student, staff, and other sensitive information.
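To make the credit card example concrete, here is a minimal Python sketch of how a classifier might find candidate card numbers in plain text and then confirm them with the Luhn checksum, the check-digit formula payment card numbers use. The regular expression and the function names (`luhn_valid`, `find_card_numbers`) are our own illustrative assumptions, not any vendor’s implementation.

```python
import re

# Hypothetical pattern: 13 to 16 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk the digits right to left, doubling every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit-only candidates that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


sample = "Card on file: 4111 1111 1111 1111; ref 1234 5678 9012 3456."
print(find_card_numbers(sample))  # ['4111111111111111']
```

The second number in the sample has the right shape but fails the checksum, which is exactly the kind of false positive validation is meant to filter out.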
Data loss prevention software relies heavily on rules and policies that drive action. Basically, rules tell the software what data to check for, and policies tell the software how to handle it.
Let’s say you want to make sure that students’ Social Security numbers are not shared outside a specific group of users in your domain. Most solutions offer templated rules for this, but walking through the example step by step shows how the software works.
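First, you will set up a rule that tells the software what Social Security numbers look like. You will need to define a “pattern” for the system to check for, such as three digits, two digits, and four digits separated by hyphens. You should also be able to add “approved” patterns and words to the rule to reduce the number of false positives you experience once the policy is live. Most, if not all, of the DLP software on the market today will also validate matches to cut down on false positives, either as an option or automatically. For credit card numbers, that validation is typically the Luhn algorithm, a check-digit formula. Social Security numbers have no check digit, so validation instead screens out numbers that are never issued, such as those beginning with 000, 666, or 900 through 999.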
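A rule like this might be sketched in Python as three parts: a regular expression for the pattern, a small set of “approved” values, and a validity filter. Because Social Security numbers carry no check digit, the filter only rejects ranges the Social Security Administration never issues (area 000, 666, or 900 through 999; group 00; serial 0000). The names `APPROVED`, `plausible_ssn`, and `find_ssns`, and the approved sample value, are hypothetical.

```python
import re

# Hypothetical rule pattern: NNN-NN-NNNN with hyphens, spaces, or no separators.
SSN_PATTERN = re.compile(r"\b(\d{3})[- ]?(\d{2})[- ]?(\d{4})\b")

# Hypothetical "approved" values, e.g. a sample number printed in training materials.
APPROVED = {"123-45-6789"}


def plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Reject number ranges that are never issued (SSNs have no checksum)."""
    a = int(area)
    if a == 0 or a == 666 or a >= 900:
        return False
    return group != "00" and serial != "0000"


def find_ssns(text: str) -> list[str]:
    """Return normalized matches that are plausible SSNs and not approved."""
    hits = []
    for m in SSN_PATTERN.finditer(text):
        area, group, serial = m.groups()
        normalized = f"{area}-{group}-{serial}"
        if normalized in APPROVED:
            continue  # approved value: skip it to cut false positives
        if plausible_ssn(area, group, serial):
            hits.append(normalized)
    return hits


doc = "Sample: 123-45-6789. Student: 456-78-9012. Invoice: 000-12-3456. Phone ext 666 12 3456."
print(find_ssns(doc))  # ['456-78-9012']
```

Normalizing each match before checking the approved list means the same approved number is skipped whether it was typed with hyphens, spaces, or no separators at all.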
Once your rule is set up to detect Social Security numbers in your environment, you need to set up policies that tell the software what to do with them. Again, there is usually an “out of the box” template for this type of data loss prevention policy, but it’s good to know how to adjust it if you need to down the road.
Policies are where the fun really begins. A policy names a rule as its “trigger” and then tells the data loss prevention software how to respond. For example, you might set up a policy that tells the software to “revoke sharing” whenever it finds a file that violates the “files containing Social Security numbers” rule.
You’ll want to configure these policies to notify your system admin when a rule has been violated so they can investigate further if needed. For certain types of policy violations, particularly unauthorized file shares, you should set up user notifications as well. These notifications continually remind and educate your colleagues about the importance of data security and the types of data that should not be shared.
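Putting rules and policies together, the Python sketch below models a policy as a trigger (the rule) plus a set of responses: revoking out-of-group shares, always notifying the admin, and notifying the file owner only when an unauthorized share occurred. Everything here, including the `File` and `Policy` classes, the action strings, and the simple substring check standing in for a real rule, is a hypothetical simplification; real DLP products expose these settings through their own consoles and APIs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class File:
    name: str
    text: str
    owner: str
    shared_with: set[str] = field(default_factory=set)


@dataclass
class Policy:
    name: str
    trigger: Callable[[File], bool]  # the rule: what to look for
    allowed_users: set[str]          # the group permitted to receive matching files

    def evaluate(self, f: File) -> list[str]:
        """Return the hypothetical actions this policy would take for one file."""
        if not self.trigger(f):
            return []
        # The admin hears about every file that trips the rule.
        actions = [f"notify_admin:{f.name}"]
        outside = sorted(f.shared_with - self.allowed_users)
        if outside:
            # Unauthorized share: revoke it and remind the owner why it matters.
            actions += [f"revoke_sharing:{user}" for user in outside]
            actions.append(f"notify_user:{f.owner}")
        return actions


def contains_ssn(f: File) -> bool:
    """Stand-in rule; a real one would use a validated pattern match."""
    return "SSN" in f.text


policy = Policy("Files containing SSNs", contains_ssn, {"registrar@district.org"})
roster = File("roster.xlsx", "SSN column", "teacher@district.org",
              {"registrar@district.org", "friend@gmail.com"})
print(policy.evaluate(roster))
# ['notify_admin:roster.xlsx', 'revoke_sharing:friend@gmail.com', 'notify_user:teacher@district.org']
```

Keeping the rule as a separate function mirrors the rule/policy split: you can swap in a stricter detector without touching how the policy responds.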
Data loss prevention rules can also be created around certain types and sizes of files. We’ve seen cases where users were uploading pirated movies to a district’s shared drive and sharing them with students, teachers, and others. Not only is this illegal, but it also took up a huge amount of storage space (which, if you’re a Google shop, is now limited).
The system admin can create a data loss prevention rule that matches files by type and size and then remove the matching files in bulk. They can then create a policy that detects these types of files going forward and automatically removes them from the shared drive.
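A type-and-size rule can be sketched over a listing of file metadata. In this Python sketch, the extension list, the 1 GiB threshold, and the `FileMeta` record are assumptions chosen for illustration; a real tool would read metadata from the storage provider’s API, and the removal step here only returns the IDs a bulk job would act on.

```python
import os
from dataclasses import dataclass

# Hypothetical rule parameters: common video extensions and a 1 GiB size floor.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov"}
MIN_BYTES = 1024**3


@dataclass
class FileMeta:
    file_id: str
    name: str
    size_bytes: int


def matches_rule(f: FileMeta) -> bool:
    """True if the file has a listed video extension and meets the size floor."""
    ext = os.path.splitext(f.name)[1].lower()
    return ext in VIDEO_EXTENSIONS and f.size_bytes >= MIN_BYTES


def files_to_remove(files: list[FileMeta]) -> list[str]:
    """Return the IDs a bulk-removal job would delete; the deletion call is omitted."""
    return [f.file_id for f in files if matches_rule(f)]


drive = [
    FileMeta("a1", "movie.MKV", 2 * 1024**3),
    FileMeta("b2", "lesson-intro.mp4", 80 * 1024**2),
    FileMeta("c3", "budget.xlsx", 3 * 1024**3),
]
print(files_to_remove(drive))  # ['a1']
```

Combining type and size is what keeps the rule from catching a short classroom video while still flagging a feature-length movie.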
File matching in data loss prevention is a powerful tool. For example, it can detect encrypted files, whose contents can’t be inspected, and keep them from being uploaded to or created in your environment. When you pair file matching rules with content matching rules, you have a strong structure in place to protect your data and cloud environment.
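As one concrete case of file matching, encrypted ZIP archives can be spotted from their headers without the password: the ZIP format sets bit 0 of an entry’s general-purpose bit flag when that entry is encrypted, and Python’s standard `zipfile` module exposes that flag as `ZipInfo.flag_bits`. The function name `has_encrypted_entries` is hypothetical, and this sketch covers only ZIP archives; other encrypted formats would need their own checks.

```python
import io
import zipfile

ENCRYPTED_FLAG = 0x1  # bit 0 of the ZIP general-purpose bit flag


def has_encrypted_entries(path_or_file) -> bool:
    """Return True if a ZIP archive contains at least one encrypted entry.

    Input that is not a ZIP archive returns False.
    """
    try:
        with zipfile.ZipFile(path_or_file) as zf:
            return any(info.flag_bits & ENCRYPTED_FLAG for info in zf.infolist())
    except zipfile.BadZipFile:
        return False


# Usage: an ordinary archive built in memory has no encrypted entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("grades.csv", "name,grade\n")
print(has_encrypted_entries(io.BytesIO(buf.getvalue())))  # False
```

Because only header metadata is read, the check works even though the encrypted contents themselves can’t be scanned.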
A relatively new data loss prevention capability is image scanning using optical character recognition (OCR). Data loss prevention software with OCR capabilities is a must, but not all data loss prevention or cloud access security broker (CASB) vendors provide it.
Optical character recognition allows the software to scan image files, such as JPEGs and PNGs, for rule violations. Going back to our Social Security number example, if an employee has taken a picture of their Social Security card and saved it to your shared drive (yes, we have seen this happen on more occasions than you would like to know), you don’t want that information to reach anyone beyond the people who should have access to it. You may not want the file in the shared drive at all, in which case you’ll want to be able to remove it.
An even more concerning scenario is screenshots or scanned PDFs in your shared drives that contain sensitive student, staff, or financial information. Data loss prevention solutions that don’t use optical character recognition won’t be able to detect the information in those files, yet those files should be treated exactly like spreadsheets, text documents, and emails that contain sensitive information.
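One way to treat image-based files exactly like text documents is to route every file through a text-extraction step and then run the same rule on the result. In this Python sketch the OCR engine is deliberately a pluggable function (`ocr`), an assumption standing in for whatever engine a product actually uses, and the rule is a simplified Social Security number pattern. The names `extract_text`, `violates`, and `fake_ocr` are hypothetical.

```python
import re
from pathlib import PurePath
from typing import Callable

# Image types routed to OCR; scanned PDFs would first need rendering to images (omitted).
IMAGE_TYPES = {".jpg", ".jpeg", ".png"}
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # simplified rule for illustration


def extract_text(name: str, data: bytes, ocr: Callable[[bytes], str]) -> str:
    """Return searchable text: OCR output for images, decoded bytes otherwise."""
    if PurePath(name).suffix.lower() in IMAGE_TYPES:
        return ocr(data)  # hypothetical OCR engine supplied by the caller
    return data.decode("utf-8", errors="ignore")


def violates(name: str, data: bytes, ocr: Callable[[bytes], str]) -> bool:
    """Apply the same rule no matter how the text was obtained."""
    return bool(SSN.search(extract_text(name, data, ocr)))


def fake_ocr(image_bytes: bytes) -> str:
    """Stand-in OCR that pretends to read a photographed card."""
    return "SOCIAL SECURITY 456-78-9012"


print(violates("card_photo.JPG", b"\xff\xd8", fake_ocr))    # True
print(violates("notes.txt", b"meeting at 3pm", fake_ocr))   # False
```

Separating extraction from detection means every rule you write for text documents automatically applies to images once an OCR step is in place.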
We’ve touched on data loss prevention alerts and remediation in previous sections, but they are a very important step in making data loss prevention work for you. Equally important is how different data loss prevention solutions handle alerts and remediation, because it’s not enough to just flag a rule violation. You need to be able to do something about it!
As you set up your policy, you’ll need to decide what should happen when a rule violation triggers it. As previously discussed, there are instances when you may want to send a user or admin a notification to alert them to the issue.
Automated remediation can take many forms, such as revoking sharing on a file, removing a file from a shared drive, or notifying the user and the system admin.
Understanding how data loss prevention software works is critical to building an effective DLP tool stack and strategy. The right software will save you much of the time and headache that come from trying to manage data loss/leak prevention manually, allowing you to focus on other priorities while maintaining some peace of mind!