As the amount of data we store (hoard?) increases, it becomes harder to know exactly what we have. And if we don't know what data we have, it becomes challenging to know what we are protecting. Amazon Macie is a new service that uses machine learning algorithms for natural language processing to automate data classification in S3 buckets.
Macie - the name has dual meanings, as it's both a French word for "weapon" and an English word for "bold, sporty and sweet" - can detect sources of personally identifiable information (PII) and monitor access patterns to detect anomalous usage. It continuously checks CloudTrail events for PUT requests in S3 buckets and automatically classifies new objects in almost real time.
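To make that flow concrete, here is a minimal sketch of the general idea: pick out the S3 `PutObject` events from CloudTrail-style records, then classify the newly written objects. This is not how Macie itself is implemented. Macie uses machine learning, whereas the regex patterns below are a deliberately crude stand-in, and the helper names (`new_object_keys`, `classify`), the bucket name and the sample data are all hypothetical.

```python
import re

# Toy regex patterns standing in for Macie's ML-based classifiers.
# This is an assumption for illustration only, not Macie's method.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def new_object_keys(cloudtrail_records):
    """Yield (bucket, key) for each S3 PutObject event in CloudTrail-style records."""
    for record in cloudtrail_records:
        is_s3 = record.get("eventSource") == "s3.amazonaws.com"
        if is_s3 and record.get("eventName") == "PutObject":
            params = record.get("requestParameters") or {}
            yield params.get("bucketName"), params.get("key")


def classify(text):
    """Return the sorted names of the PII patterns found in the text."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))


# Hypothetical sample events; a real pipeline would read these from CloudTrail.
records = [
    {
        "eventSource": "s3.amazonaws.com",
        "eventName": "PutObject",
        "requestParameters": {"bucketName": "example-bucket", "key": "customers.csv"},
    },
    {
        "eventSource": "s3.amazonaws.com",
        "eventName": "GetObject",
        "requestParameters": {"bucketName": "example-bucket", "key": "logo.png"},
    },
]

# Stand-in for fetching object contents from S3.
object_contents = {("example-bucket", "customers.csv"): "jane@example.com,123-45-6789"}

for bucket, key in new_object_keys(records):
    labels = classify(object_contents[(bucket, key)])
    print(bucket, key, labels)
```

Only the `PutObject` event triggers classification here, which mirrors the point above: new objects are picked up as they are written rather than by periodically rescanning whole buckets.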
As the ability to detect breaches becomes more important - Australian breach notification laws and the EU's GDPR are just two of the regulations businesses need to consider - and the amount of data we need to manage increases, we'll need to find ways to automate as much of the data lifecycle as we can.
Macie is a step in that direction.
Amazon has posted instructions on how to use Macie.