How to Use Amazon Macie
In our lab walkthrough series, we go through selected lab exercises on our INE Platform. Subscribe or sign up for a 7-day, risk-free trial with INE and access this lab and a robust library covering the latest in Cyber Security, Networking, Cloud, and Data Science!
Purpose: When storing data in an S3 bucket, it is critical to keep sensitive data out of it. If the bucket allows public access, any sensitive data in it is exposed to attackers. It is therefore essential to have a service that can detect sensitive data that has leaked into the bucket, and this is where Amazon Macie comes in handy. In this article, we will learn how to use Amazon Macie to find sensitive data that has leaked into an S3 bucket.
Technical difficulty:
| Novice | Beginner | Competent | Proficient | Expert
What is Amazon Macie?
Amazon Macie is a fully managed data security and privacy solution that uses machine learning and pattern matching to assist you in discovering, monitoring, and protecting sensitive data in your AWS environment.
Macie automates the detection of sensitive data, such as personally identifiable information (PII) and financial data, to give you a better understanding of the data stored in Amazon Simple Storage Service (Amazon S3). Macie also keeps an inventory of your S3 buckets and automatically evaluates and monitors them for security and access control. It can also detect and report overly permissive or unencrypted buckets.
What are Amazon Macie Findings?
When Amazon Macie finds potential policy violations or issues with the security or privacy of your S3 buckets, or when it discovers sensitive data in S3 objects, it generates findings. A finding is a detailed report of a potential issue or of sensitive data discovered by Macie. Each finding includes a severity rating, information about the affected resource, and additional details, such as when and how Macie discovered the issue or data. Macie stores your policy and sensitive data findings for 90 days.
Types of Amazon Macie findings
Amazon Macie generates two categories of findings:
Policy findings
Sensitive data findings
A policy finding is a detailed report of a potential policy violation or a security or privacy issue with an Amazon S3 bucket. Macie generates these findings as part of its ongoing monitoring of your Amazon S3 data.
A sensitive data finding is a detailed report of sensitive data found in an S3 object. Macie generates these findings when it discovers sensitive data in S3 objects that you have configured a sensitive data discovery job to analyze. Each category contains several finding types.
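The category of a finding can be read from the prefix of its type string. The following is a minimal Python sketch of that idea; the helper name finding_category is hypothetical and not part of any AWS SDK, and the two sample type strings are only examples of each category.

```python
def finding_category(finding_type: str) -> str:
    """Return the Macie finding category implied by a finding type string.

    Hypothetical helper: it only inspects the text before the first colon.
    """
    prefix = finding_type.split(":", 1)[0]
    if prefix == "Policy":
        return "policy"
    if prefix == "SensitiveData":
        return "sensitive data"
    raise ValueError(f"unrecognized finding type: {finding_type!r}")


if __name__ == "__main__":
    for sample in ("Policy:IAMUser/S3BucketPublic", "SensitiveData:S3Object/Personal"):
        print(sample, "->", finding_category(sample))
```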
Lab Workflow
In this lab, we will create an S3 bucket, store sensitive information in it as a JSON file, and then use Amazon Macie with a regular expression to generate sensitive data findings.
Now that we have covered all the key terms for the lab, let's carry out the experiment.
Lab Scenario
We have set up the below scenario in our INE labs for our students to practice. The screenshots have been taken from our online lab environment.
Lab Link: Amazon Macie
Objective
Store sensitive data in an S3 bucket and use Amazon Macie to generate sensitive data findings.
Solution
Step 1: Click the lab link button to get the access credentials. Log in to the AWS account with these credentials.
Step 2: Create an S3 bucket and upload sensitive data. Search for S3 in the search bar and navigate to the S3 dashboard.
Step 3: Click on the “Create bucket” button.
Step 4: Set the bucket name to "student-lab-bucket-" followed by the account ID.
Step 5: Enable ACLs and set the object ownership to “Object writer”.
Step 6: Uncheck “Block all public access” so that the bucket can be made public.
Confirm the action by checking the box that acknowledges the current settings.
Click on the “Create bucket” button.
The bucket is created successfully.
There are no objects in the bucket yet. Upload files by clicking the “Upload” button.
Step 7: Create a JSON file named “data.json”.
Command: nano data.json
Step 8: Copy and paste the following code inside the data.json file.
Code:
[
  {
    "id": 1,
    "jobTitleName": "Developer",
    "firstName": "Romin",
    "lastName": "Irani",
    "preferredFullName": "Romin Irani",
    "employeeCode": "ANC-1790",
    "region": "CA"
  },
  {
    "id": 2,
    "jobTitleName": "Developer",
    "firstName": "Neil",
    "lastName": "Irani",
    "preferredFullName": "Neil Irani",
    "employeeCode": "AEF-2351",
    "region": "CA"
  }
]
This is sample employee information. We treat the employee code as the sensitive information and detect it with Amazon Macie.
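As an alternative to typing the file by hand, the same sample can be generated with a short script. This is a minimal sketch using only the Python standard library; the function names sample_json and write_sample are hypothetical, and the records are the two employees shown above.

```python
import json

# The two sample employee records from the lab.
EMPLOYEES = [
    {
        "id": 1,
        "jobTitleName": "Developer",
        "firstName": "Romin",
        "lastName": "Irani",
        "preferredFullName": "Romin Irani",
        "employeeCode": "ANC-1790",
        "region": "CA",
    },
    {
        "id": 2,
        "jobTitleName": "Developer",
        "firstName": "Neil",
        "lastName": "Irani",
        "preferredFullName": "Neil Irani",
        "employeeCode": "AEF-2351",
        "region": "CA",
    },
]


def sample_json() -> str:
    """Serialize the sample records as indented JSON text."""
    return json.dumps(EMPLOYEES, indent=2)


def write_sample(path: str = "data.json") -> None:
    """Write the sample records to a file, by default data.json."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(sample_json())
```

Calling write_sample() in the working directory produces a data.json file ready to upload in the next step.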
Step 9: Choose the “data.json” file to upload.
Click on the “Upload” button.
Step 10: Click on “Permissions” inside the created bucket.
Step 11: Click on “Edit” in the ACL block.
Enable public read access.
Confirm the action by checking the box that acknowledges the current settings.
Now the bucket is publicly accessible.
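For readers who prefer scripting, the bucket setup above can be approximated with the AWS SDK for Python. This is a rough sketch, not the lab's official method: it assumes boto3 is installed, that credentials are configured, and that the region is us-east-1 (other regions need a CreateBucketConfiguration argument). The function names are hypothetical, and a public bucket like this should only ever be created in a disposable lab account.

```python
import re


def bucket_name(account_id: str) -> str:
    """Build the lab bucket name from a 12-digit AWS account ID."""
    if not re.fullmatch(r"[0-9]{12}", account_id):
        raise ValueError("an AWS account ID is a 12-digit number")
    return f"student-lab-bucket-{account_id}"


def create_public_lab_bucket(file_path: str = "data.json") -> str:
    """Create the bucket, upload the file, and make the bucket public (lab only)."""
    import boto3  # third-party; imported lazily so the helper above works without it

    account_id = boto3.client("sts").get_caller_identity()["Account"]
    name = bucket_name(account_id)
    s3 = boto3.client("s3")

    # Enable ACLs by setting object ownership to "Object writer".
    s3.create_bucket(Bucket=name, ObjectOwnership="ObjectWriter")

    # Equivalent of unchecking "Block all public access".
    s3.put_public_access_block(
        Bucket=name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": False,
            "IgnorePublicAcls": False,
            "BlockPublicPolicy": False,
            "RestrictPublicBuckets": False,
        },
    )

    s3.upload_file(file_path, name, "data.json")

    # Approximates enabling public read access in the ACL block.
    s3.put_bucket_acl(Bucket=name, ACL="public-read")
    return name
```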
Step 12: Search for Macie in the search bar and navigate to the Amazon Macie dashboard.
Here we will create a custom data identifier with a regular expression that matches the pattern of the data stored in the S3 bucket.
Click on the “Get started” button.
Step 13: Click on the “Enable Macie” button.
As soon as Macie is enabled, it automatically discovers all the buckets and the objects stored in each one, and the Macie dashboard appears with a summary of the bucket count and storage size.
Step 14: Click on the “Create job” button.
A sensitive data discovery job is a series of automated processing and analysis tasks that Macie performs to analyze objects in S3 buckets and determine whether the objects contain sensitive data.
Step 15: For the Refine the scope step, choose One-time job, and then choose Next.
Step 16: Select the created S3 bucket.
Click on the “Next” button.
Review S3 bucket settings.
Click on the “Next” button.
Step 17: Click on the arrow to expand the window of Additional settings.
Step 18: Leave the Object criteria at its default, “File name extensions”. Enter “json” in the text box and click on the “Include” button.
Amazon Macie can analyze data in many different formats, including commonly used compression and archive formats.
The file extension “json” is included successfully.
Click on the “Next” button.
Step 19: Set the selection type to “All”.
Click on the “Next” button.
Step 20: Create a custom data identifier to find the sensitive data in the JSON file. Click on “Manage custom identifier”.
A custom data identifier is a set of criteria that you define to detect sensitive data. The criteria consist of a regular expression (regex) that defines a text pattern to match and, optionally, character sequences and a proximity rule that refine the results.
Step 21: Click on the “Create” button.
Step 22: Set the identifier name as “EmployeeCodeIdentifier”.
Step 23: Copy and paste the following regular expression to match the sensitive data in the file.
Regular expression: [A-Z]{3}-[0-9]{4}
This identifier matches data in the format ABC-0123, i.e. three uppercase letters, a dash, and then four digits. The character class is uppercase because the employee codes in data.json are uppercase.
Click on “Submit”.
Review the settings and click on “Submit” again.
The custom identifier is created successfully.
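The pattern can be sanity-checked locally against the sample data before relying on it. The sketch below uses Python's re module as a stand-in, which is an assumption on our part: Macie supports a subset of PCRE syntax, so Python only approximates its behavior. It also shows that a lowercase character class such as [a-z]{3} does not match the uppercase codes under case-sensitive matching.

```python
import re

# Uppercase pattern, matching codes such as ANC-1790.
EMPLOYEE_CODE = re.compile(r"[A-Z]{3}-[0-9]{4}")
# Lowercase variant, shown only for comparison.
LOWERCASE_VARIANT = re.compile(r"[a-z]{3}-[0-9]{4}")


def find_codes(text: str, pattern: re.Pattern = EMPLOYEE_CODE) -> list:
    """Return every substring of text that matches the given pattern."""
    return pattern.findall(text)


sample = '"employeeCode": "ANC-1790", "employeeCode": "AEF-2351"'
print(find_codes(sample))                     # ['ANC-1790', 'AEF-2351']
print(find_codes(sample, LOWERCASE_VARIANT))  # []
```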
Step 24: Navigate back to the job creation stage and click on the refresh button.
Step 25: Now select the created custom identifier.
Click on the “Next” button.
Keep the allow lists empty. With allow lists in Amazon Macie, you can define specific text and text patterns that you want Macie to ignore when it inspects Amazon S3 objects for sensitive data.
Click on the “Next” button.
Step 26: Enter the job name as “DataIdentification”.
Now click on the “Next” button.
Now click on the “Submit” button.
The Macie job is created successfully.
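The console steps for the custom data identifier and the one-time job can also be expressed as API requests. The following is a hedged sketch assuming boto3 and its macie2 client; the builder function names are hypothetical, the regex is the uppercase form that matches the sample codes, and the request shapes reflect our reading of the Macie API, so check them against the official reference before use.

```python
import uuid

# Uppercase pattern, so that it matches sample codes such as ANC-1790.
EMPLOYEE_CODE_REGEX = r"[A-Z]{3}-[0-9]{4}"


def build_identifier_request(name: str = "EmployeeCodeIdentifier") -> dict:
    """Arguments for macie2.create_custom_data_identifier."""
    return {
        "name": name,
        "regex": EMPLOYEE_CODE_REGEX,
        "clientToken": str(uuid.uuid4()),
    }


def build_job_request(account_id, bucket, identifier_ids, name="DataIdentification"):
    """Arguments for macie2.create_classification_job, mirroring the console choices."""
    return {
        "clientToken": str(uuid.uuid4()),
        "name": name,
        "jobType": "ONE_TIME",                   # one-time job
        "managedDataIdentifierSelector": "ALL",  # selection type "All"
        "customDataIdentifierIds": list(identifier_ids),
        "s3JobDefinition": {
            "bucketDefinitions": [{"accountId": account_id, "buckets": [bucket]}],
            # Only analyze objects with the "json" file name extension.
            "scoping": {"includes": {"and": [{"simpleScopeTerm": {
                "comparator": "EQ",
                "key": "OBJECT_EXTENSION",
                "values": ["json"],
            }}]}},
        },
    }


def run_job(account_id: str, bucket: str) -> str:
    """Create the identifier and the job; returns the job ID. Needs AWS credentials."""
    import boto3  # third-party; imported lazily so the builders work without it

    macie = boto3.client("macie2")
    created = macie.create_custom_data_identifier(**build_identifier_request())
    job = macie.create_classification_job(
        **build_job_request(account_id, bucket, [created["customDataIdentifierId"]])
    )
    return job["jobId"]
```

Keeping the request builders separate from run_job makes it possible to inspect the job definition without calling AWS.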
Step 27: Click on “Findings”.
If Macie discovers sensitive data in an object, Macie creates a sensitive data finding. A sensitive data finding is a detailed report of sensitive data that Macie found in an object.
Step 28: Select the finding with the type “SensitiveData:S3Object/Personal”.
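A finding of this type indicates that the object contains personally identifiable information (such as full names or mailing addresses), personal health information (such as health insurance or medical identification numbers), or a combination of the two. In our case the sensitive data of interest is the employee code. Note that text matched by a custom data identifier is reported under the finding type “SensitiveData:S3Object/CustomIdentifier”, or “SensitiveData:S3Object/Multiple” when the object contains more than one category of sensitive data.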
Step 29: Select the finding and click on “Export (JSON)” under “Actions”.
The complete details of the finding are available in the exported JSON.
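Once exported, the finding can be summarized with a few lines of Python. This is a minimal sketch under stated assumptions: the field names (type, severity, resourcesAffected) follow our understanding of the Macie finding format, the helper names are hypothetical, and SAMPLE_FINDING is a hand-written illustration with a placeholder account ID, not output captured from the lab.

```python
import json


def load_finding(path: str) -> dict:
    """Read an exported finding from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_finding(finding: dict) -> dict:
    """Pick out the finding type, severity, bucket, and object key.

    Missing fields come back as None instead of raising an error.
    """
    resources = finding.get("resourcesAffected", {})
    return {
        "type": finding.get("type"),
        "severity": finding.get("severity", {}).get("description"),
        "bucket": resources.get("s3Bucket", {}).get("name"),
        "key": resources.get("s3Object", {}).get("key"),
    }


# Hand-written illustration only; a real export contains many more fields.
SAMPLE_FINDING = {
    "type": "SensitiveData:S3Object/Personal",
    "severity": {"description": "High"},
    "resourcesAffected": {
        "s3Bucket": {"name": "student-lab-bucket-123456789012"},
        "s3Object": {"key": "data.json"},
    },
}

if __name__ == "__main__":
    print(summarize_finding(SAMPLE_FINDING))
```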
Conclusion
Congratulations! We learned how to store sensitive data in an S3 bucket and use Amazon Macie to generate sensitive data findings.