Creating a batch data ingestion solution on AWS
In this blog we will create a batch data transfer solution on AWS that moves data stored locally on a laptop into a data lake (S3) hosted on AWS.
Batch data ingestion is ideal for scenarios where the data has already been produced and is sitting in another system, rather than arriving as a continuous stream.
AWS components used:-
1. S3 bucket
2. AWS Transfer Family server (SFTP)
3. SFTP client on the local machine (FileZilla)
4. IAM role with the permissions required to ingest files into S3
5. IAM role assigned to the SFTP server created in step 2
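We will create these in the console, but if you prefer the AWS CLI, a minimal sketch for the bucket and the S3-access IAM role looks like this (the bucket name, role name, and account ID are placeholders, not values from this post):

```bash
# Create the data lake bucket (name is a placeholder; bucket names are global).
aws s3 mb s3://my-sftp-datalake-demo

# Trust policy so AWS Transfer Family can assume the role.
cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "transfer.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role --role-name sftp-s3-role \
  --assume-role-policy-document file://trust.json

# Inline policy granting read/write on the bucket (scope this down for production).
cat > s3-access.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
    "Resource": [
      "arn:aws:s3:::my-sftp-datalake-demo",
      "arn:aws:s3:::my-sftp-datalake-demo/*"
    ]
  }]
}
EOF

aws iam put-role-policy --role-name sftp-s3-role \
  --policy-name sftp-s3-access --policy-document file://s3-access.json
```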
Creating the SFTP server:-
1. From the AWS console, select AWS Transfer Family
2. Select Create server
3. In the protocols window, select SFTP and click Next
4. In the identity provider window, select Service managed and click Next
5. In endpoint configuration, select Publicly accessible
6. For the domain, select Amazon S3
7. In the CloudWatch logging window, select Create a new role and click Next
8. Verify the details and click Create server
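The same server can also be created from the AWS CLI. The sketch below mirrors the console choices above; the logging role ARN is a placeholder (the console's "create a new role" option would generate one for you):

```bash
# Publicly accessible, service-managed SFTP server backed by Amazon S3.
aws transfer create-server \
  --protocols SFTP \
  --identity-provider-type SERVICE_MANAGED \
  --endpoint-type PUBLIC \
  --domain S3 \
  --logging-role arn:aws:iam::123456789012:role/transfer-logging-role

# The call returns a ServerId such as s-1234567890abcdef0; note it for later steps.
```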
Create an SFTP folder on your local machine that you will use to ingest batch data into AWS:-
1. mkdir test
2. cd test
Create a key pair on your local machine to be able to use SFTP (a command sketch follows the list):-
1. ssh-keygen
2. Hit enter to accept the default values
3. The key pair is written to the path shown in the output (~/.ssh by default)
4. Copy the public key from the .pub file
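Concretely, the key-pair steps look like this (the key file name sftp_key is a placeholder; passing -f keeps the keys in the current directory rather than ~/.ssh):

```bash
# Generate an RSA key pair in the current directory.
ssh-keygen -t rsa -b 4096 -f sftp_key

# Print the public key so it can be pasted into the SFTP user in the next step.
cat sftp_key.pub
```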
Create a user for the SFTP server:-
1. Click on the SFTP server created earlier
2. Click Add user
3. Enter the username
4. Select the IAM role created for S3 access
5. Paste the public key copied from the .pub file and click Add
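The user can also be created from the CLI, reusing the ServerId, role, and public key from the earlier steps (all values below are placeholders):

```bash
# Service-managed user whose home directory maps into the S3 bucket.
aws transfer create-user \
  --server-id s-1234567890abcdef0 \
  --user-name testuser \
  --role arn:aws:iam::123456789012:role/sftp-s3-role \
  --home-directory /my-sftp-datalake-demo/testuser \
  --ssh-public-key-body "$(cat sftp_key.pub)"
```

With the user in place, FileZilla can connect using the server endpoint, the username, and the private key. A quick alternative test from the terminal (endpoint shown is the us-east-1 format; substitute your ServerId and region):

```bash
# Connect with the private key and upload a file from the local test folder;
# it lands in the S3 bucket backing the server.
sftp -i sftp_key testuser@s-1234567890abcdef0.server.transfer.us-east-1.amazonaws.com
# sftp> put test/data.csv
```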

