■ Introduction

Notes on how to operate S3 with boto3.

【0】What is boto3?

A library for operating AWS from Python.

API reference
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html

【1】list_objects_v2

* Lists files (objects)

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2
https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html

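Usage notes

See the related article below.

boto3 / list_objects_v2: usage caveats and workarounds
https://dk521123.hatenablog.com/entry/2019/12/06/232617

* I have not tried it, but there is also an article titled
  "When listing S3 contents with Boto3, use Bucket().objects.filter
  instead of list_objects_v2" (see the site below for details).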

https://qiita.com/elyunim26/items/a513226b76b3cb8928c2
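
For reference, the resource-style listing that the article title refers to can be sketched roughly as follows. This is an untested-against-AWS sketch: the helper name list_keys_with_resource is hypothetical, and s3_resource is assumed to be the object returned by boto3.resource('s3').

```python
def list_keys_with_resource(s3_resource, bucket_name, prefix):
    """Return all keys under `prefix` using the resource API.

    Hypothetical helper; `s3_resource` is assumed to be boto3.resource('s3').
    objects.filter() returns a lazy collection that requests further pages
    as it is iterated, so no continuation token handling is needed here.
    """
    bucket = s3_resource.Bucket(bucket_name)
    return [obj.key for obj in bucket.objects.filter(Prefix=prefix)]


# Usage (requires AWS credentials):
#   import boto3
#   keys = list_keys_with_resource(
#       boto3.resource("s3"), "bucket-name", "xxxx/yyy/zzzz/")
```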

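Example 1: Listing files

import boto3

client = boto3.client('s3')

# Note: a single call returns at most 1000 keys
response = client.list_objects_v2(
    Bucket='bucket-name',
    Prefix='xxxx/yyy/zzzz/',
)
# 'Contents' is absent when nothing matches the prefix
s3_contents = response.get('Contents', [])
for s3_content in s3_contents:
    key = s3_content.get('Key')
    print(key)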
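
A single list_objects_v2 call returns at most 1,000 keys, so the example above only sees the first page of results. One way to walk every page is boto3's built-in paginator. The following is a minimal sketch: the helper name iter_keys is hypothetical, and s3_client is assumed to be a client created with boto3.client('s3').

```python
def iter_keys(s3_client, bucket, prefix=""):
    """Yield every object key under `prefix`, following pagination.

    Hypothetical helper; `s3_client` is assumed to be boto3.client('s3').
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent when a page has no matching objects
        for item in page.get("Contents", []):
            yield item["Key"]


# Usage (requires AWS credentials):
#   import boto3
#   for key in iter_keys(boto3.client("s3"), "bucket-name", "xxxx/yyy/zzzz/"):
#       print(key)
```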

【2】get_object

Used to read a file (object).

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.get_object

Example 1: Reading a file

import boto3

s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket='bucket-name', Key="sample.sql")
# Body is a streaming object; read() returns bytes
body = response["Body"].read()
print(body.decode('utf-8'))
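
get_object raises an exception when the key does not exist. A minimal sketch of handling that case is shown below; the helper name read_text_or_none is hypothetical, and s3_client is assumed to be a client created with boto3.client('s3'). Note that, depending on the caller's permissions, a missing key may surface as an access-denied error instead, which this sketch deliberately lets propagate.

```python
def read_text_or_none(s3_client, bucket, key, encoding="utf-8"):
    """Return the object's content as text, or None if the key is missing.

    Hypothetical helper; `s3_client` is assumed to be boto3.client('s3').
    """
    try:
        response = s3_client.get_object(Bucket=bucket, Key=key)
    except s3_client.exceptions.NoSuchKey:
        # The key does not exist in the bucket
        return None
    # Body is a streaming object; read() returns the raw bytes
    return response["Body"].read().decode(encoding)


# Usage (requires AWS credentials):
#   import boto3
#   text = read_text_or_none(boto3.client("s3"), "bucket-name", "sample.sql")
```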

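Example 2: Reading a YAML file

import boto3
import yaml

s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket='bucket-name', Key="filename.yaml")

try:
    # safe_load accepts a file-like object, so Body can be passed directly
    config = yaml.safe_load(response["Body"])
except yaml.YAMLError as exc:
    # Report the parse error, then re-raise it
    print(exc)
    raise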

References
https://qiita.com/NoriakiOshita/items/2f9e3a16110679e0efac
https://gist.github.com/coingraham/c6153809e4d179396421

【3】copy / copy_object

Used to copy a file (object).

copy
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.copy
copy_object
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.copy_object

Usage notes

copy_object cannot be used to copy large files (over 5 GB).
 => Doing so raises an exception like the one shown below
 => Use copy() instead

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.copy_object

Excerpt from the above:
～～～～
Note

You can store individual objects of up to 5 TB in Amazon S3.
You create a copy of your object up to 5 GB in size in a single atomic operation using this API.
However, to copy an object greater than 5 GB,
 you must use the multipart upload Upload Part - Copy API.
For more information, see Copy Object Using the REST Multipart Upload API .
～～～～

Error message

ClientError: An error occurred (InvalidRequest) when calling the CopyObject operation:
The specified copy source is larger than the maximum allowable size for a copy source: 5368709120

References
https://dev.classmethod.jp/articles/python-boto3-s3-meta-client-copy/
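
To make the 5 GB boundary concrete, here is a minimal sketch that checks the source size with head_object and picks the API accordingly. The helper name copy_any_size and the constant FIVE_GIB are hypothetical, and s3_client is assumed to be a client created with boto3.client('s3').

```python
FIVE_GIB = 5 * 1024 ** 3  # 5368709120 bytes, the size quoted in the error


def copy_any_size(s3_client, src_bucket, src_key, dest_bucket, dest_key):
    """Copy one object and return the name of the API that was used.

    Hypothetical helper; `s3_client` is assumed to be boto3.client('s3').
    """
    head = s3_client.head_object(Bucket=src_bucket, Key=src_key)
    copy_source = {"Bucket": src_bucket, "Key": src_key}
    if head["ContentLength"] > FIVE_GIB:
        # Managed transfer: boto3 performs a multipart copy when needed
        s3_client.copy(copy_source, dest_bucket, dest_key)
        return "copy"
    # Single atomic CopyObject request, valid up to the 5 GB limit
    s3_client.copy_object(
        CopySource=copy_source, Bucket=dest_bucket, Key=dest_key)
    return "copy_object"
```

Since copy() also handles small objects, calling it unconditionally is likely the simpler choice in practice; the size check is kept here only to illustrate where the limit applies.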

Example 1: Copying or moving (cutting) all files under a given S3 location

import boto3

def copy_all_files(
  s3_client,
  s3_resource,
  src_s3_bucket, src_prefix,
  dest_s3_bucket, dest_prefix,
  is_moving=False,
  is_dry_run=False):

  try:
    next_token = ''
    mark_of_dry_run = "[Dry run] " if (is_dry_run is True) else ""
    while True:
      if next_token == '':
        response = s3_client.list_objects_v2(
          Bucket=src_s3_bucket,
          Prefix=src_prefix)
      else:
        response = s3_client.list_objects_v2(
          Bucket=src_s3_bucket,
          Prefix=src_prefix,
          ContinuationToken=next_token)

      if 'Contents' in response:
        contents = response['Contents']
        for content in contents:
          content_key = content['Key']
          source_prefix = content_key
          destination_prefix = "{}/{}".format(
            dest_prefix, content_key)

          source_dict = {
            'Bucket': src_s3_bucket,
            'Key': source_prefix
          }
          print("{}Copying s3://{}/{} to s3://{}/{}".format(
            mark_of_dry_run,
            src_s3_bucket,
            source_prefix,
            dest_s3_bucket,
            destination_prefix))

          if not is_dry_run:
            # copy() is a managed transfer and returns None
            s3_resource.meta.client.copy(
              source_dict,
              dest_s3_bucket,
              destination_prefix)

          if is_moving:
            print("{}Deleting s3://{}/{}".format(
              mark_of_dry_run, src_s3_bucket, source_prefix))
            if not is_dry_run:
              response_delete = s3_client.delete_object(
                Bucket=src_s3_bucket, Key=source_prefix)
              print(response_delete)

      if 'NextContinuationToken' in response:
        next_token = response['NextContinuationToken']
      else:
        print("Done")
        break
  except Exception as ex:
    print(str(ex))
    # Bare raise preserves the original traceback
    raise

if __name__ == "__main__":
  s3_client = boto3.client('s3')
  s3_resource = boto3.resource('s3')
  copy_all_files(
    s3_client,
    s3_resource,
    "your-s3-bucket1",
    "target/directory/src",
    "your-s3-bucket2",
    "target/directory/dest",
    is_moving=True,
    is_dry_run=True)

【4】delete_object / delete_objects

* Deletes files (objects)

delete_object
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.delete_object

response = client.delete_object(
    Bucket='string',
    Key='string',
    ...
)

delete_objects
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.delete_objects

response = client.delete_objects(
    Bucket='string',
    Delete={
        # Up to 1000 keys can be specified
        'Objects': [
            {
                'Key': 'string',
                'VersionId': 'string'
            },
        ],
        'Quiet': True|False
    },
    ...
)

Usage notes

* Deleting a directory is fairly tedious
 => If the directory contains files, it cannot simply be deleted in one shot

https://dev.classmethod.jp/articles/20180625-how-to-delete-s3folder/

Example 1: Deleting a directory

import boto3

def delete_directory(s3_client, s3_bucket, prefix):
  if not prefix.endswith("/"):
    prefix = prefix + "/"

  try:
    next_token = ''
    while True:
      if next_token == '':
        response = s3_client.list_objects_v2(
          Bucket=s3_bucket,
          Prefix=prefix)
      else:
        response = s3_client.list_objects_v2(
          Bucket=s3_bucket,
          Prefix=prefix,
          ContinuationToken=next_token)

      if 'Contents' in response:
        contents = response['Contents']
        for content in contents:
          content_key = content['Key']
          # content_key already includes the prefix
          print("Deleting s3://{}/{}".format(
            s3_bucket,
            content_key))
          s3_client.delete_object(
            Bucket=s3_bucket, Key=content_key)
      if 'NextContinuationToken' in response:
        next_token = response['NextContinuationToken']
      else:
        print("Done")
        break
  except Exception as ex:
    print(str(ex))
    # Bare raise preserves the original traceback
    raise

if __name__ == "__main__":
  s3_client = boto3.client('s3')
  delete_directory(s3_client, "your-s3-bucket", "target/directory")
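
The example above issues one delete_object request per key. Because delete_objects accepts up to 1000 keys and a list_objects_v2 page holds at most 1000 keys, each page can instead be deleted in a single request. A minimal sketch, assuming s3_client is created with boto3.client('s3'); the helper name delete_prefix_in_batches is hypothetical:

```python
def delete_prefix_in_batches(s3_client, bucket, prefix):
    """Delete every object under `prefix`; return the number of keys deleted.

    Hypothetical helper; `s3_client` is assumed to be boto3.client('s3').
    """
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        objects = [{"Key": item["Key"]} for item in page.get("Contents", [])]
        if not objects:
            continue
        # One request per page: a page never exceeds the 1000-key limit
        response = s3_client.delete_objects(
            Bucket=bucket, Delete={"Objects": objects, "Quiet": True})
        # Per-key failures are reported in the response body
        errors = response.get("Errors", [])
        if errors:
            raise RuntimeError("Failed to delete: {}".format(errors))
        deleted += len(objects)
    return deleted
```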

【5】put_object

* Writes to S3.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/put_object.html

Example 1: Writing a file to S3

import boto3

s3_client = boto3.client('s3')
response = s3_client.put_object(
  Bucket="your-s3-bucket-name",
  Key="xxx/test.txt",
  Body="Hello World!!"
)
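
Body also accepts bytes, and ContentType can be set at the same time. As a minimal sketch, writing a dict as a JSON object might look like the following; the helper name put_json is hypothetical, and s3_client is assumed to be a client created with boto3.client('s3').

```python
import json


def put_json(s3_client, bucket, key, data):
    """Serialize `data` to JSON and write it to s3://bucket/key.

    Hypothetical helper; `s3_client` is assumed to be boto3.client('s3').
    """
    body = json.dumps(data, ensure_ascii=False).encode("utf-8")
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType="application/json",
    )


# Usage (requires AWS credentials):
#   import boto3
#   put_json(boto3.client("s3"), "your-s3-bucket-name",
#            "xxx/test.json", {"message": "Hello World!!"})
```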

【6】head_bucket / head_object

* Existence checks

head_bucket

* Checks whether a bucket exists

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/head_bucket.html

head_object

* Checks whether a file (object) exists

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/head_object.html
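
Both calls signal "not found" by raising ClientError, so an existence check usually wraps them in try/except. A minimal sketch follows; the helper names are hypothetical, s3_client is assumed to be a client created with boto3.client('s3'), and the set of error codes treated as "not found" is an assumption that may need adjusting.

```python
# Assumption: error codes that mean "does not exist"
NOT_FOUND_CODES = ("404", "NoSuchKey", "NoSuchBucket", "NotFound")


def _exists(s3_client, head_call, **kwargs):
    """Run a head_* call and map a not-found error to False."""
    try:
        head_call(**kwargs)
        return True
    except s3_client.exceptions.ClientError as err:
        if err.response["Error"]["Code"] in NOT_FOUND_CODES:
            return False
        # e.g. 403: no permission, so existence cannot be determined
        raise


def bucket_exists(s3_client, bucket):
    """Hypothetical helper: True if the bucket exists and is accessible."""
    return _exists(s3_client, s3_client.head_bucket, Bucket=bucket)


def object_exists(s3_client, bucket, key):
    """Hypothetical helper: True if the object exists."""
    return _exists(s3_client, s3_client.head_object, Bucket=bucket, Key=key)
```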

【7】Multipart upload

* Uploads a large file in multiple parts

API	Description
create_multipart_upload	Starts a multipart upload session
upload_part	Uploads a part (used to upload new data, such as a local file, to an S3 bucket)
upload_part_copy	Uploads a part by copying an existing object (or a portion of it) in an S3 bucket as a part of another S3 object
complete_multipart_upload	Completes the multipart upload session
abort_multipart_upload	Aborts the multipart upload
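
The APIs in the table are typically combined in the order start, upload parts, complete, with an abort on failure. A minimal sketch of that flow is shown below; the helper name multipart_upload is hypothetical, s3_client is assumed to be a client created with boto3.client('s3'), and the caller is assumed to supply parts of at least 5 MiB each (except the last).

```python
def multipart_upload(s3_client, bucket, key, parts):
    """Upload `parts` (an iterable of bytes) as one object.

    Hypothetical helper; `s3_client` is assumed to be boto3.client('s3').
    Every part except the last is assumed to be at least 5 MiB.
    """
    upload_id = s3_client.create_multipart_upload(
        Bucket=bucket, Key=key)["UploadId"]
    try:
        completed = []
        for number, body in enumerate(parts, start=1):
            response = s3_client.upload_part(
                Bucket=bucket, Key=key, UploadId=upload_id,
                PartNumber=number, Body=body)
            # The ETag of each part is required to complete the upload
            completed.append({"ETag": response["ETag"], "PartNumber": number})
        return s3_client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": completed})
    except Exception:
        # Abort so that already uploaded parts do not linger in the bucket
        s3_client.abort_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

For plain file uploads, boto3's managed upload_file is likely the simpler option, since it switches to multipart automatically for large files.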