- Take a backup
- Check PHP compatibility
- Update PHP
Take a backup
I followed the "Backing up your database and files" steps on this page exactly as written.
Check PHP compatibility
I used the PHP Compatibility Checker mentioned in the WordPress guide article.
Update PHP
Sakura rental server control panel > Script Settings > Language Version Settings

Random Notes
On macOS, pip install mysqlclient failed to build because pkg-config was not installed:

% pip install mysqlclient
Collecting mysqlclient
  Downloading mysqlclient-2.2.7.tar.gz (91 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [30 lines of output]
      /bin/sh: pkg-config: command not found
      /bin/sh: pkg-config: command not found
      /bin/sh: pkg-config: command not found
      /bin/sh: pkg-config: command not found
      Trying pkg-config --exists mysqlclient
      Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127.
      Trying pkg-config --exists mariadb
      Command 'pkg-config --exists mariadb' returned non-zero exit status 127.
      Trying pkg-config --exists libmariadb
      Command 'pkg-config --exists libmariadb' returned non-zero exit status 127.
      Trying pkg-config --exists perconaserverclient
      Command 'pkg-config --exists perconaserverclient' returned non-zero exit status 127.
      Traceback (most recent call last):
        File "/Users/ユーザー名/PythonProjects/yahoonews_scraper/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
          main()
        File "/Users/ユーザー名/PythonProjects/yahoonews_scraper/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
          json_out["return_val"] = hook(**hook_input["kwargs"])
        File "/Users/ユーザー名/PythonProjects/yahoonews_scraper/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 143, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/private/var/folders/zs/f04s_hhx3s73djbc3h9cvyn40000gn/T/pip-build-env-ost_1kt5/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 333, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
        File "/private/var/folders/zs/f04s_hhx3s73djbc3h9cvyn40000gn/T/pip-build-env-ost_1kt5/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 303, in _get_build_requires
          self.run_setup()
        File "/private/var/folders/zs/f04s_hhx3s73djbc3h9cvyn40000gn/T/pip-build-env-ost_1kt5/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 319, in run_setup
          exec(code, locals())
        File "<string>", line 156, in <module>
        File "<string>", line 49, in get_config_posix
        File "<string>", line 28, in find_package_name
      Exception: Can not find valid pkg-config name. Specify MYSQLCLIENT_CFLAGS and MYSQLCLIENT_LDFLAGS env vars manually
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
The fix is to install pkg-config with Homebrew:

% brew install pkg-config
==> Auto-updating Homebrew...
Adjust how often this is run with HOMEBREW_AUTO_UPDATE_SECS or disable with HOMEBREW_NO_AUTO_UPDATE.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
==> Auto-updated Homebrew!
Updated 3 taps (homebrew/services, homebrew/core and homebrew/cask).
==> New Formulae
babelfish  redocly-cli  umka-lang  xlsclients  xwininfo  ludusavi  sdl3  xeyes  xprop
==> New Casks
dana-dex  font-maple-mono-nf-cn  startupfolder  dockfix  imaging-edge-webcam  valhalla-freq-echo  flashspace  linearmouse@beta  valhalla-space-modulator  font-maple-mono-cn  muteme
You have 1 outdated cask installed.
==> Downloading https://ghcr.io/v2/homebrew/core/pkgconf/manifests/2.3.0_1-1
######################################################################### 100.0%
==> Fetching pkgconf
==> Downloading https://ghcr.io/v2/homebrew/core/pkgconf/blobs/sha256:fb3a6a6fcb
######################################################################### 100.0%
==> Pouring pkgconf--2.3.0_1.sonoma.bottle.1.tar.gz
🍺  /usr/local/Cellar/pkgconf/2.3.0_1: 27 files, 328.6KB
==> Running `brew cleanup pkgconf`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
%
With pkg-config available, the install succeeds:

% pip install mysqlclient
Collecting mysqlclient
  Using cached mysqlclient-2.2.7.tar.gz (91 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: mysqlclient
  Building wheel for mysqlclient (pyproject.toml) ... done
  Created wheel for mysqlclient: filename=mysqlclient-2.2.7-cp38-cp38-macosx_10_9_x86_64.whl size=75920 sha256=dfb71baa06f2124c94179a921f590419fe72775f3da899f8adc1261117cdb701
  Stored in directory: /Users/ユーザー名/Library/Caches/pip/wheels/5b/ed/4f/23fd3001b8c8e25f152c11a3952754ca29b5d5f254b6213056
Successfully built mysqlclient
Installing collected packages: mysqlclient
Successfully installed mysqlclient-2.2.7
%
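As a final sanity check, a one-line Python snippet (just a sketch; the package installs under the module name MySQLdb) confirms the freshly built wheel imports:

# quick check that the built mysqlclient wheel is importable
import MySQLdb

print(MySQLdb.version_info)  # e.g. (2, 2, 7, 'final', 0)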
The error that occurred when running create-react-app is shown below.
% npx create-react-app my-new-app

Creating a new React app in /Users/username/React/my-new-app.

Installing packages. This might take a couple of minutes.
Installing react, react-dom, and react-scripts with cra-template...

added 1324 packages in 16s

268 packages are looking for funding
  run `npm fund` for details

Initialized a git repository.

Installing template dependencies using npm...
npm error code ERESOLVE
npm error ERESOLVE unable to resolve dependency tree
npm error
npm error While resolving: my-new-app@0.1.0
npm error Found: react@19.0.0
npm error node_modules/react
npm error   react@"^19.0.0" from the root project
npm error
npm error Could not resolve dependency:
npm error peer react@"^18.0.0" from @testing-library/react@13.4.0
npm error node_modules/@testing-library/react
npm error   @testing-library/react@"^13.0.0" from the root project
npm error
npm error Fix the upstream dependency conflict, or retry
npm error this command with --force or --legacy-peer-deps
npm error to accept an incorrect (and potentially broken) dependency resolution.
npm error
npm error
npm error For a full report see:
npm error /Users/username/.npm/_logs/2025-01-26T21_38_14_756Z-eresolve-report.txt
npm error A complete log of this run can be found in: /Users/username/.npm/_logs/2025-01-26T21_38_14_756Z-debug-0.log

`npm install --no-audit --save @testing-library/jest-dom@^5.14.1 @testing-library/react@^13.0.0 @testing-library/user-event@^13.2.1 web-vitals@^2.1.0` failed
The gist of the error is that npm failed to resolve the dependency tree.
At the very bottom it says that the command "npm install --no-audit --save @testing-library/jest-dom@^5.14.1 @testing-library/react@^13.0.0 @testing-library/user-event@^13.2.1 web-vitals@^2.1.0" failed.
I will rerun that command later.
Move into the partially created my-new-app directory and edit the package.json file.
# package.json
"dependencies": {
    "cra-template": "1.2.0",
    "react": "^19.0.0",
    "react-dom": "^19.0.0",
    "react-scripts": "5.0.1"
},
In the dependencies block shown above, change "19.0.0" to "18.0.0".
# package.json
"dependencies": {
    "cra-template": "1.2.0",
    "react": "^18.0.0",
    "react-dom": "^18.0.0",
    "react-scripts": "5.0.1"
},
Save the file.
In the terminal, move into the my-new-app directory and run the command that failed earlier: npm install --no-audit --save @testing-library/jest-dom@^5.14.1 @testing-library/react@^13.0.0 @testing-library/user-event@^13.2.1 web-vitals@^2.1.0
% cd my-new-app
% npm install --no-audit --save @testing-library/jest-dom@^5.14.1 @testing-library/react@^13.0.0 @testing-library/user-event@^13.2.1 web-vitals@^2.1.0

added 47 packages, and changed 4 packages in 6s

272 packages are looking for funding
  run `npm fund` for details
No errors this time.
Now start the React app.
% npm start

Compiled successfully!

You can now view my-new-app in the browser.

  Local:            http://localhost:3000
  On Your Network:  http://192.168.1.9:3000

Note that the development build is not optimized.
To create a production build, use npm run build.

webpack compiled successfully
It started up without any problems.
Three files are needed: init.sql, .env, and a Dockerfile.
Place all three in the same directory.
% ls -a
.    ..    .env    Dockerfile    init.sql
--init.sql CREATE DATABASE IF NOT EXISTS buzzing; USE buzzing; CREATE TABLE IF NOT EXISTS `yt_mst_cnl` ( `channel_id` varchar(40) NOT NULL, `channel_name` tinytext, `description` text, `thumbnail` text, `uploads_list` varchar(40) DEFAULT NULL, `published_at` date DEFAULT NULL, `data_update_date` date DEFAULT NULL, PRIMARY KEY (`channel_id`), KEY `idx_published_at` (`published_at`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; CREATE TABLE IF NOT EXISTS `yt_mst_vid` ( `video_id` varchar(20) NOT NULL, `video_name` text, `description` text, `thumbnail` text, `channel_id` varchar(40) DEFAULT NULL, `published_at` varchar(8) DEFAULT NULL, `data_update_date` varchar(8) DEFAULT NULL, PRIMARY KEY (`video_id`), KEY `fk_channel` (`channel_id`), KEY `idx_published_at` (`published_at`), CONSTRAINT `fk_channel` FOREIGN KEY (`channel_id`) REFERENCES `yt_mst_cnl` (`channel_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; CREATE TABLE IF NOT EXISTS `yt_pfm_cnl` ( `channel_id` varchar(40) DEFAULT NULL, `subscriber_count` bigint DEFAULT NULL, `hidden_subscriber_count` varchar(1) DEFAULT NULL, `view_count` bigint DEFAULT NULL, `video_count` int DEFAULT NULL, `data_date` varchar(8) DEFAULT NULL, KEY `fk_channel_pfm` (`channel_id`), CONSTRAINT `fk_channel_pfm` FOREIGN KEY (`channel_id`) REFERENCES `yt_mst_cnl` (`channel_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; CREATE TABLE IF NOT EXISTS `yt_pfm_vid` ( `video_id` varchar(20) DEFAULT NULL, `view_count` bigint DEFAULT NULL, `like_count` int DEFAULT NULL, `dislike_count` int DEFAULT NULL, `favorite_count` int DEFAULT NULL, `comment_count` int DEFAULT NULL, `most_used_words` text, `data_date` varchar(8) DEFAULT NULL, KEY `fk_video_pfm` (`video_id`), CONSTRAINT `fk_video_pfm` FOREIGN KEY (`video_id`) REFERENCES `yt_mst_vid` (`video_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; CREATE TABLE IF NOT EXISTS `yt_analysis_07` ( `channel_id` varchar(40) DEFAULT NULL, `channel_name` tinytext, `view_count` bigint DEFAULT NULL, `like_count` int DEFAULT NULL, `dislike_count` int DEFAULT NULL, `favorite_count` int DEFAULT NULL, `comment_count` int DEFAULT NULL, `video_count` int DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; DROP PROCEDURE IF EXISTS `buzzing`.`yt_analysis_07`; DELIMITER // CREATE PROCEDURE `buzzing`.`yt_analysis_07`(IN published_after VARCHAR(8)) BEGIN -- truncate the yt_analysis_07 table TRUNCATE TABLE yt_analysis_07; -- insert data into the yt_analysis_07 table INSERT INTO yt_analysis_07 SELECT channel_id, channel_name, view_count, like_count, dislike_count, favorite_count, comment_count, video_count FROM ( SELECT B.channel_id AS channel_id, MAX(C.channel_name) AS channel_name, SUM(A.view_count) AS view_count, SUM(A.like_count) AS like_count, SUM(A.dislike_count) AS dislike_count, SUM(A.favorite_count) AS favorite_count, SUM(A.comment_count) AS comment_count, COUNT(*) AS video_count FROM yt_pfm_vid A LEFT JOIN yt_mst_vid B ON A.video_id = B.video_id LEFT JOIN ( SELECT channel_id, MAX(channel_name) AS channel_name FROM yt_mst_cnl GROUP BY channel_id ) C ON B.channel_id = C.channel_id WHERE B.published_at >= @published_after GROUP BY channel_id ) T1 ORDER BY view_count DESC; COMMIT; END // DELIMITER ;
# .env
MYSQL_ROOT_PASSWORD=rootpassword
MYSQL_USER=admin
MYSQL_PASSWORD=password
MYSQL_DATABASE=buzzing
# Dockerfile
FROM mysql
ADD init.sql /docker-entrypoint-initdb.d
% docker build -t docker_mysql:1.0 .
% docker run --env-file .env --name docker_mysql -p 13306:3306 -it -d docker_mysql:1.0
% docker exec -it docker_mysql bash
bash-4.4#

bash-4.4# mysql -u admin -p
Enter password:
mysql>

mysql> use buzzing;
mysql> show tables;
+-------------------+
| Tables_in_buzzing |
+-------------------+
| yt_analysis_07    |
| yt_mst_cnl        |
| yt_mst_vid        |
| yt_pfm_cnl        |
| yt_pfm_vid        |
+-------------------+
5 rows in set (0.00 sec)

mysql> call buzzing.yt_analysis_07('20230101');
Query OK, 0 rows affected (0.03 sec)
Incidentally, I was also able to connect through port 13306 from DBeaver.
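For reference, the same connection can be checked from Python with the mysqlclient module. This is only a minimal sketch, assuming mysqlclient is installed on the host; the port comes from the docker run command above and the credentials from the .env file.

# check_connection.py -- minimal connectivity check against the container
import MySQLdb

conn = MySQLdb.connect(
    host="127.0.0.1",
    port=13306,          # host port published by "docker run -p 13306:3306"
    user="admin",        # MYSQL_USER from .env
    passwd="password",   # MYSQL_PASSWORD from .env
    db="buzzing",        # MYSQL_DATABASE from .env
)

cur = conn.cursor()
cur.execute("SHOW TABLES")
for (table_name,) in cur.fetchall():
    print(table_name)    # should list the tables created by init.sql

conn.close()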
The docker-compose.yaml available from the Apache Airflow official site uses PostgreSQL as the metadata database, but I modified it so it can be brought up with MySQL instead.
This is my memo from that exercise.
Fetch the YAML file from the link on the official site:
% mkdir docker_airflow
% cd docker_airflow
% mkdir -p ./dags ./logs ./plugins ./config
% curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.6.1/docker-compose.yaml'
Here are its contents (as of May 2023):
version: '3.8' x-airflow-common: &airflow-common # In order to add custom dependencies or upgrade provider packages you can use your extended image. # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml # and uncomment the "build" line below, Then run `docker-compose build` to build the images. image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.6.1} # build: . environment: &airflow-common-env AIRFLOW__CORE__EXECUTOR: CeleryExecutor AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow # For backward compatibility, with Airflow <2.3 AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0 AIRFLOW__CORE__FERNET_KEY: '' AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true' AIRFLOW__CORE__LOAD_EXAMPLES: 'true' AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session' # yamllint disable rule:line-length # Use simple http server on scheduler for health checks # See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server # yamllint enable rule:line-length AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true' # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks # for other purpose (development, test and especially production usage) build/extend Airflow image. _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-} volumes: - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins user: "${AIRFLOW_UID:-50000}:0" depends_on: &airflow-common-depends-on redis: condition: service_healthy postgres: condition: service_healthy services: postgres: image: postgres:13 environment: POSTGRES_USER: airflow POSTGRES_PASSWORD: airflow POSTGRES_DB: airflow volumes: - postgres-db-volume:/var/lib/postgresql/data healthcheck: test: ["CMD", "pg_isready", "-U", "airflow"] interval: 10s retries: 5 start_period: 5s restart: always redis: image: redis:latest expose: - 6379 healthcheck: test: ["CMD", "redis-cli", "ping"] interval: 10s timeout: 30s retries: 50 start_period: 30s restart: always airflow-webserver: <<: *airflow-common command: webserver ports: - "8080:8080" healthcheck: test: ["CMD", "curl", "--fail", "http://localhost:8080/health"] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-scheduler: <<: *airflow-common command: scheduler healthcheck: test: ["CMD", "curl", "--fail", "http://localhost:8974/health"] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-worker: <<: *airflow-common command: celery worker healthcheck: test: - "CMD-SHELL" - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"' interval: 30s timeout: 10s retries: 5 start_period: 30s environment: <<: *airflow-common-env # Required to handle warm shutdown of the celery workers properly # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation DUMB_INIT_SETSID: "0" restart: always 
depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-triggerer: <<: *airflow-common command: triggerer healthcheck: test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"'] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-init: <<: *airflow-common entrypoint: /bin/bash # yamllint disable rule:line-length command: - -c - | function ver() { printf "%04d%04d%04d%04d" $${1//./ } } airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version) airflow_version_comparable=$$(ver $${airflow_version}) min_airflow_version=2.2.0 min_airflow_version_comparable=$$(ver $${min_airflow_version}) if (( airflow_version_comparable < min_airflow_version_comparable )); then echo echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m" echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!" echo exit 1 fi if [[ -z "${AIRFLOW_UID}" ]]; then echo echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m" echo "If you are on Linux, you SHOULD follow the instructions below to set " echo "AIRFLOW_UID environment variable, otherwise files will be owned by root." echo "For other operating systems you can get rid of the warning with manually created .env file:" echo " See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user" echo fi one_meg=1048576 mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg)) cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat) disk_available=$$(df / | tail -1 | awk '{print $$4}') warning_resources="false" if (( mem_available < 4000 )) ; then echo echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m" echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))" echo warning_resources="true" fi if (( cpus_available < 2 )); then echo echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m" echo "At least 2 CPUs recommended. You have $${cpus_available}" echo warning_resources="true" fi if (( disk_available < one_meg * 10 )); then echo echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m" echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))" echo warning_resources="true" fi if [[ $${warning_resources} == "true" ]]; then echo echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m" echo "Please follow the instructions to increase amount of resources available:" echo " https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin" echo fi mkdir -p /sources/logs /sources/dags /sources/plugins chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins} exec /entrypoint airflow version # yamllint enable rule:line-length environment: <<: *airflow-common-env _AIRFLOW_DB_UPGRADE: 'true' _AIRFLOW_WWW_USER_CREATE: 'true' _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow} _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow} _PIP_ADDITIONAL_REQUIREMENTS: '' user: "0:0" volumes: - ${AIRFLOW_PROJ_DIR:-.}:/sources airflow-cli: <<: *airflow-common profiles: - debug environment: <<: *airflow-common-env CONNECTION_CHECK_MAX_COUNT: "0" # Workaround for entrypoint issue. 
See: https://github.com/apache/airflow/issues/16252 command: - bash - -c - airflow # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up # or by explicitly targeted on the command line e.g. docker-compose up flower. # See: https://docs.docker.com/compose/profiles/ flower: <<: *airflow-common command: celery flower profiles: - flower ports: - "5555:5555" healthcheck: test: ["CMD", "curl", "--fail", "http://localhost:5555/"] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully volumes: postgres-db-volume:
Here is the file with the unneeded parts commented out: the Celery- and Redis-related settings are disabled and the executor is switched to LocalExecutor.
version: '3.8' x-airflow-common: &airflow-common # In order to add custom dependencies or upgrade provider packages you can use your extended image. # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml # and uncomment the "build" line below, Then run `docker-compose build` to build the images. image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.6.1} # build: . environment: &airflow-common-env AIRFLOW__CORE__EXECUTOR: LocalExecutor AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow # For backward compatibility, with Airflow <2.3 AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow # AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow # AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0 AIRFLOW__CORE__FERNET_KEY: '' AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true' AIRFLOW__CORE__LOAD_EXAMPLES: 'true' AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session' # yamllint disable rule:line-length # Use simple http server on scheduler for health checks # See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server # yamllint enable rule:line-length AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true' # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks # for other purpose (development, test and especially production usage) build/extend Airflow image. _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-} volumes: - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins user: "${AIRFLOW_UID:-50000}:0" depends_on: &airflow-common-depends-on # redis: # condition: service_healthy postgres: condition: service_healthy services: postgres: image: postgres:13 environment: POSTGRES_USER: airflow POSTGRES_PASSWORD: airflow POSTGRES_DB: airflow volumes: - postgres-db-volume:/var/lib/postgresql/data healthcheck: test: ["CMD", "pg_isready", "-U", "airflow"] interval: 10s retries: 5 start_period: 5s restart: always # redis: # image: redis:latest # expose: # - 6379 # healthcheck: # test: ["CMD", "redis-cli", "ping"] # interval: 10s # timeout: 30s # retries: 50 # start_period: 30s # restart: always airflow-webserver: <<: *airflow-common command: webserver ports: - "8080:8080" healthcheck: test: ["CMD", "curl", "--fail", "http://localhost:8080/health"] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-scheduler: <<: *airflow-common command: scheduler healthcheck: test: ["CMD", "curl", "--fail", "http://localhost:8974/health"] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully # airflow-worker: # <<: *airflow-common # command: celery worker # healthcheck: # test: # - "CMD-SHELL" # - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"' # interval: 30s # timeout: 10s # retries: 5 # start_period: 30s # environment: # <<: *airflow-common-env # # Required to handle warm shutdown of the celery workers properly # # See 
https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation # DUMB_INIT_SETSID: "0" # restart: always # depends_on: # <<: *airflow-common-depends-on # airflow-init: # condition: service_completed_successfully airflow-triggerer: <<: *airflow-common command: triggerer healthcheck: test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"'] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-init: <<: *airflow-common entrypoint: /bin/bash # yamllint disable rule:line-length command: - -c - | function ver() { printf "%04d%04d%04d%04d" $${1//./ } } airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version) airflow_version_comparable=$$(ver $${airflow_version}) min_airflow_version=2.2.0 min_airflow_version_comparable=$$(ver $${min_airflow_version}) if (( airflow_version_comparable < min_airflow_version_comparable )); then echo echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m" echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!" echo exit 1 fi if [[ -z "${AIRFLOW_UID}" ]]; then echo echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m" echo "If you are on Linux, you SHOULD follow the instructions below to set " echo "AIRFLOW_UID environment variable, otherwise files will be owned by root." echo "For other operating systems you can get rid of the warning with manually created .env file:" echo " See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user" echo fi one_meg=1048576 mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg)) cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat) disk_available=$$(df / | tail -1 | awk '{print $$4}') warning_resources="false" if (( mem_available < 4000 )) ; then echo echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m" echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))" echo warning_resources="true" fi if (( cpus_available < 2 )); then echo echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m" echo "At least 2 CPUs recommended. You have $${cpus_available}" echo warning_resources="true" fi if (( disk_available < one_meg * 10 )); then echo echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m" echo "At least 10 GBs recommended. 
You have $$(numfmt --to iec $$((disk_available * 1024 )))" echo warning_resources="true" fi if [[ $${warning_resources} == "true" ]]; then echo echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m" echo "Please follow the instructions to increase amount of resources available:" echo " https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin" echo fi mkdir -p /sources/logs /sources/dags /sources/plugins chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins} exec /entrypoint airflow version # yamllint enable rule:line-length environment: <<: *airflow-common-env _AIRFLOW_DB_UPGRADE: 'true' _AIRFLOW_WWW_USER_CREATE: 'true' _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow} _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow} _PIP_ADDITIONAL_REQUIREMENTS: '' user: "0:0" volumes: - ${AIRFLOW_PROJ_DIR:-.}:/sources airflow-cli: <<: *airflow-common profiles: - debug environment: <<: *airflow-common-env CONNECTION_CHECK_MAX_COUNT: "0" # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252 command: - bash - -c - airflow # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up # or by explicitly targeted on the command line e.g. docker-compose up flower. # See: https://docs.docker.com/compose/profiles/ # flower: # <<: *airflow-common # command: celery flower # profiles: # - flower # ports: # - "5555:5555" # healthcheck: # test: ["CMD", "curl", "--fail", "http://localhost:5555/"] # interval: 30s # timeout: 10s # retries: 5 # start_period: 30s # restart: always # depends_on: # <<: *airflow-common-depends-on # airflow-init: # condition: service_completed_successfully volumes: postgres-db-volume:
% docker-compose up airflow-init
% docker-compose up -d

Next comes the version with the metadata database switched to MySQL: the postgres service is commented out, a mysql service is added, and the SQLAlchemy connection strings point at it.
version: '3.8' x-airflow-common: &airflow-common # In order to add custom dependencies or upgrade provider packages you can use your extended image. # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml # and uncomment the "build" line below, Then run `docker-compose build` to build the images. image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.6.1} # build: . environment: &airflow-common-env AIRFLOW__CORE__EXECUTOR: LocalExecutor # AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: mysql://airflow:airflowpassword@mysql/airflow # For backward compatibility, with Airflow <2.3 # AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql://airflow:airflowpassword@mysql/airflow # AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow # AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0 AIRFLOW__CORE__FERNET_KEY: '' AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true' AIRFLOW__CORE__LOAD_EXAMPLES: 'true' AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session' # yamllint disable rule:line-length # Use simple http server on scheduler for health checks # See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server # yamllint enable rule:line-length AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true' # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks # for other purpose (development, test and especially production usage) build/extend Airflow image. _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-} volumes: - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins user: "${AIRFLOW_UID:-50000}:0" depends_on: &airflow-common-depends-on # redis: # condition: service_healthy # postgres: # condition: service_healthy mysql: condition: service_healthy services: # postgres: # image: postgres:13 # environment: # POSTGRES_USER: airflow # POSTGRES_PASSWORD: airflow # POSTGRES_DB: airflow # volumes: # - postgres-db-volume:/var/lib/postgresql/data # healthcheck: # test: ["CMD", "pg_isready", "-U", "airflow"] # interval: 10s # retries: 5 # start_period: 5s # restart: always mysql: image: mysql:8.0 command: --default-authentication-plugin=mysql_native_password restart: always environment: MYSQL_ROOT_PASSWORD: example MYSQL_DATABASE: airflow MYSQL_USER: airflow MYSQL_PASSWORD: airflowpassword volumes: - mysql-db-volume:/var/lib/mysql healthcheck: test: ["CMD", "mysqladmin" ,"ping", "-h", "localhost"] interval: 20s timeout: 10s retries: 5 ports: - "13306:3306" # redis: # image: redis:latest # expose: # - 6379 # healthcheck: # test: ["CMD", "redis-cli", "ping"] # interval: 10s # timeout: 30s # retries: 50 # start_period: 30s # restart: always airflow-webserver: <<: *airflow-common command: webserver ports: - "8080:8080" healthcheck: test: ["CMD", "curl", "--fail", "http://localhost:8080/health"] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-scheduler: <<: *airflow-common command: scheduler healthcheck: test: ["CMD", "curl", "--fail", 
"http://localhost:8974/health"] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully # airflow-worker: # <<: *airflow-common # command: celery worker # healthcheck: # test: # - "CMD-SHELL" # - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"' # interval: 30s # timeout: 10s # retries: 5 # start_period: 30s # environment: # <<: *airflow-common-env # # Required to handle warm shutdown of the celery workers properly # # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation # DUMB_INIT_SETSID: "0" # restart: always # depends_on: # <<: *airflow-common-depends-on # airflow-init: # condition: service_completed_successfully airflow-triggerer: <<: *airflow-common command: triggerer healthcheck: test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"'] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-init: <<: *airflow-common entrypoint: /bin/bash # yamllint disable rule:line-length command: - -c - | function ver() { printf "%04d%04d%04d%04d" $${1//./ } } airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version) airflow_version_comparable=$$(ver $${airflow_version}) min_airflow_version=2.2.0 min_airflow_version_comparable=$$(ver $${min_airflow_version}) if (( airflow_version_comparable < min_airflow_version_comparable )); then echo echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m" echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!" echo exit 1 fi if [[ -z "${AIRFLOW_UID}" ]]; then echo echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m" echo "If you are on Linux, you SHOULD follow the instructions below to set " echo "AIRFLOW_UID environment variable, otherwise files will be owned by root." echo "For other operating systems you can get rid of the warning with manually created .env file:" echo " See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user" echo fi one_meg=1048576 mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg)) cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat) disk_available=$$(df / | tail -1 | awk '{print $$4}') warning_resources="false" if (( mem_available < 4000 )) ; then echo echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m" echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))" echo warning_resources="true" fi if (( cpus_available < 2 )); then echo echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m" echo "At least 2 CPUs recommended. You have $${cpus_available}" echo warning_resources="true" fi if (( disk_available < one_meg * 10 )); then echo echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m" echo "At least 10 GBs recommended. 
You have $$(numfmt --to iec $$((disk_available * 1024 )))" echo warning_resources="true" fi if [[ $${warning_resources} == "true" ]]; then echo echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m" echo "Please follow the instructions to increase amount of resources available:" echo " https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin" echo fi mkdir -p /sources/logs /sources/dags /sources/plugins chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins} exec /entrypoint airflow version # yamllint enable rule:line-length environment: <<: *airflow-common-env _AIRFLOW_DB_UPGRADE: 'true' _AIRFLOW_WWW_USER_CREATE: 'true' _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow} _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow} _PIP_ADDITIONAL_REQUIREMENTS: '' user: "0:0" volumes: - ${AIRFLOW_PROJ_DIR:-.}:/sources airflow-cli: <<: *airflow-common profiles: - debug environment: <<: *airflow-common-env CONNECTION_CHECK_MAX_COUNT: "0" # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252 command: - bash - -c - airflow # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up # or by explicitly targeted on the command line e.g. docker-compose up flower. # See: https://docs.docker.com/compose/profiles/ # flower: # <<: *airflow-common # command: celery flower # profiles: # - flower # ports: # - "5555:5555" # healthcheck: # test: ["CMD", "curl", "--fail", "http://localhost:5555/"] # interval: 30s # timeout: 10s # retries: 5 # start_period: 30s # restart: always # depends_on: # <<: *airflow-common-depends-on # airflow-init: # condition: service_completed_successfully volumes: # postgres-db-volume: mysql-db-volume:
And here it is with the commented-out parts removed entirely:
version: '3.8' x-airflow-common: &airflow-common # In order to add custom dependencies or upgrade provider packages you can use your extended image. # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml # and uncomment the "build" line below, Then run `docker-compose build` to build the images. image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.6.1} # build: . environment: &airflow-common-env AIRFLOW__CORE__EXECUTOR: LocalExecutor AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: mysql://airflow:airflowpassword@mysql/airflow # For backward compatibility, with Airflow <2.3 AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql://airflow:airflowpassword@mysql/airflow AIRFLOW__CORE__FERNET_KEY: '' AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true' AIRFLOW__CORE__LOAD_EXAMPLES: 'true' AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session' # yamllint disable rule:line-length # Use simple http server on scheduler for health checks # See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server # yamllint enable rule:line-length AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true' # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks # for other purpose (development, test and especially production usage) build/extend Airflow image. _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-} volumes: - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins user: "${AIRFLOW_UID:-50000}:0" depends_on: &airflow-common-depends-on mysql: condition: service_healthy services: mysql: image: mysql:8.0 command: --default-authentication-plugin=mysql_native_password restart: always environment: MYSQL_ROOT_PASSWORD: airflowpassword MYSQL_DATABASE: airflow MYSQL_USER: airflow MYSQL_PASSWORD: airflowpassword volumes: - mysql-db-volume:/var/lib/mysql healthcheck: test: ["CMD", "mysqladmin" ,"ping", "-h", "localhost"] interval: 20s timeout: 10s retries: 5 ports: - "13306:3306" airflow-webserver: <<: *airflow-common command: webserver ports: - "8080:8080" healthcheck: test: ["CMD", "curl", "--fail", "http://localhost:8080/health"] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-scheduler: <<: *airflow-common command: scheduler healthcheck: test: ["CMD", "curl", "--fail", "http://localhost:8974/health"] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-triggerer: <<: *airflow-common command: triggerer healthcheck: test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"'] interval: 30s timeout: 10s retries: 5 start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on airflow-init: condition: service_completed_successfully airflow-init: <<: *airflow-common entrypoint: /bin/bash # yamllint disable rule:line-length command: - -c - | function ver() { printf "%04d%04d%04d%04d" $${1//./ } } airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version) airflow_version_comparable=$$(ver $${airflow_version}) min_airflow_version=2.2.0 min_airflow_version_comparable=$$(ver 
$${min_airflow_version}) if (( airflow_version_comparable < min_airflow_version_comparable )); then echo echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m" echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!" echo exit 1 fi if [[ -z "${AIRFLOW_UID}" ]]; then echo echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m" echo "If you are on Linux, you SHOULD follow the instructions below to set " echo "AIRFLOW_UID environment variable, otherwise files will be owned by root." echo "For other operating systems you can get rid of the warning with manually created .env file:" echo " See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user" echo fi one_meg=1048576 mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg)) cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat) disk_available=$$(df / | tail -1 | awk '{print $$4}') warning_resources="false" if (( mem_available < 4000 )) ; then echo echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m" echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))" echo warning_resources="true" fi if (( cpus_available < 2 )); then echo echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m" echo "At least 2 CPUs recommended. You have $${cpus_available}" echo warning_resources="true" fi if (( disk_available < one_meg * 10 )); then echo echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m" echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))" echo warning_resources="true" fi if [[ $${warning_resources} == "true" ]]; then echo echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m" echo "Please follow the instructions to increase amount of resources available:" echo " https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin" echo fi mkdir -p /sources/logs /sources/dags /sources/plugins chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins} exec /entrypoint airflow version # yamllint enable rule:line-length environment: <<: *airflow-common-env _AIRFLOW_DB_UPGRADE: 'true' _AIRFLOW_WWW_USER_CREATE: 'true' _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow} _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow} _PIP_ADDITIONAL_REQUIREMENTS: '' user: "0:0" volumes: - ${AIRFLOW_PROJ_DIR:-.}:/sources airflow-cli: <<: *airflow-common profiles: - debug environment: <<: *airflow-common-env CONNECTION_CHECK_MAX_COUNT: "0" # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252 command: - bash - -c - airflow volumes: mysql-db-volume:
% docker-compose up airflow-init
% docker-compose up -d

To recap, the full procedure is:

% mkdir docker_airflow
% cd docker_airflow
% curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.6.1/docker-compose.yaml'
% mkdir -p ./dags ./logs ./plugins ./config
% docker-compose up airflow-init
% docker-compose up -d
Open http://127.0.0.1:8080 in the browser and log in.
I added the following my_dag.py under the dags directory in the project root.
There is a fair amount of boilerplate, but as the bash_command='echo "Hello, Airflow!"' line near the bottom shows, all it does is print Hello, Airflow!
# my_dag.py
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
}

with DAG(dag_id='my_dag', schedule_interval='@daily', default_args=default_args) as dag:
    task1 = BashOperator(
        task_id='task1',
        bash_command='echo "Hello, Airflow!"'
    )

    task1
% docker-compose down
% docker-compose up -d

Open http://127.0.0.1:8080 in the browser and log in again.
Then trigger my_dag.
Once the run finishes, check the logs.
"Hello, Airflow!" is printed, so the newly added DAG works properly.
[2023-05-23, 08:36:59 UTC] {subprocess.py:75} INFO - Running command: ['/bin/bash', '-c', 'echo "Hello, Airflow!"']
[2023-05-23, 08:36:59 UTC] {subprocess.py:86} INFO - Output:
[2023-05-23, 08:36:59 UTC] {subprocess.py:93} INFO - Hello, Airflow!
[2023-05-23, 08:36:59 UTC] {subprocess.py:97} INFO - Command exited with return code 0
While studying Docker I wanted to work through a few different patterns, and I have covered several so far.
Next I wanted to try putting something already fairly built-out into a container, so I decided to use Chikapedia (地下ぺディア), an app I made in the past that often comes up when companies interview me.
The goal is simply that after running the Docker image, opening 127.0.0.1:8000 in the browser gives you a working Chikapedia.
Chikapedia is a web app that uses morphological analysis, one of the basic techniques of natural language processing, to display Wikipedia articles in a writing style reminiscent of the manga Kaiji.
It uses Django as the framework and CaboCha for the morphological analysis: it parses the HTML source of an arbitrary Wikipedia page, rewrites the prose style without breaking any HTML elements, and returns the result as the HTTP response.
It originally started from the idea of generating a Milk Boy-style manzai routine from Wikipedia articles; that looked too hard, so for the time being it took the form of Chikapedia.
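As an aside, the core trick of rewriting the text without breaking the markup can be sketched roughly as follows. This is not the actual Chikapedia code: parsing with BeautifulSoup is an assumption, and the rewrite() function is a placeholder for the real CaboCha-based style conversion.

# sketch.py -- illustration of "rewrite text nodes, leave the tags intact"
import requests
from bs4 import BeautifulSoup, NavigableString

def rewrite(text: str) -> str:
    # Placeholder for the real style conversion done with CaboCha
    return text.replace("。", "…!")

def convert_page(url: str) -> str:
    html = requests.get(url).content
    soup = BeautifulSoup(html, "html.parser")
    for node in soup.find_all(string=True):
        if node.parent.name in ("script", "style"):
            continue
        node.replace_with(NavigableString(rewrite(str(node))))
    return str(soup)  # tags are untouched; only the text nodes changed

# usage sketch:
# print(convert_page("https://ja.wikipedia.org/wiki/カイジ")[:500])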
Create a Dockerfile in the directory that already contains the Chikapedia files.
As the base image I use the one built in this earlier article, which makes CaboCha usable from Python.
FROM docker_nlp:1.0

# Copy all the files and run pip install from requirements.txt
WORKDIR /app
COPY . .
RUN pip3 install -r requirements.txt

# Run the dev server on 0.0.0.0 so it accepts access from outside the container,
# using the development settings module settings_dev.py
CMD ["python3", "chikapedia/manage.py", "runserver", "0.0.0.0:8000", "--settings=chikapedia.settings_dev"]
The CMD line runs Django's runserver: binding to 0.0.0.0:8000 lets it accept access from outside the container (that is, from the host), and --settings=chikapedia.settings_dev makes it use the development settings module.
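For reference, a development settings module along these lines would be enough for this setup. This is an assumption, not the actual file; the real chikapedia/settings_dev.py may well differ.

# chikapedia/settings_dev.py -- hypothetical sketch of a dev settings module
from .settings import *  # start from the base settings

DEBUG = True
ALLOWED_HOSTS = ["*"]    # accept requests arriving via the published container port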
The rest is the usual routine.
% docker build -t chikadocker:1.0 .
% docker run --name chikapedia-docker -p 8000:8000 -it chikadocker:1.0
Watching for file changes with StatReloader
Performing system checks...

System check identified no issues (0 silenced).

You have 18 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.
May 19, 2023 - 02:22:08
Django version 3.2.4, using settings 'chikapedia.settings_dev'
Starting development server at http://0.0.0.0:8000/
Quit the server with CONTROL-C.
A warning about unapplied migrations appears, but Chikapedia does not use the database, so I ignore it.
The run completed successfully, and the Django server came up on 0.0.0.0:8000 inside the container.
Will it actually work? Let's open 127.0.0.1:8000 in the browser on the local PC.
It works!
When bugs came up, I repeated a cycle of fixing the code, rebuilding the image, and rerunning the container to confirm the fix.
As part of studying Docker, I built a Docker container in which the CaboCha module, commonly used for Japanese natural language processing in Python, is available. Here are the steps.
The image has been pushed to a repository on Docker Hub.
Note that CaboCha is not something you can simply pip install: you also need to install CRF++ (an implementation of conditional random fields aimed at NLP), the mecab-ipadic-neologd dictionary, and so on, which makes it rather tedious.
First, create the Dockerfile. This is really the most important part this time.
I had previously set up this environment on Ubuntu on a VPS in the article below, so I basically reused those steps.
I also borrowed a great deal of help from ChatGPT.
The contents of the Dockerfile are below.
# Dockerfile

# Use Ubuntu 20.04 as a base
FROM ubuntu:20.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive

# Update system packages
RUN apt-get update && apt-get install -y \
    build-essential \
    mecab \
    libmecab-dev \
    mecab-ipadic \
    git \
    wget \
    curl \
    bzip2 \
    python3 \
    python3-pip \
    sudo

# Install mecab-ipadic-neologd
WORKDIR /var/lib/mecab/dic
RUN git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
RUN ./mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -n -y

# Install CRF++
WORKDIR /root
COPY CRF++-0.58.tar .
RUN tar xvf CRF++-0.58.tar && \
    cd CRF++-0.58 && \
    ./configure && make && make install && \
    ldconfig && \
    rm ../CRF++-0.58.tar

# Install CaboCha
WORKDIR /root
RUN FILE_ID=0B4y35FiV1wh7SDd1Q1dUQkZQaUU && \
    FILE_NAME=cabocha-0.69.tar.bz2 && \
    curl -sc /tmp/cookie "https://drive.google.com/uc?export=download&id=${FILE_ID}" > /dev/null && \
    CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)" && \
    curl -Lb /tmp/cookie "https://drive.google.com/uc?export=download&confirm=${CODE}&id=${FILE_ID}" -o ${FILE_NAME} && \
    bzip2 -dc cabocha-0.69.tar.bz2 | tar xvf - && \
    cd cabocha-0.69 && \
    ./configure --with-mecab-config=`which mecab-config` --with-charset=UTF8 && \
    make && make check && make install && \
    ldconfig && \
    cd python && python3 setup.py install

# Install mecab-python3
RUN pip3 install mecab-python3

# Cleanup apt cache
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# Set default work directory
WORKDIR /root

CMD ["/bin/bash"]
The Dockerfile above includes the line "COPY CRF++-0.58.tar .".
That archive has to be downloaded manually from this link, so I downloaded it and placed it in the same directory as the Dockerfile.
% ls
CRF++-0.58.tar    Dockerfile
Then build the image with docker build.
% docker build -t python-cabocha:1.0 .
If this step succeeds, the rest should be straightforward.
Creating the container with docker run drops you straight into a shell inside it.
% docker run --name cabocha-python -it python-cabocha:1.0
root@21c443991ed9:~#
Start Python inside the container.
root@21c443991ed9:~# python3
Now try out CaboCha.
>>> import CaboCha
>>> sentence = 'エンゼルスの大谷翔平投手が「3番・DH」でスタメン出場。前日に続き4打数無安打と2試合連続ノーヒットとなった。'
>>> c = CaboCha.Parser('-d /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd')
>>> print(c.parseToString(sentence))
エンゼルスの-D
大谷翔平投手が---D
「3番・DH」で-D
スタメン出場。---------D
前日に-D |
続き-----D
4打数無安打と---D
2試合連続ノーヒットと-D
なった。
EOS
>>>
It works!
As another study exercise, let's put Django into a Docker image.
Django itself is kept to a bare minimum; if the rocket page shows up, that counts as success.
% python3 -m venv venv
% source venv/bin/activate
(venv) % pip install django
For now, the only goal is to display that rocket page.
(venv) % django-admin startproject core
(venv) % python core/manage.py runserver

(venv) % ls
core    venv
That is it for Django for now.
(venv) % pip freeze > requirements.txt
(venv) % ls
core    requirements.txt    venv

# requirements.txt
asgiref==3.6.0
backports.zoneinfo==0.2.1
Django==4.2.1
sqlparse==0.4.4
With the Django project and requirements.txt in place, let's move on to the Docker side.
This time the Dockerfile is based on the python:3.8.3-slim-buster image.
FROM python:3.8.3-slim-buster

WORKDIR /app

COPY requirements.txt .
RUN pip3 install -r requirements.txt

COPY . .

# Run the dev server on 0.0.0.0 so it accepts access from outside the container
# (manage.py lives under core/ because the project was created with "django-admin startproject core")
CMD ["python3", "core/manage.py", "runserver", "0.0.0.0:8000"]
The COPY . . line copies everything under the local working folder into the container's working folder (/app), which is where all the Django-related files get copied in.
(venv) % ls
Dockerfile    core    requirements.txt    venv
Only the venv directory needs to be excluded, so add it to a .dockerignore file so that it is not baked into the image.

# .dockerignore
venv
(venv) % docker build -t dockerdjango:1.0 .
(venv) % docker images
REPOSITORY     TAG       IMAGE ID       CREATED          SIZE
dockerdjango   1.0       9c4fe787bc1d   45 seconds ago   205MB
Run docker run.
(venv) % docker run --name dj_dk -p 8000:8000 dockerdjango:1.0
With the container running, open 127.0.0.1:8000 in the browser.
The rocket page is displayed.
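As a quick check without opening a browser, something like the following also works (a small sketch that assumes the requests module is available on the host):

# smoke_test.py -- hypothetical check that the published port responds
import requests

resp = requests.get("http://127.0.0.1:8000/")
print(resp.status_code)        # expect 200
print("Django" in resp.text)   # the default rocket page mentions Django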
Check the contents of the container with docker exec.

(venv) % docker exec -it dj_dk bash
root@40ac730cceba:/app# ls
Dockerfile  core  requirements.txt

The venv directory specified in .dockerignore has been properly excluded.
Here is a memo of the process of building a Docker image that includes one of my own Python programs, starting from an image published on Docker Hub.
It goes as far as creating a container from the built image and running the Python program inside that container.
Prerequisite: Docker Desktop is already installed on the Mac.
First, as preparation, create a Python virtual environment locally and get the Python program you want to run inside the container ready.
I am simply reusing a program I already had, amazon_scraping.py, placed alongside the virtual environment.
% ls
amazon_scraping.py    venv
It scrapes information from Amazon product list pages and writes it out to a CSV file.
# amazon_scraping.py from datetime import date from time import sleep import csv import requests from bs4 import BeautifulSoup domain_name = 'amazon.co.jp' search_term = 'iPhone 12' url = f'https://www.{domain_name}/s?k={search_term}'.replace(' ','+') urls = [] for i in range(1,2): urls.append(f'{url}&page={i}') headers = { 'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15', 'Host':f'www.{domain_name}' } # Request each URL, convert into bs4.BeautifulSoup soups = [] for url in urls: response = requests.get(url,headers=headers) soup = BeautifulSoup(response.content, 'html.parser') soups.append(soup) sleep(0.5) # Convert delivery date def format_delivery_date(date_string): if len(date_string) == 0: return None if date_string[-2:] == '曜日': date_string = date_string[:-3] if date_string[0:3] == '明日中' and date_string[-4:] == '1月/1': date_string = date_string.replace('明日中','2024/') elif date_string[0:3] == '明日中': date_string = date_string.replace('明日中','2023/') date_string = date_string.replace('月/','/') return date_string # Extract data from bs4.BeautifulSoup def scan_page(soup, page_num): products = [] for product in soup.select('.s-result-item'): asin = product['data-asin'] a_spacing_top_small = product.select_one('.a-spacing-top-small') a_section_list = product.select('.sg-row .a-section') for a_section in a_section_list: price_elem = a_section.select_one('.a-price .a-offscreen') if price_elem: price = int(price_elem.get_text().replace('¥', '').replace(',','')) continue delivery_date_by = a_section.select_one('span:-soup-contains("までにお届け")') if delivery_date_by: delivery_date = format_delivery_date(a_section.select('span')[1].text) continue if asin: products.append({'asin': asin, 'price': price, 'delivery_date': delivery_date, 'page_number': page_num}) return products for page_num, soup in enumerate(soups): dict_list = scan_page(soup, page_num+1) fieldnames = dict_list[0].keys() with open('output.csv', 'w', newline='') as csvfile: writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer.writeheader() writer.writerows(dict_list) print('csv file created')
The two non-standard-library modules it needs, requests and beautifulsoup4, are installed with pip.
(venv) % pip install requests
(venv) % pip install beautifulsoup4
Next, create requirements.txt.
% pip freeze > requirements.txt
(venv) % ls
amazon_scraping.py    requirements.txt    venv
Its contents look like this:

# requirements.txt
beautifulsoup4==4.12.2
certifi==2023.5.7
charset-normalizer==3.1.0
idna==3.4
requests==2.30.0
soupsieve==2.4.1
urllib3==2.0.2
The venv directory is not needed, so add it to .dockerignore so that it is not included in the image.

# .dockerignore
venv
That completes the preparation on the Python side.
Create a file named Dockerfile alongside the virtual environment directory.
It installs Python and the required modules on top of the ubuntu:20.04 image.
# Dockerfile
FROM ubuntu:20.04

# Update apt, then install Python and pip
RUN apt update
RUN apt install -y python3.9
RUN apt install -y python3-pip

# Move the working directory to /var
WORKDIR /var

# Copy the local amazon_scraping.py into the container
COPY amazon_scraping.py .

# Copy the local requirements.txt into the container and pip install its contents
COPY requirements.txt .
RUN python3.9 -m pip install -r requirements.txt

As shown above, it copies the files in and runs the installs on top of the ubuntu:20.04 image.
Running ls now shows the following:

(venv) % ls
Dockerfile    amazon_scraping.py    requirements.txt    venv
Run the docker build command.
% docker build -t docker_amzn:1.0 . [+] Building 239.4s (13/13) FINISHED => [internal] load build definition from Dockerfile 0.0s => => transferring dockerfile: 253B 0.0s => [internal] load .dockerignore 0.0s => => transferring context: 2B 0.0s => [internal] load metadata for docker.io/library/ubuntu:20.04 0.9s => [1/8] FROM docker.io/library/ubuntu:20.04@sha256:db8bf6f4fb351aa7a26e27ba2686cf35a6a409f65603e59d4c203e58387dc6b3 4.2s => => resolve docker.io/library/ubuntu:20.04@sha256:db8bf6f4fb351aa7a26e27ba2686cf35a6a409f65603e59d4c203e58387dc6b3 0.0s => => sha256:db8bf6f4fb351aa7a26e27ba2686cf35a6a409f65603e59d4c203e58387dc6b3 1.13kB / 1.13kB 0.0s => => sha256:b795f8e0caaaacad9859a9a38fe1c78154f8301fdaf0872eaf1520d66d9c0b98 424B / 424B 0.0s => => sha256:88bd6891718934e63638d9ca0ecee018e69b638270fe04990a310e5c78ab4a92 2.30kB / 2.30kB 0.0s => => sha256:ca1778b6935686ad781c27472c4668fc61ec3aeb85494f72deb1921892b9d39e 27.50MB / 27.50MB 2.9s => => extracting sha256:ca1778b6935686ad781c27472c4668fc61ec3aeb85494f72deb1921892b9d39e 0.9s => [internal] load build context 0.0s => => transferring context: 2.67kB 0.0s => [2/8] RUN apt update 62.4s => [3/8] RUN apt install -y python3.9 35.9s => [4/8] RUN apt install -y python3-pip 130.6s => [5/8] COPY requirements.txt . 0.0s => [6/8] RUN python3.9 -m pip install -r requirements.txt 2.7s => [7/8] WORKDIR /var 0.0s => [8/8] COPY /venv/amazon_scraping.py . 0.0s => exporting to image 2.6s => => exporting layers 2.6s => => writing image sha256:9f3dfca1f57b234294ed4666ea9d6dc05f7200cf30c6c10bbebf83834ae6e457 0.0s => => naming to docker.io/library/docker_amzn:1.0 0.0s %
It took a few minutes but completed successfully. The built image can be checked with the docker images command.

% docker images
REPOSITORY    TAG       IMAGE ID       CREATED          SIZE
docker_amzn   1.0       9f3dfca1f57b   59 seconds ago   473MB
Create a container from the image with the docker run command.
Nothing will access the container from outside, so no port forwarding (-p option) is specified.
% docker run --name amzn_scraper -it -d docker_amzn:1.0
47caefa69121c3323c7379f448952003001817e937ffb3232d4564fce9b3c01c

% docker exec -it amzn_scraper bash
root@47caefa69121:/var#
Running ls confirms that amazon_scraping.py and requirements.txt, which were COPYed in the Dockerfile, exist inside the container.

root@cd3f7f913010:/var# ls
amazon_scraping.py  backups  cache  lib  local  lock  log  mail  opt  requirements.txt  run  spool  tmp
Now run the Python program inside the container.

root@cd3f7f913010:/var# python3.9 amazon_scraping.py
csv file created
The file appears to have been created.

root@cd3f7f913010:/var# ls
amazon_scraping.py  backups  cache  lib  local  lock  log  mail  opt  output.csv  requirements.txt  run  spool  tmp
Check the contents with the head command as well; the CSV looks properly generated.

root@cd3f7f913010:/var# head output.csv
asin,price,delivery_date,page_number
B0BDHLR5WP,164800,,1
B0BDHYRRQX,134800,,1
B09M69W9KR,234801,,1
B09M68Y2HZ,162800,,1
B0928MGLCR,50025,,1
B0928LZ4HD,67980,2023/5/19,1
B08B9WMNSS,49490,,1
B0928L4D5H,92430,,1
B08B9GTC1T,78695,,1
# exit
exit

% docker stop amzn_scraper
amzn_scraper
% docker ps -a
CONTAINER ID   IMAGE             COMMAND       CREATED          STATUS                     PORTS     NAMES
47caefa69121   docker_amzn:1.0   "/bin/bash"   13 minutes ago   Exited (0) 2 seconds ago             amzn_scraper

% docker rm amzn_scraper
amzn_scraper
% docker ps -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES