[Selenium] About the chromedriver headless option

Running chromedriver with the headless option can lead to high CPU and memory consumption.

Chrome certainly uses a lot of memory in everyday use (e.g., with extensions installed).
Even with only a few tabs open, it occasionally becomes unresponsive.

Optimization methods

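Use a custom proxy or C++ ProtocolHandlers to return a stub 1x1 pixel image for image requests, or block them entirely.
Use memory-infra to identify which parts consume the most memory.
Chromium always uses as much of the available resources as it can, so to limit resources effectively, look into using cgroups.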
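
The first item can be approximated directly from Selenium without a proxy. The following is a minimal sketch, assuming Selenium 4 for Python and a recent Chrome. It swaps in Chrome's Blink setting that disables image loading rather than the proxy or ProtocolHandlers approach named above, and the helper names build_chrome_args and make_driver are hypothetical.

```python
from typing import List


def build_chrome_args(block_images: bool = True) -> List[str]:
    """Return Chrome command-line switches for a lighter headless session."""
    args = [
        # New headless mode (Chrome 109+); older builds use plain "--headless".
        "--headless=new",
        # Extensions are one of the memory consumers mentioned in the post.
        "--disable-extensions",
    ]
    if block_images:
        # Swapped-in technique: tell Blink not to load images at all,
        # instead of answering image requests with a stub from a proxy.
        args.append("--blink-settings=imagesEnabled=false")
    return args


def make_driver(block_images: bool = True):
    """Start Chrome through Selenium with the switches above."""
    # Imported lazily so build_chrome_args works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(block_images):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    print(build_chrome_args())
```

Disabling images this way only helps when the scraped data does not depend on images being loaded; pages that lazy-load content based on image events may behave differently.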

cgroups (short for control groups) is a Linux kernel feature that limits and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of processes. (Source: Wikipedia)
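
As a rough illustration of what limiting a process with cgroups looks like, here is a sketch that writes limits into the cgroup v2 filesystem from Python. It assumes a Linux host with cgroup v2 mounted at /sys/fs/cgroup, root privileges, and the memory and cpu controllers enabled for child groups. The group name and the helper names are hypothetical, and older systems using cgroup v1 have a different layout.

```python
from pathlib import Path
from typing import Dict

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumed cgroup v2 mount point


def cgroup_settings(memory_bytes: int, cpu_fraction: float,
                    period_us: int = 100_000) -> Dict[str, str]:
    """Map limits to cgroup v2 interface files and the values to write."""
    quota_us = int(cpu_fraction * period_us)
    return {
        "memory.max": str(memory_bytes),          # hard memory limit in bytes
        "cpu.max": f"{quota_us} {period_us}",     # CPU time quota per period
    }


def apply_to_pid(group: str, pid: int, settings: Dict[str, str]) -> None:
    """Create the group, write the limits, then move the process into it."""
    group_dir = CGROUP_ROOT / group
    group_dir.mkdir(exist_ok=True)
    for filename, value in settings.items():
        (group_dir / filename).write_text(value)
    # Writing a PID to cgroup.procs moves that process into the group.
    (group_dir / "cgroup.procs").write_text(str(pid))


if __name__ == "__main__":
    # 512 MiB of memory and half of one CPU core.
    print(cgroup_settings(512 * 1024 * 1024, 0.5))
```

Child processes inherit the group of their parent when they are forked, so the limits are most useful when applied to the launching process before Chrome starts.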

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/ch01

Chapter 1. Introduction to Control Groups (Cgroups) Red Hat Enterprise Linux 6 | Red Hat Customer Portal

memory-infra

A memory measurement tool that helps you understand where Chrome uses memory on the system.

It helps find the components that consume the most memory.
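
memory-infra is normally driven from chrome://tracing, which is awkward in a headless session. One possible workaround, sketched below, is to record a startup trace with the memory-infra category enabled through Chrome's trace-startup switches and open the resulting file in chrome://tracing afterwards. The helper name, the output path, and the duration are illustrative assumptions, and the switches should be checked against the Chrome version in use.

```python
from typing import List

# Tracing category that turns on memory-infra dumps.
MEMORY_INFRA_CATEGORY = "disabled-by-default-memory-infra"


def memory_trace_args(trace_file: str, duration_s: int = 10) -> List[str]:
    """Build Chrome switches that record a memory-infra startup trace."""
    return [
        # "-*" disables all other categories so the trace stays small.
        f"--trace-startup=-*,{MEMORY_INFRA_CATEGORY}",
        f"--trace-startup-file={trace_file}",
        f"--trace-startup-duration={duration_s}",
    ]


if __name__ == "__main__":
    # Hypothetical output path; pass these to Chrome or to ChromeOptions.add_argument.
    for arg in memory_trace_args("/tmp/chrome_trace.json"):
        print(arg)
```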

https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/

docs/memory-infra - chromium/src - Git at Google

Avoid running headless

Avoid running a headless browser when possible. It is unpredictable and resource-hungry. If the goal is to fetch data through HTTP requests, there are libraries that provide a Node API for this.
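
The same idea applies in Python: when the page does not need JavaScript rendering, a plain HTTP request plus an HTML parser avoids the browser entirely. The sketch below uses only the standard library rather than the Node libraries mentioned above. The class and function names are hypothetical, and the User-Agent header value is an arbitrary choice.

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Parse an HTML string and return the links found in it."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch_links(url: str, timeout: float = 10.0) -> List[str]:
    """Download a page over HTTP and return its links, with no browser involved."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html)
```

This only sees the HTML the server returns, so content that is inserted by client-side scripts still requires a real browser.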

https://www.zenrows.com/blog/javascript-web-crawler-nodejs#is-javascript-good-for-web-crawling

JavaScript Web Crawler with Node.js: A Step-By-Step Tutorial - ZenRows

https://kim-oriental.tistory.com/27

[Node.js] Express Rest API + Puppeteer web crawling service: building and running a Docker image in VSCode

https://jung-story.tistory.com/100

Node.js (Web crawling with Node.js, REST API)

Use Docker

Docker can limit the amount of available resources and sandbox the browser, so it is a good idea to write your own Dockerfile.
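
The resource limits themselves are applied when the container is started, not in the Dockerfile. A minimal sketch of launching a headless Chrome image with memory and CPU caps from Python follows. It assumes the docker CLI is installed, and the helper names and the default limit values are illustrative rather than recommendations.

```python
import subprocess
from typing import List


def docker_run_args(image: str, memory: str = "1g", cpus: float = 1.0,
                    shm_size: str = "1g") -> List[str]:
    """Build a docker run command that caps memory, CPU, and /dev/shm size."""
    return [
        "docker", "run", "--rm",
        "--memory", memory,        # hard memory limit for the container
        "--cpus", str(cpus),       # number of CPUs the container may use
        "--shm-size", shm_size,    # size of /dev/shm, which Chrome uses heavily
        image,
    ]


def run_container(image: str) -> int:
    """Run the container in the foreground and return its exit code."""
    return subprocess.run(docker_run_args(image), check=False).returncode


if __name__ == "__main__":
    print(" ".join(docker_run_args("justinribeiro/chrome-headless", memory="512m", cpus=0.5)))
```

A specific image may need extra options such as port mappings or capabilities, so its own documentation should be checked before relying on this command.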

https://hub.docker.com/r/justinribeiro/chrome-headless/

References

https://stackoverflow.com/questions/50701824/limit-chrome-headless-cpu-and-memory-usage

Limit chrome headless CPU and memory usage

https://jhyeok.com/python-selenium-tip/

Tips for building a crawler with Selenium in Python

The blog posts above were a great help. I learned a lot of new things today. 🙏🏻
