Trying Web Crawling with ChatGPT
How to download images after searching on Google.
-> It looks like Google blocks crawling.. apparently you need an API key and the right permissions ㅜㅜ
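For reference, Google's sanctioned route is the Custom Search JSON API rather than scraping the results page. A minimal sketch, assuming you have created an API key in the Google Cloud Console and a Programmable Search Engine ID (both values below are placeholders):

import requests

API_KEY = "YOUR_API_KEY"      # placeholder: issued in the Google Cloud Console
CX = "YOUR_SEARCH_ENGINE_ID"  # placeholder: Programmable Search Engine ID

params = {
    "key": API_KEY,
    "cx": CX,
    "q": "그림",            # same search term as in the Naver URL below
    "searchType": "image",  # restrict results to images
    "num": 10,              # the API returns at most 10 results per request
}
response = requests.get("https://www.googleapis.com/customsearch/v1", params=params)
response.raise_for_status()

# Each result item carries a direct image URL, which could then be
# downloaded with requests just like in the Naver script below
for item in response.json().get("items", []):
    print(item["link"])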
Trying Naver search instead and downloading from there:

- Paste the link of a Naver image-search results page into ChatGPT,
- then have ChatGPT generate the Python code (asking it to crawl the page and save the images):
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, parse_qs
# Create a directory to store the images
image_dir = "images"
if not os.path.exists(image_dir):
    os.makedirs(image_dir)
# Define the URL of the Naver search page
url = "https://search.naver.com/search.naver?where=image&sm=tab_jum&query=%EA%B7%B8%EB%A6%BC"
# Send a GET request to the Naver search page
response = requests.get(url)
# Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")
# Find all image tags on the page
image_tags = soup.find_all("img")
# Loop through each image tag and download the image
for i, tag in enumerate(image_tags):
    # Get the image URL (assumes every <img> tag carries a src attribute)
    image_url = tag["src"]
    # Extract the image file name from the 'fname' query parameter.
    # NOTE: not every image URL on the page actually carries 'fname',
    # so this line raises a KeyError on such URLs -- most likely the
    # error hit when running this version below
    parsed_url = urlparse(image_url)
    query_params = parse_qs(parsed_url.query)
    file_name = query_params["fname"][0]
    # Set the file path to save the image
    file_path = os.path.join(image_dir, file_name)
    # Download the image
    response = requests.get(image_url)
    with open(file_path, "wb") as file:
        file.write(response.content)
    print(f"Downloaded image {i+1}/{len(image_tags)}")
print("All images downloaded successfully!")
- Connected to Colab and ran it in the Jupyter notebook, and it throws an error ㅜㅜ (most likely the KeyError noted above: the code assumes every image URL has an 'fname' query parameter, and not all of them do).
- Well.. I just asked ChatGPT to fix it again..
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, parse_qs
# Create a directory to store the images
image_dir = "images"
if not os.path.exists(image_dir):
    os.makedirs(image_dir)
# Define the URL of the Naver search page
url = "https://search.naver.com/search.naver?where=image&sm=tab_jum&query=%EA%B7%B8%EB%A6%BC"
# Send a GET request to the Naver search page
response = requests.get(url)
# Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")
# Find all image tags on the page
image_tags = soup.find_all("img")
# Loop through each image tag and download the image
for i, tag in enumerate(image_tags):
    # Get the image URL; skip tags without a usable http(s) source
    # (e.g. lazy-loading placeholders or data: URIs, which requests
    # cannot download)
    image_url = tag.get("src")
    if not image_url or not image_url.startswith("http"):
        continue
    # Set the file path to save the image
    file_name = f"image_{i+1}.jpg"
    file_path = os.path.join(image_dir, file_name)
    # Download the image
    response = requests.get(image_url)
    with open(file_path, "wb") as file:
        file.write(response.content)
    print(f"Downloaded image {i+1}/{len(image_tags)}")
print("All images downloaded successfully!")
- We can confirm that the images were saved.
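As a quick sanity check in the Colab notebook, we can list what was written and preview one file inline (a small sketch; only the "images" directory name is taken from the script above):

import os
from IPython.display import Image, display

# List everything the script wrote into the images/ directory
files = sorted(os.listdir("images"))
print(f"{len(files)} files saved:", files[:5], "...")

# Render the first downloaded image inline in the notebook
if files:
    display(Image(filename=os.path.join("images", files[0])))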