Web scraping has become an essential tool for collecting data from across the internet, empowering data analysts, tech enthusiasts, and businesses to make informed decisions. But extracting the data is only the first step. To unlock its full potential, you need to export it efficiently into the right format, whether that is a CSV file for spreadsheets, JSON for APIs, or a database for large-scale storage and analysis.
In this blog, we'll cover the essentials of exporting web-scraped data. You'll learn, step by step, how to work with CSV and JSON files, how to integrate web-scraped data with databases, and how to get the most out of your data management.
Before diving into the script, let’s understand the dataset and workflow that we’ll use to demonstrate the data-saving process.
We’ll be scraping data from the website Books to Scrape, which provides a list of books along with their titles, prices, and availability.
This website is designed for practice purposes, making it an ideal choice for showcasing web scraping techniques.
Here’s the process we’ll follow: we’ll use the requests and BeautifulSoup libraries to extract the book details from the website, load them into a Pandas DataFrame, and then save the data to a CSV file, a JSON file, and an SQLite database.
To run the script, you’ll need the following Python libraries: requests, beautifulsoup4, and pandas.
Install these libraries using pip. Run the following command in your terminal:
pip install requests beautifulsoup4 pandas
Here’s the Python script to scrape the data from the website and store it in a Pandas DataFrame:
import requests
from bs4 import BeautifulSoup
import pandas as pd
# Scrape data from the website
def scrape_books():
    url = "https://books.toscrape.com/"
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception("Failed to load page")

    soup = BeautifulSoup(response.content, "html.parser")
    books = []

    # Extract book data
    for article in soup.find_all("article", class_="product_pod"):
        title = article.h3.a["title"]
        price = article.find("p", class_="price_color").text.strip()
        availability = article.find("p", class_="instock availability").text.strip()
        books.append({"Title": title, "Price": price, "Availability": availability})

    # Convert to DataFrame
    books_df = pd.DataFrame(books)
    return books_df

# Main execution
if __name__ == "__main__":
    print("Scraping data...")
    books_df = scrape_books()
    print("Data scraped successfully!")
    print(books_df)
The table we will use to demonstrate the data-saving process is structured as follows:
Title | Price | Availability |
A Light in the Attic | £51.77 | In stock |
Tipping the Velvet | £53.74 | In stock |
Soumission | £50.10 | In stock |
Sharp Objects | £47.82 | In stock |
Sapiens: A Brief History of Humankind | £54.23 | NA |
The Requiem Red | £22.65 | In stock |
... | ... | ... |
Use the to_csv method from Pandas:
def save_to_csv(dataframe, filename="books.csv"):
    dataframe.to_csv(filename, index=False)
    print(f"Data saved to {filename}")
Code Explanation:
filename: Specifies the name of the output file.
index=False: Ensures the index column is not included in the CSV file.
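As a quick illustration (a minimal usage sketch, not part of the original script), you could call this helper on the DataFrame returned by scrape_books(); the first lines of the resulting books.csv would mirror the sample table above:

books_df = scrape_books()
save_to_csv(books_df)  # writes books.csv to the current working directory

# books.csv would then begin with rows like:
# Title,Price,Availability
# A Light in the Attic,£51.77,In stock
# Tipping the Velvet,£53.74,In stock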
Use the to_json method from Pandas:
def save_to_json(dataframe, filename="books.json"):
    dataframe.to_json(filename, orient="records", indent=4)
    print(f"Data saved to {filename}")
Code Explanation:
orient="records": Each row in the DataFrame is converted into a JSON object.
indent=4: Formats the JSON for better readability.
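For a sense of the output (an illustrative sketch based on the sample table, not output captured from a live run), calling the helper on the scraped DataFrame produces a JSON array of record objects:

save_to_json(books_df)  # writes books.json

# books.json would look roughly like:
# [
#     {
#         "Title": "A Light in the Attic",
#         "Price": "£51.77",
#         "Availability": "In stock"
#     },
#     ...
# ]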
Use the to_sql method from Pandas together with SQLite:
import sqlite3
def save_to_database(dataframe, database_name="books.db"):
    conn = sqlite3.connect(database_name)
    dataframe.to_sql("books", conn, if_exists="replace", index=False)
    conn.close()
    print(f"Data saved to {database_name} database")
Code Explanation:
sqlite3.connect(database_name): Connects to the SQLite database (creates it if it doesn’t exist).
to_sql("books", conn, if_exists="replace", index=False): Writes the DataFrame to a table named books, replacing the table if it already exists, and omits the DataFrame index.

While formats like CSV or JSON work well for smaller projects, databases offer superior performance, query optimization, and data integrity when handling larger datasets. The seamless integration of Pandas with SQLite makes it simple to store, retrieve, and manipulate data efficiently. Whether you're building a data pipeline or a complete application, understanding how to leverage databases will greatly enhance your ability to work with data effectively. Start using these tools today to streamline your data workflows and unlock new possibilities!
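To make that retrieval point concrete, here is a minimal sketch (not part of the original tutorial) that reads the books table back out of books.db with pd.read_sql_query and filters it in Pandas. The helper name load_books_under and the price-cleaning step are my own additions for illustration; the cleaning assumes prices were stored as text such as "£51.77", as in the scraped data above.

import sqlite3
import pandas as pd

def load_books_under(database_name="books.db", max_price=50.0):
    # Reconnect to the SQLite database created by save_to_database()
    conn = sqlite3.connect(database_name)
    # Pull the "books" table back into a DataFrame
    df = pd.read_sql_query("SELECT Title, Price, Availability FROM books", conn)
    conn.close()
    # Prices were stored as text (e.g. "£51.77"); strip the symbol to compare numerically
    df["PriceValue"] = df["Price"].str.replace("£", "", regex=False).astype(float)
    return df[df["PriceValue"] < max_price]

if __name__ == "__main__":
    print(load_books_under(max_price=50.0))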