Multithreaded web page title scraping in Python: example code that fetches https://www.xxxxx.com?list=1 and saves the titles

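Below is an example that uses Python threads to fetch the page `https://www.xxxxx.com?list=1` (replace this with a real, valid URL) and save the title of each page:

```python
import threading

import requests
from bs4 import BeautifulSoup

# Serializes writes to titles.txt so lines from different threads do not interleave
file_lock = threading.Lock()


# Fetch a page, extract its title, and append it to titles.txt
def get_title(url):
    try:
        # Send the HTTP request; the timeout keeps a stalled server from hanging the thread
        response = requests.get(url, timeout=10)
        # Raise an exception for 4xx/5xx status codes
        response.raise_for_status()
        # Parse the HTML content with BeautifulSoup
        soup = BeautifulSoup(response.text, 'html.parser')
        # soup.title.string can be None even when a <title> tag exists
        if soup.title and soup.title.string:
            title = soup.title.string.strip()
        else:
            title = "No title found"
        print(f"Title of {url}: {title}")
        # Save to file, one thread at a time
        with file_lock:
            with open('titles.txt', 'a', encoding='utf-8') as file:
                file.write(f"{url}: {title}\n")
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")


# URLs to scrape
urls = ["https://www.xxxxx.com?list=1"]  # add more URLs here

# Create and start one thread per URL
threads = []
for url in urls:
    thread = threading.Thread(target=get_title, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("All titles have been fetched and saved.")
```

Code notes:

1. `get_title` function: takes a URL, sends an HTTP request to fetch the page, parses the HTML with `BeautifulSoup`, extracts the page title, and appends it to `titles.txt`. A lock guards the file write so that concurrent threads cannot interleave their output, and the request uses a 10-second timeout.
2. `urls` list: holds the URLs to scrape; add more as needed.
3. Thread creation and start: one thread is created and started for each URL.
4. Waiting for threads: `join` blocks until every thread has finished.

Things to note:

- Replace `https://www.xxxxx.com?list=1` with a real, valid URL.
- Make sure the `requests` and `beautifulsoup4` libraries are installed. You can install them with:

```sh
pip install requests beautifulsoup4
```

######[AI Q&A | 553-point answer | 2025-11-10 09:58:59]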
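
Starting one thread per URL works for a short list, but the thread count grows with the list. As a possible variation, the sketch below caps concurrency with the standard library's `ThreadPoolExecutor` and writes the file from a single thread, which avoids concurrent appends. The names `collect_titles`, `save_titles`, `fake_fetch`, and `titles_demo.txt` are hypothetical and not part of the answer above. It also assumes a fetch function that returns the title (or raises an exception) instead of writing to the file itself.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_titles(urls, fetch_title, max_workers=8):
    """Run fetch_title(url) for each URL on a bounded thread pool.

    Returns a dict mapping each URL to a (title, error) pair in which
    exactly one of the two values is None.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it was submitted for.
        future_to_url = {pool.submit(fetch_title, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = (future.result(), None)
            except Exception as exc:  # one failing URL should not stop the rest
                results[url] = (None, str(exc))
    return results


def save_titles(results, path="titles.txt"):
    """Write the successful results from a single thread, so no lock is needed."""
    with open(path, "w", encoding="utf-8") as file:
        for url, (title, error) in results.items():
            if error is None:
                file.write(f"{url}: {title}\n")


if __name__ == "__main__":
    # Stand-in fetcher for demonstration only; a real one would call
    # requests.get and parse the response with BeautifulSoup.
    def fake_fetch(url):
        if "bad" in url:
            raise ValueError("simulated failure")
        return f"Title for {url}"

    demo = collect_titles(
        ["https://example.com/a", "https://example.com/bad"], fake_fetch
    )
    save_titles(demo, "titles_demo.txt")
    print(demo)
```

To use this with real pages, pass a function that performs the request and returns the title string. `max_workers` is a tuning knob whose best value depends on the target site and your network.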
