DeepSeekAuditChecker: processing a CSV file efficiently with multiple threads and calling an API for auditing

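    def process_csv_multithreaded(self, csv_file_path: str, output_file_path: str, num_threads: int = 4) -> None:
        """
        Process a CSV file with multiple threads.

        Args:
            csv_file_path: path to the input CSV file
            output_file_path: path to the output CSV file
            num_threads: number of threads
        """
        # Read the CSV file
        df = pd.read_csv(csv_file_path)
        self.total_count = len(df)
        self.processed_count = 0

        # Prepare the data as (index, original name, matched name) tuples
        data = [(i, row['original_office'], row['matched_uniformname'])
                for i, row in df.iterrows()]

        # Create the thread pool; executor.map returns results in input order
        with ThreadPoolExecutor(max_workers=num_threads) as executor:
            results = list(executor.map(self.process_single_row, data))

        # Write the results back into the DataFrame
        for index, result in results:
            df.at[index, 'is_same'] = result

        # Save the results
        df.to_csv(output_file_path, index=False)
        print(f"Processing complete; results saved to: {output_file_path}")

# Usage example
if __name__ == "__main__":
    import threading
    import time
    import requests
    import pandas as pd
    from concurrent.futures import ThreadPoolExecutor

    checker = DeepSeekAuditChecker(api_key="your_api_key_here")
    checker.process_csv_multithreaded(
        csv_file_path="input.csv",
        output_file_path="output.csv",
        num_threads=4
    )

Code notes:
1. Initializing the class requires a DeepSeek API key.
2. The call_deepseek_api method builds the prompt and calls the API.
3. The process_single_row method handles a single row of data.
4. The process_csv_multithreaded method processes the whole CSV file using multiple threads.
5. The output gains a new is_same column: 1 means the two names refer to the same firm, 0 means they do not.

Caveats:
1. The requests and pandas libraries must be installed.
2. API calls are rate-limited, so adding a suitable delay between calls is recommended.
3. Printing from multiple threads is guarded by a lock for thread safety.
4. Empty values are skipped automatically and marked as 0.

###### [AI Q&A | 471-point answer | 2025-06-01 00:22:24]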
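The caveats about empty values, lock-protected printing, and a delay between calls describe behavior of process_single_row, which is not shown above. The following is only a minimal sketch of how such a worker might look, not the original implementation. The names RowWorkerSketch, is_empty, fake_compare, run_rows, and delay_seconds are hypothetical, and fake_compare stands in for the real API call so the example runs without network access or pandas.

```python
import math
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def is_empty(value) -> bool:
    """True for None, NaN (how pandas represents empty CSV cells), or blank strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() == ""


def fake_compare(original: str, matched: str) -> int:
    """Placeholder for the API call: 1 if the names match ignoring case, else 0."""
    return int(original.strip().lower() == matched.strip().lower())


class RowWorkerSketch:
    """Hypothetical worker: skips empty rows, sleeps before each call, prints under a lock."""

    def __init__(self, compare, delay_seconds: float = 0.0):
        self.compare = compare              # callable standing in for the API call
        self.delay_seconds = delay_seconds  # assumed pause before each call
        self.lock = threading.Lock()        # guards the counter and print()
        self.processed_count = 0
        self.total_count = 0

    def process_single_row(self, item):
        index, original, matched = item
        if is_empty(original) or is_empty(matched):
            result = 0  # empty values are not sent to the API and are marked 0
        else:
            time.sleep(self.delay_seconds)
            result = self.compare(original, matched)
        with self.lock:
            # Only one thread at a time updates the counter and prints progress.
            self.processed_count += 1
            print(f"[{self.processed_count}/{self.total_count}] row {index} -> {result}")
        return index, result


def run_rows(worker: RowWorkerSketch, rows, num_threads: int = 4):
    """Process (index, original, matched) tuples in a thread pool, keeping input order."""
    worker.total_count = len(rows)
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        return list(executor.map(worker.process_single_row, rows))
```

Note that the sleep here is per thread, so with num_threads workers the overall request rate is roughly num_threads divided by delay_seconds per second. A shared limiter would be needed if the API enforces a strict global limit.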
