Prerequisites
Before proceeding, ensure the following prerequisites are met:
- Install MindsDB locally via Docker or use MindsDB Cloud.
- To connect Web Crawler to MindsDB, install the handler's required dependencies by following the MindsDB instructions for installing dependencies.
- Install or ensure access to Web Crawler.
Connection
This handler does not require any connection parameters, so a web crawler can be initialized without supplying credentials or other settings.
Usage
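The queries in this section need an initialized crawler to run against. A minimal sketch of that step, assuming the handler is registered under the engine name `web`; the data source name `my_web` is a hypothetical choice, and any valid name should work:

```sql
-- Initialize a web crawler as a MindsDB data source.
-- Assumptions: 'web' is the handler's engine name; my_web is an arbitrary, hypothetical name.
-- No PARAMETERS clause is given because the handler takes no connection parameters.
CREATE DATABASE my_web
WITH ENGINE = 'web';
```

The name chosen here is the prefix used when querying the crawler later.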
Get Websites Content
Here is how to get the content of docs.mindsdb.com:
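The query can be sketched as follows, assuming the crawler was created under the hypothetical name `my_web` and exposes a table named `crawler` that is filtered by a `url` column. The `LIMIT` clause is assumed here to cap the number of pages crawled:

```sql
-- Fetch the content of a single page from docs.mindsdb.com.
-- Assumptions: my_web is the data source name, crawler is the table exposed by the handler,
-- and LIMIT bounds how many pages are crawled.
SELECT *
FROM my_web.crawler
WHERE url = 'docs.mindsdb.com'
LIMIT 1;
```

Raising the limit should return content from more pages of the same site, one row per page under these assumptions.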
Get PDF Content
MindsDB accepts file uploads of csv, xlsx, xls, sheet, json, and parquet. However, you can utilize the web crawler to fetch data from pdf files.
For example, you can provide a link to a pdf file stored in Amazon S3.
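A pdf query is assumed to take the same shape as a web page query, with the `url` filter pointing at the file. In this sketch, `my_web` and `crawler` are the same assumed names as before, and `<link-to-pdf-file>` is a placeholder to be replaced with the actual link:

```sql
-- Fetch the content of a pdf file that is reachable by URL.
-- Assumptions: my_web is the data source name, crawler is the table exposed by the handler.
-- <link-to-pdf-file> is a placeholder, not a real URL.
SELECT *
FROM my_web.crawler
WHERE url = '<link-to-pdf-file>'
LIMIT 1;
```

The link has to be accessible from the machine running MindsDB, so a private S3 object would likely need a public or pre-signed URL.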