Getting Started with PyAirbyte (Beta)
PyAirbyte is a library that provides a set of utilities to use Airbyte connectors in Python. It is meant to be used in situations where setting up an Airbyte server or cloud account is not possible or desirable, for example in a Jupyter notebook or when iterating on early prototypes on a developer's workstation.
You can also check out this YouTube video on how to get started with PyAirbyte!
Installation
pip install airbyte
Or, during the beta, you may want to install the latest from source with:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git'
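If you want to confirm the installation before moving on, a quick check like the following works with just the standard library (the distribution name on PyPI is airbyte, matching the pip command above):
import importlib.metadata

import airbyte as ab  # should import cleanly once the package is installed

# Print the installed PyAirbyte version.
print(importlib.metadata.version("airbyte"))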
Usage
Data can be extracted from sources and loaded into caches:
import airbyte as ab
source = ab.get_source(
"source-faker",
config={"count": 5_000},
install_if_missing=True,
)
source.check()
source.select_all_streams()
result = source.read()
for name, records in result.streams.items():
print(f"Stream {name}: {len(list(records))} records")
Quickstarts
API Reference
For details on specific classes and methods, please refer to our PyAirbyte API Reference.
Architecture
PyAirbyte is a Python library that can be run in any context that supports Python >=3.9. It contains the following main components:
- Source: A source object wraps a Python connector and includes a configuration object. The configuration object is a dictionary that contains the connector's configuration, such as authentication credentials or connection settings. The source object is used to read data from the connector.
- Cache: Data can be read directly from the source object. However, it is recommended to use a cache object to store the data. The cache object temporarily stores records from the source in a SQL database such as a local DuckDB file or a Postgres or Snowflake instance.
- Result: An object holding the records from a read operation on a source. It provides quick access to the records of each synced stream via the underlying cache object. Data can be accessed as a list of records, as a Pandas DataFrame, or via SQLAlchemy queries.
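To make the Cache and Result pieces concrete, here is a small sketch that reads source-faker into the default DuckDB cache and then queries it with SQL. It assumes the cache exposes a SQLAlchemy engine via get_sql_engine() and that each stream is stored in a table named after the stream:
import airbyte as ab
from sqlalchemy import text

source = ab.get_source(
    "source-faker",
    config={"count": 1_000},
    install_if_missing=True,
)
source.select_all_streams()

cache = ab.get_default_cache()  # a local DuckDB file under the hood
result = source.read(cache=cache)

# Assumption: the cache exposes a SQLAlchemy engine and the "users" stream
# is stored in a table named "users" in the cache's default schema.
engine = cache.get_sql_engine()
with engine.connect() as conn:
    count = conn.execute(text("SELECT COUNT(*) FROM users")).scalar_one()
    print(f"users rows in cache: {count}")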
Available connectors
The following connectors are available:
- ActiveCampaign
- Adjust
- Aha
- Aircall
- Airtable
- Alpha Vantage
- Amazon Ads
- Amazon Seller Partner
- Amazon SQS
- Amplitude
- Apify Dataset
- Appfollow
- Apple Search Ads
- AppsFlyer
- Asana
- Ashby
- Auth0
- AWS CloudTrail
- Azure Blob Storage
- Azure Table Storage
- Babelforce
- BambooHR
- Bing Ads
- Braintree
- Braze
- Breezometer
- CallRail
- Captain Data
- Cart.com
- Chargebee
- Chargify
- Chartmogul
- ClickUp
- Clockify
- Close.com
- Coda
- Coin API
- CoinGecko Coins
- CoinMarketCap
- Commcare
- Commercetools
- ConfigCat
- Confluence
- ConvertKit
- Convex
- Copper
- Datadog
- Datascope
- Delighted
- Dixa
- Dockerhub
- Dremio
- Drift
- EmailOctopus
- Everhour
- Exchange Rates Api
- Facebook Marketing
- Facebook Pages
- Sample Data (Faker)
- Fastbill
- Fauna
- File (CSV, JSON, Excel, Feather, Parquet)
- Firebase Realtime Database
- Firebolt
- Flexport
- Freshcaller
- Freshdesk
- Freshsales
- Freshservice
- Fullstory
- Gainsight Px
- GCS
- Genesys
- Lago
- GitHub
- Gitlab
- Glassfrog
- GoCardless
- Gong
- Google Ads
- Google Analytics 4 (GA4)
- Google Analytics (Universal Analytics)
- Google Directory
- Google Drive
- Google PageSpeed Insights
- Google Search Console
- Google Sheets
- Google Webfonts
- Greenhouse
- Gridly
- Gutendex
- Harvest
- Hellobaton
- Hubplanner
- HubSpot
- Insightly
- Instatus
- Intercom
- Intruder
- IP2Whois
- Iterable
- Jira
- K6 Cloud
- Klarna
- Klaus Api
- Klaviyo
- Kyriba
- KYVE
- LaunchDarkly
- Lemlist
- Lever Hiring
- LinkedIn Ads
- LinkedIn Pages
- Linnworks
- Lokalise
- Looker
- Mailchimp
- MailerLite
- MailerSend
- Mailgun
- Mailjet Mail
- Mailjet SMS
- Marketo
- Merge
- Metabase
- Microsoft Dataverse
- Microsoft OneDrive
- Microsoft Teams
- Mixpanel
- Monday
- My Hours
- n8n
- Nasa
- Netsuite
- News API
- Newsdata
- Notion
- New York Times
- Okta
- Omnisend
- OneSignal
- Open Exchange Rates
- Openweather
- Opsgenie
- Orb
- Orbit
- Oura
- Outbrain Amplify
- Outreach
- Pardot
- PartnerStack
- Paypal Transaction
- Paystack
- Pendo
- PersistIq
- Pexels API
- Pipedrive
- Pivotal Tracker
- Plaid
- Plausible
- PokeAPI
- Polygon Stock API
- PostHog
- Postmark App
- PrestaShop
- Primetric
- Public Apis
- Punk API
- PyPI
- Qualaroo
- QuickBooks
- Railz
- RD Station Marketing
- Recharge
- Recreation
- Recruitee
- Recurly
- Reply.io
- Retently
- Ringcentral
- RKI Covid
- Rocket.chat
- RSS
- S3
- Salesforce
- SalesLoft
- SAP Fieldglass
- Secoda
- Sendgrid
- Sendinblue
- Senseforce
- Sentry
- Serpstat
- SFTP Bulk
- Shopify
- Shortio
- Slack
- Smaily
- SmartEngage
- Smartsheets
- Snapchat Marketing
- Sonar Cloud
- SpaceX API
- Square
- Statuspage
- Strava
- Stripe
- SurveySparrow
- SurveyCTO
- SurveyMonkey
- Tempo
- The Guardian API
- TikTok Marketing
- Timely
- TMDb
- Todoist
- Toggl
- TPLcentral
- Trello
- TrustPilot
- TVMaze Schedule
- Twilio Taskrouter
- Twilio
- Tyntec SMS
- Typeform
- US Census
- Vantage
- Visma Economic
- Vitally
- Waiteraid
- Weatherstack
- Webflow
- Whisky Hunter
- Wikipedia Pageviews
- WooCommerce
- Workable
- WorkRamp
- Wrike
- Xero
- xkcd
- Yahoo Finance Price
- Yandex Metrica
- Yotpo
- Younium
- YouTube Analytics
- Zapier Supported Storage
- Zendesk Chat
- Zendesk Sunshine
- Zendesk Support
- Zendesk Talk
- Zenefits
- Zenloop
- ZohoCRM
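The same list can also be retrieved programmatically; the sketch below assumes the top-level get_available_connectors() helper:
import airbyte as ab

# Print the connector names (e.g. "source-faker") that PyAirbyte can install.
for connector_name in sorted(ab.get_available_connectors()):
    print(connector_name)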
LangChain integration
For those interested in using PyAirbyte to drive LLM use cases, we provide two ways to integrate with LangChain:
- LangChain native integration: This approach requires you to utilize the langchain-airbyte integration package. Refer to LangChain Docs or watch this YouTube video to get started.
- PyAirbyte-centric integration: You can also directly use PyAirbyte to create documents. With this approach, you do not need to import langchain-airbyte. Refer to PyAirbyte Document Creation Demo to get started.
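As a short sketch of the first option, the langchain-airbyte package exposes an AirbyteLoader; the source, stream, and config values below are illustrative:
# pip install langchain-airbyte
from langchain_airbyte import AirbyteLoader

loader = AirbyteLoader(
    source="source-faker",
    stream="users",
    config={"count": 100},
)

# Load the stream records as LangChain documents.
docs = loader.load()
print(f"Loaded {len(docs)} documents")
print(docs[0].page_content[:200])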