This article describes the configurations required to ingest a Twitter feed directly into your data lake using Infoworks.

  • Step 1: Create a 'REST Generic' API Source
  • Step 2: Configure the source
  • Step 3: Create a Table
  • Step 4: Configure the Table and Ingest


Step 1: Create a 'REST Generic' API Source


Follow the steps in the Infoworks documentation to create a Generic REST API source:

https://docs2x.infoworks.io/data-ingestion/generic-rest-api-ingestion#creating-generic-rest-api-source


Step 2: Configure the Source

  1. Derive the Authorization key value as follows:
    • Concatenate "<consumer key>:<consumer secret>" (no quotes)
    • Base64-encode the result (see the Python 3 sketch after this list)
    • Use "Basic <encoded string>" as the value for the Authorization key
  2. Set the Content-Type key value to application/x-www-form-urlencoded;charset=UTF-8
  3. Save and test the connection
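
The Python 3 script mentioned in step 1 amounts to the minimal sketch below; the consumer key and consumer secret are placeholders to replace with the credentials of your own Twitter app.

    import base64

    # Placeholders for your Twitter app credentials; substitute real values.
    consumer_key = "<consumer key>"
    consumer_secret = "<consumer secret>"

    # Concatenate key and secret with a colon, then base64-encode the result.
    credentials = f"{consumer_key}:{consumer_secret}"
    encoded = base64.b64encode(credentials.encode("utf-8")).decode("utf-8")

    # Use this full string as the value for the Authorization key.
    print(f"Basic {encoded}")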


Step 3: Create a Table

  1. Add a new table definition
    1. Meta URL = https://api.twitter.com/1.1/search/tweets.json
    2. Add a Request Header
      • Key = Authorization
      • Value = "bearer $authtoken" (no quotes)
    3. Add a parameter, e.g., Key = q, Value = covid (see the Twitter search API reference for the full list of supported parameters: https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets ). The raw request this table builds is sketched after this list.
    4. Save and Crawl
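
For reference, the request this table definition builds looks roughly like the sketch below when issued outside Infoworks. The bearer token is a placeholder for the value Infoworks substitutes for $authtoken, and the Python requests library is used purely for illustration.

    import requests

    # Placeholder for the bearer token that Infoworks substitutes for $authtoken.
    BEARER_TOKEN = "<bearer token>"

    # Meta URL, Authorization header, and the q parameter from the table definition.
    response = requests.get(
        "https://api.twitter.com/1.1/search/tweets.json",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params={"q": "covid"},
    )
    response.raise_for_status()

    # Tweets come back under "statuses"; paging details under "search_metadata".
    data = response.json()
    print(len(data["statuses"]), "tweets; next page:",
          data["search_metadata"].get("next_results"))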


Step 4: Configure the Table and Ingest

  1. Choose Full Load
  2. Select Incremental Append Mode
  3. Choose Natural Keys as applicable
  4. Select Use Meta Url as Base Url
  5. Configure the Pagination mechanism as "Next URI in response"
  6. Configure the Next URL JSON Path (this pagination flow is sketched after the list):
    • Key = $['search_metadata']['next_results']
    • Value = "https://api.twitter.com/1.1/search/tweets.json" (no quotes)
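
These pagination settings work because each response from the search endpoint carries a search_metadata.next_results field containing the query string for the next page. The sketch below (again outside Infoworks, with a placeholder bearer token) shows the loop that the "Next URI in response" mechanism automates.

    import requests

    BEARER_TOKEN = "<bearer token>"  # placeholder
    BASE_URL = "https://api.twitter.com/1.1/search/tweets.json"
    HEADERS = {"Authorization": f"Bearer {BEARER_TOKEN}"}

    url = BASE_URL + "?q=covid"
    for _ in range(3):  # fetch a few pages for illustration
        data = requests.get(url, headers=HEADERS).json()
        print(len(data.get("statuses", [])), "tweets on this page")

        # search_metadata.next_results holds the query string for the next page
        # (for example "?max_id=...&q=covid"); appending it to the base URL is
        # what the "Next URI in response" setting automates.
        next_results = data.get("search_metadata", {}).get("next_results")
        if not next_results:
            break
        url = BASE_URL + next_results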