How to scrape Instagram with Python

This example shows how to fetch profile data from Instagram. It only works for public accounts; you can't get data from accounts that are private or require approval to follow.

For this one to work we need two imports:

import urllib.request
import json

We need urllib for making the HTTP requests, and json for parsing the JSON.

We then write a function that parses the web page's HTML so we can extract the account information:

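def extract_instagram_data(html):
    # The profile data is embedded in the page as JSON after this marker
    split1 = html.split('window._sharedData =')
    # split() always returns at least one element, so check for a second one
    if len(split1) > 1:
        split2 = split1[1].split('"};')
        data = json.loads(split2[0]+'"}')
        return data
    # Marker not found in the page
    return None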
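
Splitting on '"};' relies on that exact character sequence marking the end of the JSON and not appearing earlier inside a string value. As a minimal sketch of an alternative, json.JSONDecoder().raw_decode can parse one JSON value from the start of a string and simply stop where it ends. The function name extract_shared_data and the miniature sample_html page are made up for illustration; they are not part of the original example.

```python
import json

def extract_shared_data(html, marker='window._sharedData ='):
    """Return the JSON object that follows `marker`, or None if it is absent."""
    start = html.find(marker)
    if start == -1:
        return None
    # raw_decode does not skip leading whitespace, so strip it first
    remainder = html[start + len(marker):].lstrip()
    try:
        # raw_decode parses one JSON value and ignores whatever follows it
        data, _end = json.JSONDecoder().raw_decode(remainder)
    except json.JSONDecodeError:
        return None
    return data

# Hypothetical miniature page, only for illustration
sample_html = '<script>window._sharedData = {"country_code": "US", "note": "a\\"};b"};</script>'
print(extract_shared_data(sample_html))
```

The sample deliberately contains the sequence '"};' inside a string value, which is the case where a plain split would cut the JSON too early.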

In the previous example I used BeautifulSoup, but this one is easier with the raw HTML.

To get the HTML we write a simple function that retrieves it:

def get_html(url):
    user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    request = urllib.request.Request(url, headers={'User-Agent': user_agent})
    response = urllib.request.urlopen(request)
    html = response.read().decode('UTF-8')
    return html
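
The function above raises an exception if the request fails, for example when the page does not exist or the network is down. The following is a rough sketch of a more defensive variant, assuming you would rather get None back than a traceback. The name fetch_html and the timeout value of 10 seconds are illustrative choices, not part of the original example.

```python
import urllib.error
import urllib.request

# Same User-Agent string as in the article's get_html function
USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/35.0.1916.47 Safari/537.36')

def fetch_html(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails."""
    request = urllib.request.Request(url, headers={'User-Agent': USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Use the charset from the response headers, falling back to UTF-8
            charset = response.headers.get_content_charset() or 'utf-8'
            return response.read().decode(charset, errors='replace')
    except urllib.error.URLError as error:
        # HTTPError is a subclass of URLError, so HTTP error codes land here too
        print(f'Request for {url} failed: {error}')
        return None
```

It can be called the same way as get_html, for instance fetch_html('https://www.instagram.com/mr.wiese/'), with the caller checking for None before parsing.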

Putting everything together is easy and takes only three lines of code.

For this example I have used my own Instagram account again:

url = 'https://www.instagram.com/mr.wiese/'

insta = extract_instagram_data(get_html(url))
print(insta)

This prints something like the following to the console when run:

{'config': {'csrf_token': 'xxxxxx', 'viewer': None, 'viewerId': None}, 'country_code': 'US', 'language_code': 'en', 'locale': 'en_US', 'entry_data': {'ProfilePage': [{......
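
The printed dictionary is deeply nested, so it can help to look at its shape before picking out values. Here is a small sketch of a helper that lists keys and value types down to a chosen depth. The function summarize is a made-up helper, and the sample dictionary only mirrors the keys visible in the output above; the contents of the ProfilePage entry are left empty because the output is truncated there.

```python
def summarize(value, depth=0, max_depth=2):
    """Return lines describing the keys and value types of nested JSON data."""
    lines = []
    indent = '  ' * depth
    if isinstance(value, dict):
        for key, child in value.items():
            lines.append(f'{indent}{key}: {type(child).__name__}')
            if depth < max_depth:
                lines.extend(summarize(child, depth + 1, max_depth))
    elif isinstance(value, list) and value:
        # Only describe the first element of a list
        lines.append(f'{indent}[0]: {type(value[0]).__name__}')
        if depth < max_depth:
            lines.extend(summarize(value[0], depth + 1, max_depth))
    return lines

# Sample mirroring the keys shown in the console output (values are placeholders)
sample = {
    'config': {'csrf_token': 'xxxxxx', 'viewer': None, 'viewerId': None},
    'country_code': 'US',
    'language_code': 'en',
    'locale': 'en_US',
    'entry_data': {'ProfilePage': [{}]},  # contents truncated in the output above
}

print('\n'.join(summarize(sample)))
```

With the real data you would call summarize(insta) and then index into the parts you need, such as insta['entry_data']['ProfilePage'][0].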

Getting the account information is very simple and takes about 5 minutes from start to running code. Getting all the pictures from the account is much trickier, but it can still be done within 10 minutes.

Here is the complete code:

# Import libraries
import urllib.request
import json

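def extract_instagram_data(html):
    # The profile data is embedded in the page as JSON after this marker
    split1 = html.split('window._sharedData =')
    # split() always returns at least one element, so check for a second one
    if len(split1) > 1:
        split2 = split1[1].split('"};')
        data = json.loads(split2[0]+'"}')
        return data
    # Marker not found in the page
    return None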

def get_html(url):
    user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    request = urllib.request.Request(url, headers={'User-Agent': user_agent})
    response = urllib.request.urlopen(request)
    html = response.read().decode('UTF-8')
    return html

url = 'https://www.instagram.com/mr.wiese/'
insta = extract_instagram_data(get_html(url))
print(insta)