猫になりたい

コンサルのデータ分析屋、計量経済とか機械学習をやっています。pyてょnは3.7を使ってマスコレルウィンストングリーン。

How to read csv and excel file on pandas dataframe from website

Motivation

Trying to use the Japanese Goverment statistics through an api I found it is useless due to the format and datasets which are provided. Instead I decided to read a csv and xls file directly from a website and load them on a pnadas dataframe. Below are the codes.

Example

csv file

The sample url for csv is below
"http://www.e-stat.go.jp/SG1/estat/Csvdl.do?sinfid=000012460662"
which is the url of a csv which contains a population in each prefecture of Japan. You can get a related csv files from 統計表一覧 政府統計の総合窓口 GL08020103 .(There seems no English websites unfortunately.)

Using the read_csv function below you will be abel to read a csv file on a pandas dataframe.

import pandas as pd

url = "http://www.e-stat.go.jp/SG1/estat/Csvdl.do?sinfid=000012460662"

#Read csv function
def read_csv(url):
    print(url)
    res = urllib.request.urlopen(url)
    res=res.read().decode('shift-jis')
    df = pd.read_csv(StringIO( res) )
    return df

#Run
read_csv(url)

Result f:id:shikiponn:20160619004604p:plain

Excel file(xls )

Next, from 統計表一覧 政府統計の総合窓口 GL08020103
I will get Population and Household data.

Similarly to the read_csv function we can make read_xls function as below.

url = "http://www.e-stat.go.jp/SG1/estat/GL08020103.do?_xlsDownload_&fileId=000003562952&releaseCount=1"

#Read xls function
def read_xls(url):
    print(url)
    res = urllib.request.urlopen(url)
    f = pd.ExcelFile(res)
    df = f.parse()
    return df

#Run
read_xls(url)

Result f:id:shikiponn:20160619190234p:plain