This post shows how to combine separate date columns into a single datetime column, and how to split one back into separate columns.

[Environment]

Android
Termux
Python 3.9.6
Jupyter Notebook 6.4.0
Pandas 1.2.5

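Combining year, month, and day columns into a single datetime column

Suppose a DataFrame has separate columns for year, month, and day, and we want to show them as a single column.
Let's build a sample DataFrame and walk through it.

☆ Generating the sample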

import numpy as np
import pandas as pd


# Build a sample DataFrame (年 = year, 月 = month, 日 = day, 何かの値 = some value)
date = pd.DataFrame({
    "年":np.array(2021).repeat(10),
    "月":np.array(1).repeat(10),
    "日":np.arange(1,11),
    "何かの値":np.random.randint(0,101,10)
})

date

	年	月	日	何かの値
0	2021	1	1	81
1	2021	1	2	25
2	2021	1	3	99
3	2021	1	4	21
4	2021	1	5	95
5	2021	1	6	79
6	2021	1	7	74
7	2021	1	8	63
8	2021	1	9	99
9	2021	1	10	30

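Excel files laid out like this are common. We want to merge the three leftmost columns of this DataFrame into a single column.

Use pd.to_datetime().

Converting to dates with pd.to_datetime()

# Assemble the three columns into one datetime Series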
d = pd.to_datetime({
    "year":date['年'],
    "month":date['月'],
    "day":date['日']
})

print(type(d)) # check the type

date["日付"] = d # add as a new column named 日付 (date)

date

<class 'pandas.core.series.Series'>

	年	月	日	何かの値	日付
0	2021	1	1	81	2021-01-01
1	2021	1	2	25	2021-01-02
2	2021	1	3	99	2021-01-03
3	2021	1	4	21	2021-01-04
4	2021	1	5	95	2021-01-05
5	2021	1	6	79	2021-01-06
6	2021	1	7	74	2021-01-07
7	2021	1	8	63	2021-01-08
8	2021	1	9	99	2021-01-09
9	2021	1	10	30	2021-01-10

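The result of pd.to_datetime() is a Series.

Note
The dictionary passed to pd.to_datetime() must use keys that pandas recognizes, such as year, month, and day; otherwise the columns are not assembled into dates. Let's try it.

Failing example: the dictionary keys are not year, month, day.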
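Spreadsheets sometimes hold a year, month, and day that do not form a real date (February 30, for example). The following is a minimal sketch, assuming you would rather get a missing value than an exception in that case; the frame `parts` is made up for illustration and is not part of the sample above.

```python
import pandas as pd

# Hypothetical data: the second row (February 30) is not a real date
parts = pd.DataFrame({
    "year": [2021, 2021],
    "month": [2, 2],
    "day": [28, 30],
})

# errors="coerce" turns rows that cannot be assembled into NaT
combined = pd.to_datetime(parts, errors="coerce")
print(combined)
```

Rows that come back as NaT can then be inspected with `combined.isna()` before going further.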

d_miss = pd.to_datetime({
    "年":date['年'],
    "月":date['月'],
    "日":date['日']
})

d_miss

It fails with an error like this:

ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
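To make the failure mode concrete, here is a small sketch that catches the exception instead of letting it stop the notebook. The frame `parts` and the variables `failed` and `message` are illustrative names, not part of the original sample.

```python
import pandas as pd

# Hypothetical two-row sample with the same column names as the post
parts = pd.DataFrame({"年": [2021, 2021], "月": [1, 1], "日": [1, 2]})

try:
    # Keys that pandas cannot map to date units
    pd.to_datetime({"年": parts["年"], "月": parts["月"], "日": parts["日"]})
    failed = False
except ValueError as exc:
    failed = True
    message = str(exc)
    print("ValueError:", message)

# Same data, with keys pandas understands
ok = pd.to_datetime({
    "year": parts["年"],
    "month": parts["月"],
    "day": parts["日"],
})
print(ok)
```

Only the dictionary keys matter here; the source columns can keep their Japanese names.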

Let's check the dtypes of the DataFrame that converted successfully.

# Check dtypes, missing values, and so on
date.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   年       10 non-null     int64         
 1   月       10 non-null     int64         
 2   日       10 non-null     int64         
 3   何かの値    10 non-null     int64         
 4   日付      10 non-null     datetime64[ns]
dtypes: datetime64[ns](1), int64(4)
memory usage: 528.0 bytes

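The 日付 column is datetime64[ns]; the other columns are int64.

pd.to_datetime() also accepts arguments other than a dictionary, such as a DataFrame.

# Select the three columns to convert to dates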
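If you want to check the dtype in code rather than read it from info(), the helpers in pandas.api.types can be used. This is a sketch on a one-row frame; `frame`, `date_like`, and `int_like` are names chosen for the example.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_integer_dtype

# Hypothetical one-row frame built the same way as in the post
frame = pd.DataFrame({"年": [2021], "月": [1], "日": [1]})
frame["日付"] = pd.to_datetime({
    "year": frame["年"],
    "month": frame["月"],
    "day": frame["日"],
})

date_like = is_datetime64_any_dtype(frame["日付"])  # datetime column?
int_like = is_integer_dtype(frame["年"])            # integer column?
print(date_like, int_like)
```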
date.iloc[:,:3]

	年	月	日
0	2021	1	1
1	2021	1	2
2	2021	1	3
3	2021	1	4
4	2021	1	5
5	2021	1	6
6	2021	1	7
7	2021	1	8
8	2021	1	9
9	2021	1	10

# First build a frame with the column names pandas expects
df_date = pd.DataFrame(
    date.iloc[:, :3].values,
    columns=['year','month','day']
)

df_date

	year	month	day
0	2021	1	1
1	2021	1	2
2	2021	1	3
3	2021	1	4
4	2021	1	5
5	2021	1	6
6	2021	1	7
7	2021	1	8
8	2021	1	9
9	2021	1	10

# Convert to dates
pd.to_datetime(df_date)

0   2021-01-01
1   2021-01-02
2   2021-01-03
3   2021-01-04
4   2021-01-05
5   2021-01-06
6   2021-01-07
7   2021-01-08
8   2021-01-09
9   2021-01-10
dtype: datetime64[ns]

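Either way, the columns have to be named year, month, and day.

You can also change the column names with df.rename() or similar.

Dropping unneeded columns

Drop the columns that are no longer needed. The axis=1 argument tells drop() to remove columns. Without it, drop() looks for the labels in the row index and raises KeyError: ... not found in axis.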
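The rename() route for the three-column case might look like the sketch below. It selects the columns by name, so the original index is kept and the result can be assigned straight back. The small `date` frame and the `name_map` dictionary are stand-ins for illustration.

```python
import pandas as pd

# Hypothetical three-row version of the sample frame
date = pd.DataFrame({
    "年": [2021, 2021, 2021],
    "月": [1, 1, 1],
    "日": [1, 2, 3],
    "何かの値": [81, 25, 99],
})

# Map the Japanese column names to the names pandas expects
name_map = {"年": "year", "月": "month", "日": "day"}

# Select the three columns by name, rename them, and convert
renamed = date[list(name_map)].rename(columns=name_map)
date["日付"] = pd.to_datetime(renamed)
print(date)
```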

date.head(3)

	年	月	日	何かの値	日付
0	2021	1	1	81	2021-01-01
1	2021	1	2	25	2021-01-02
2	2021	1	3	99	2021-01-03

# Drop the unneeded columns and assign the result to a new variable
date_new = date.drop(["年","月","日"] ,axis=1)
date_new

	何かの値	日付
0	81	2021-01-01
1	25	2021-01-02
2	99	2021-01-03
3	21	2021-01-04
4	95	2021-01-05
5	79	2021-01-06
6	74	2021-01-07
7	63	2021-01-08
8	99	2021-01-09
9	30	2021-01-10
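As an alternative to axis=1, drop() also takes a columns= keyword, which some find easier to read. The sketch below shows that form, and also what happens when neither is given. The two-row `date` frame is a stand-in for the sample.

```python
import pandas as pd

# Hypothetical two-row version of the frame after adding 日付
date = pd.DataFrame({
    "年": [2021, 2021],
    "月": [1, 1],
    "日": [1, 2],
    "何かの値": [81, 25],
    "日付": pd.to_datetime(["2021-01-01", "2021-01-02"]),
})

# columns= names the axis explicitly, so axis=1 is not needed
date_new = date.drop(columns=["年", "月", "日"])
print(date_new)

# With neither axis=1 nor columns=, the labels are looked up in the row index
try:
    date.drop(["年", "月", "日"])
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)
```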

It's nicer to have the date column on the left, so swap the columns.

# Swap the column order
date_new = date_new.iloc[:, [1,0]]
date_new

	日付	何かの値
0	2021-01-01	81
1	2021-01-02	25
2	2021-01-03	99
3	2021-01-04	21
4	2021-01-05	95
5	2021-01-06	79
6	2021-01-07	74
7	2021-01-08	63
8	2021-01-09	99
9	2021-01-10	30
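Positional indexing with iloc works, but it has to be rewritten whenever the number of columns changes. A sketch of reordering by column name instead, using a made-up two-row frame; `front` and the `reordered` variables are illustrative names.

```python
import pandas as pd

# Hypothetical two-row frame with the date column on the right
date_new = pd.DataFrame({
    "何かの値": [81, 25],
    "日付": pd.to_datetime(["2021-01-01", "2021-01-02"]),
})

# Reorder by listing the column names in the order you want
reordered = date_new[["日付", "何かの値"]]

# Or move 日付 to the front without listing every other column
front = ["日付"] + [c for c in date_new.columns if c != "日付"]
reordered_any = date_new[front]
print(reordered_any)
```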

Or build a new DataFrame

If you build a new DataFrame from just the 日付 and 何かの値 columns, there is nothing to drop.

pd.DataFrame({
    '日付':d,
    '何かの値':date['何かの値']
})

	日付	何かの値
0	2021-01-01	81
1	2021-01-02	25
2	2021-01-03	99
3	2021-01-04	21
4	2021-01-05	95
5	2021-01-06	79
6	2021-01-07	74
7	2021-01-08	63
8	2021-01-09	99
9	2021-01-10	30

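When the time of day is included

Now let's try data that also has hour (時間), minute (分), and second (秒) columns in addition to year, month, and day. The approach is the same.

First, generate data with the parts in separate columns.

# Build a sample DataFrame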
dt = pd.DataFrame({
    "年":np.array(2021).repeat(10),
    "月":np.array(1).repeat(10),
    "日":np.arange(1,11),
    '時間':np.random.randint(0,24,10),  # hours run from 0 to 23
    '分':np.random.randint(0,60,10),
    '秒':np.random.randint(0,60,10),
    "何かの値":np.random.randint(0,101,10)
})

dt

	年	月	日	時間	分	秒	何かの値
0	2021	1	1	12	13	36	95
1	2021	1	2	11	10	45	23
2	2021	1	3	3	28	17	30
3	2021	1	4	17	35	51	67
4	2021	1	5	7	5	33	97
5	2021	1	6	2	16	50	45
6	2021	1	7	15	10	48	74
7	2021	1	8	1	6	7	65
8	2021	1	9	7	57	8	57
9	2021	1	10	21	59	7	39

Then convert the columns.

# Assemble the six columns into one datetime Series
d_t = pd.to_datetime({
    "year":dt['年'],
    "month":dt['月'],
    "day":dt['日'],
    "hour":dt['時間'],
    "minute":dt['分'],
    "second":dt['秒']
})

print(type(d_t)) # check the type

# Display
d_t

<class 'pandas.core.series.Series'>





0   2021-01-01 12:13:36
1   2021-01-02 11:10:45
2   2021-01-03 03:28:17
3   2021-01-04 17:35:51
4   2021-01-05 07:05:33
5   2021-01-06 02:16:50
6   2021-01-07 15:10:48
7   2021-01-08 01:06:07
8   2021-01-09 07:57:08
9   2021-01-10 21:59:07
dtype: datetime64[ns]

Converting it to a DataFrame gives:

# Build a DataFrame (何かの値 here is just a placeholder range)
pd.DataFrame({
    '日付':d_t,
    '何かの値':np.arange(len(d_t))
})

	日付	何かの値
0	2021-01-01 12:13:36	0
1	2021-01-02 11:10:45	1
2	2021-01-03 03:28:17	2
3	2021-01-04 17:35:51	3
4	2021-01-05 07:05:33	4
5	2021-01-06 02:16:50	5
6	2021-01-07 15:10:48	6
7	2021-01-08 01:06:07	7
8	2021-01-09 07:57:08	8
9	2021-01-10 21:59:07	9
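If the original 何かの値 column should be kept rather than replaced with a placeholder, the combine, drop, and reorder steps can be chained. This is a sketch under that assumption, on a made-up two-row frame; `unit_names`, `stamp`, and `result` are illustrative names.

```python
import pandas as pd

# Hypothetical two-row version of the sample with time columns
dt = pd.DataFrame({
    "年": [2021, 2021],
    "月": [1, 1],
    "日": [1, 2],
    "時間": [12, 11],
    "分": [13, 10],
    "秒": [36, 45],
    "何かの値": [95, 23],
})

# Map each part column to the unit name pandas expects
unit_names = {
    "年": "year", "月": "month", "日": "day",
    "時間": "hour", "分": "minute", "秒": "second",
}

# Assemble the datetime Series from the renamed part columns
stamp = pd.to_datetime(dt[list(unit_names)].rename(columns=unit_names))

# Drop the parts, add the combined column, and put it first
result = (
    dt.drop(columns=list(unit_names))
      .assign(**{"日付": stamp})
      [["日付", "何かの値"]]
)
print(result)
```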

Alternatively, rename the columns with rename() first:

# Rename the columns (drop only the last column, 何かの値, so that 秒 is kept)
dt_df = dt.iloc[:, :-1].rename(
    columns={
        '年':'year',
        '月':'month',
        '日':'day',
        '時間':'hour',
        '分':'minute',
        '秒':'second'
    })

dt_df.head(2)

	year	month	day	hour	minute	second
0	2021	1	1	12	13	36
1	2021	1	2	11	10	45

# Convert the renamed DataFrame to datetimes in one call
pd.to_datetime(dt_df)

0   2021-01-01 12:13:36
1   2021-01-02 11:10:45
2   2021-01-03 03:28:17
3   2021-01-04 17:35:51
4   2021-01-05 07:05:33
5   2021-01-06 02:16:50
6   2021-01-07 15:10:48
7   2021-01-08 01:06:07
8   2021-01-09 07:57:08
9   2021-01-10 21:59:07
dtype: datetime64[ns]

For how to use rename(), see its help:

# In IPython/Jupyter, show the help for rename
?pd.DataFrame.rename

# Or, in plain Python:
#help(pd.DataFrame.rename)

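Splitting one datetime column into several columns

Going the other way, how do we split a datetime column? Each component can be read from an attribute, as follows.

# Take one value from the datetime Series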
d_t0 = d_t[0]
print(d_t0)

print('-'*20)


print('年', d_t0.year)
print('月', d_t0.month)
print('日', d_t0.day)
print('時', d_t0.hour)
print('分', d_t0.minute)
print('秒', d_t0.second)

2021-01-01 12:13:36
--------------------
年 2021
月 1
日 1
時 12
分 13
秒 36

There are probably many ways to do this; the one that came to mind is this:

# Collect the attributes of each value in the Series into a list of lists
[[i.year,i.month,i.day] for i in d_t]

[[2021, 1, 1],
 [2021, 1, 2],
 [2021, 1, 3],
 [2021, 1, 4],
 [2021, 1, 5],
 [2021, 1, 6],
 [2021, 1, 7],
 [2021, 1, 8],
 [2021, 1, 9],
 [2021, 1, 10]]

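Take each value out of the Series with a for loop
Put its attributes in a list
Collect everything into a list with a comprehension

That gives a neat two-dimensional structure. All that remains is to turn it into a DataFrame.

# Year, month, day of each value in the Series as a list of lists
d = [[i.year,i.month,i.day] for i in d_t]
# Convert to a DataFrame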
pd.DataFrame(d,columns=["年","月","日"])

	年	月	日
0	2021	1	1
1	2021	1	2
2	2021	1	3
3	2021	1	4
4	2021	1	5
5	2021	1	6
6	2021	1	7
7	2021	1	8
8	2021	1	9
9	2021	1	10
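As an alternative to the list comprehension, a datetime Series exposes the same components through its .dt accessor, which works on the whole column at once. This is a different technique from the one above, sketched on a made-up two-row Series.

```python
import pandas as pd

# Hypothetical two-row datetime Series standing in for d_t
d_t = pd.Series(pd.to_datetime(["2021-01-01 12:13:36", "2021-01-02 11:10:45"]))

# Each .dt attribute returns a whole column of that component
parts = pd.DataFrame({
    "年": d_t.dt.year,
    "月": d_t.dt.month,
    "日": d_t.dt.day,
    "時間": d_t.dt.hour,
    "分": d_t.dt.minute,
    "秒": d_t.dt.second,
})
print(parts)
```

Because the new columns share the index of the original Series, they can also be assigned straight onto an existing DataFrame.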

That got long. That's all.

よちよちpython

Self-study: python/Qpython/Pydroid3/termux/Linux

[Pandas] How to combine or split date columns
