python - How to add a repeated column using pandas
I am doing my homework and have encountered a problem. I have a large matrix whose first column, y002, is a nominal variable with 3 levels, encoded as 1, 2 and 3 respectively. The other two columns, v96 and v97, are numeric.
Now I want to get the group means corresponding to the variable y002. I wrote this code:
group = data2.groupby(by=["y002"]).mean()
Then I index each group mean using:
group1 = group["v96"]
group2 = group["v97"]
Now I want to append each group mean as a new column in the original dataframe, so that each mean matches the corresponding y002 code (1, 2 or 3). I tried this code, but it shows NaN.
data2["group1"] = pd.Series(group1, index=data2.index)
I hope someone can help me with this, many thanks :)
PS: I hope this makes sense. In R, I can do the same thing using
data2$group1 = with(data2, tapply(v97,y002,mean))[data2$y002]
But how can I implement this in Python with pandas?
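A note on why the attempt above gives NaN: it most likely comes from index alignment. `pd.Series(group1, index=data2.index)` reindexes `group1`, whose labels are the y002 codes 1, 2 and 3, against the row labels of `data2`. Only rows whose label happens to be 1, 2 or 3 find a match, and even those get the mean for that label rather than for their own y002 code. A minimal sketch, assuming a small made-up frame (the values are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical toy data standing in for data2
data2 = pd.DataFrame({
    "y002": [1, 2, 3, 1, 2, 3],
    "v96": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
})

# Group means of v96, indexed by the y002 codes 1, 2, 3
group1 = data2.groupby("y002")["v96"].mean()

# Building a Series from a Series plus an index reindexes by label:
# row labels 0..5 are looked up in group1's index (1, 2, 3),
# so labels 0, 4 and 5 find nothing and become NaN
reindexed = pd.Series(group1, index=data2.index)
print(reindexed)
```

With a default 0-based index on a large frame, nearly every row label falls outside 1 to 3, which would explain a column that looks entirely NaN.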
You can use .transform():
import pandas as pd
import numpy as np

# data
# ============================
np.random.seed(0)
df = pd.DataFrame({'y002': np.random.randint(1,4,100), 'v96': np.random.randn(100), 'v97': np.random.randn(100)})

print(df)

       v96     v97  y002
0  -0.6866 -0.1478     1
1   0.0149  1.6838     2
2  -0.3757  0.9718     1
3  -0.0382  1.6077     2
4   0.3680 -0.2571     2
5  -0.0447  1.8098     3
6  -0.3024  0.8923     1
7  -2.2244 -0.0966     3
8   0.7240 -0.3772     1
9   0.3590 -0.5053     1
..     ...     ...   ...
90 -0.6906  1.5567     2
91 -0.6815 -0.4189     3
92 -1.5122 -0.4097     1
93  2.1969  1.1164     2
94  1.0412 -0.2510     3
95 -0.0332 -0.4152     1
96  0.0656 -0.6391     3
97  0.2658  2.4978     1
98  1.1518 -3.0051     2
99  0.1380 -0.8740     3

# processing
# ===========================
# transform returns one value per original row, so the result lines up with df
df['v96_mean'] = df.groupby('y002')['v96'].transform('mean')
df['v97_mean'] = df.groupby('y002')['v97'].transform('mean')

df

       v96     v97  y002  v96_mean  v97_mean
0  -0.6866 -0.1478     1   -0.1944    0.0837
1   0.0149  1.6838     2    0.0497   -0.0496
2  -0.3757  0.9718     1   -0.1944    0.0837
3  -0.0382  1.6077     2    0.0497   -0.0496
4   0.3680 -0.2571     2    0.0497   -0.0496
5  -0.0447  1.8098     3    0.0053   -0.0707
6  -0.3024  0.8923     1   -0.1944    0.0837
7  -2.2244 -0.0966     3    0.0053   -0.0707
8   0.7240 -0.3772     1   -0.1944    0.0837
9   0.3590 -0.5053     1   -0.1944    0.0837
..     ...     ...   ...       ...       ...
90 -0.6906  1.5567     2    0.0497   -0.0496
91 -0.6815 -0.4189     3    0.0053   -0.0707
92 -1.5122 -0.4097     1   -0.1944    0.0837
93  2.1969  1.1164     2    0.0497   -0.0496
94  1.0412 -0.2510     3    0.0053   -0.0707
95 -0.0332 -0.4152     1   -0.1944    0.0837
96  0.0656 -0.6391     3    0.0053   -0.0707
97  0.2658  2.4978     1   -0.1944    0.0837
98  1.1518 -3.0051     2    0.0497   -0.0496
99  0.1380 -0.8740     3    0.0053   -0.0707

[100 rows x 5 columns]
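For what it's worth, if you already have the `group` table from the question and want something closer to the R one-liner (look up each row's y002 code in the table of means), `Series.map` should do it. This is only a sketch on a small made-up frame: the data values are assumptions, and the names `data2`, `group`, `group1` and `group2` are borrowed from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for data2
data2 = pd.DataFrame({
    "y002": [1, 2, 3, 1, 2, 3],
    "v96": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
    "v97": [10.0, 20.0, 30.0, 30.0, 40.0, 50.0],
})

# One row of means per y002 code, indexed by the code
group = data2.groupby("y002")[["v96", "v97"]].mean()

# map looks up each row's y002 code in the index of the means Series,
# roughly what tapply(...)[data2$y002] does in R
data2["group1"] = data2["y002"].map(group["v96"])
data2["group2"] = data2["y002"].map(group["v97"])

print(data2)
```

On this toy frame it gives the same columns as `.transform('mean')`. The `.transform()` route is shorter when you do not need the table of means for anything else.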