Using pd.factorize When LabelEncoder Doesn't Work

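You've checked that the dtype is object, and you've confirmed that the values include strings, numbers, nulls and so on, yet every time you run LabelEncoder you get

TypeError: argument must be a string or number

If that happens, the approach below is worth trying. The mix of types is usually the cause: LabelEncoder needs to sort the distinct values, and a string cannot be ordered against a number or against NaN (which is a float).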
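To illustrate the likely cause, here is a minimal sketch on a made-up column (the data is hypothetical). It uses Python's plain sorted() as a stand-in for the sorting step a sort-based encoder relies on; this is not LabelEncoder's actual code path. pd.factorize, by contrast, looks values up in a hash table in order of first appearance, so it never has to compare a str with a number.

```python
import numpy as np
import pandas as pd

# Hypothetical object column mixing strings, a number and a null (NaN is a float)
s = pd.Series(["a", 1, np.nan, "b", "a"], dtype=object)

# Sorting the raw values, which a sort-based encoder relies on, fails on int/float vs. str
try:
    sorted(s.tolist())
    sort_failed = False
except TypeError:
    sort_failed = True

# factorize hashes values instead of sorting them, so the same column encodes fine
codes, uniques = pd.factorize(s)
print(sort_failed)     # True
print(codes.tolist())  # [0, 1, -1, 2, 0]
print(list(uniques))   # ['a', 1, 'b']
```

Note that the NaN gets code -1 rather than a category of its own, which is factorize's default behavior.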

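import pandas as pd

# data, test: train/test DataFrames; categorical_feats: list of categorical column names
for col in categorical_feats:
    # Encode train values as integer codes; indexer holds the distinct values
    data[col], indexer = pd.factorize(data[col])
    # Apply the same mapping to test; values not seen in train become -1
    test[col] = indexer.get_indexer(test[col])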

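Using factorize together with indexer gives you behavior similar to LabelEncoder: every distinct value in the training data gets an integer code, and indexer.get_indexer applies the same mapping to the test data. There are a few differences, though. The codes follow the order in which values first appear rather than sorted order, NaN is encoded as -1, and a test value that never appeared in training also becomes -1 instead of raising an error the way LabelEncoder.transform does.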
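Here is a small worked run of the loop on toy data, to make the mapping concrete. The DataFrames and the column names color and size are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical train/test frames; column names are illustrative only
data = pd.DataFrame({"color": ["red", "blue", np.nan, "red"], "size": ["S", "M", "M", "L"]})
test = pd.DataFrame({"color": ["blue", "green", np.nan], "size": ["L", "S", "XL"]})
categorical_feats = ["color", "size"]

for col in categorical_feats:
    # Codes follow order of first appearance in train; NaN gets -1
    data[col], indexer = pd.factorize(data[col])
    # Map test values onto train's codes; unseen values (and NaN) get -1
    test[col] = indexer.get_indexer(test[col])

print(data["color"].tolist())  # [0, 1, -1, 0]
print(test["color"].tolist())  # [1, -1, -1]
print(test["size"].tolist())   # [2, 0, -1]
```

"green" and "XL" never occur in the training frame, so both come out as -1, the same code NaN receives. If a downstream model needs to tell "missing" apart from "unseen", you have to handle that separately.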

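But watch out: if indexer is a NumPy array (np.ndarray), get_indexer won't work! pd.factorize returns the unique values as a pandas Index only when you pass it a pandas object such as a Series. If you pass a raw array (e.g. data[col].values), the unique values come back as a plain np.ndarray, which has no get_indexer method.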
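The following sketch shows that pitfall side by side, using small hypothetical Series. Wrapping the array in pd.Index is one possible workaround; passing the Series itself, as the loop above does, avoids the problem entirely.

```python
import numpy as np
import pandas as pd

# Hypothetical train/test columns
train_col = pd.Series(["x", "y", "x"])
test_col = pd.Series(["y", "z"])

# Passing a Series: uniques comes back as a pandas Index, which has get_indexer
_, uniques_idx = pd.factorize(train_col)
print(isinstance(uniques_idx, pd.Index))     # True

# Passing the underlying ndarray: uniques comes back as a plain ndarray
_, uniques_arr = pd.factorize(train_col.to_numpy())
print(isinstance(uniques_arr, np.ndarray))   # True
print(hasattr(uniques_arr, "get_indexer"))   # False

# Workaround: wrap the ndarray in pd.Index before looking up test values
mapped = pd.Index(uniques_arr).get_indexer(test_col)
print(mapped.tolist())                       # [1, -1]
```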
