Steps for a machine learning analysis (work in progress)

Survey the features (data types)
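Here is a minimal pandas sketch of this first look. It uses a small made-up DataFrame, and the column names are hypothetical. `dtypes` shows how pandas parsed each column, and `select_dtypes` separates the numeric columns from everything else.

```python
import pandas as pd

# Hypothetical toy data standing in for a real training set
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [3200.0, 4100.5, 5800.0, 6100.0],
    "city": ["Seoul", "Busan", "Seoul", "Incheon"],
    "signup": ["2019-08-01", "2019-08-03", "2019-08-10", "2019-08-13"],
})

# How pandas parsed each column
print(df.dtypes)

# Group column names by broad type so later steps can treat them differently
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()
print("numeric:", numeric_cols)
print("non-numeric:", other_cols)
```

Note that `signup` shows up as non-numeric even though it holds dates. That is exactly the kind of column the later type and feature-engineering steps deal with.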

Understand the features (skip the ones you can't make sense of)

Look at the distributions with histograms
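The sketch below assumes a synthetic, right-skewed feature. It computes bin counts with `np.histogram` and prints a text histogram, so it runs without a plotting library. If matplotlib is installed, `df.hist(bins=...)` draws one histogram per numeric column of a DataFrame.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical right-skewed feature, e.g. a transaction amount
amount = rng.exponential(scale=100.0, size=1_000)

counts, edges = np.histogram(amount, bins=10)
for c, left, right in zip(counts, edges[:-1], edges[1:]):
    # One '#' per 10 observations in the bin
    print(f"{left:8.1f} - {right:8.1f} | {'#' * int(c // 10)}")
```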

Remove duplicates
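A short pandas sketch on toy data with hypothetical columns. `duplicated` flags rows that are identical to an earlier row, and `drop_duplicates` removes them. The `subset` argument restricts the comparison to chosen key columns.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3],
    "amount": [10.0, 20.0, 20.0, 30.0, 35.0],
})

# Rows identical to an earlier row across all columns
print(df.duplicated().sum())  # 1

# Keeps the first occurrence by default
dedup = df.drop_duplicates().reset_index(drop=True)

# Treat rows as duplicates when only the key column matches
dedup_by_user = df.drop_duplicates(subset="user_id", keep="first")
```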

Check the variance (the distribution for each target class and the distribution of each feature). Scale if necessary.
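One way to run this check with pandas and scikit-learn, assuming a small made-up dataset with a binary `target` column:

1. Look at the class balance.
2. Look at each feature's spread, both overall and per class.
3. Standardize with `StandardScaler` if the features live on very different scales, as `amount` and `hour` do here.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data
df = pd.DataFrame({
    "amount": [12.0, 250.0, 8.0, 3000.0, 45.0, 5.0],
    "hour": [1, 13, 2, 23, 9, 4],
    "target": [0, 0, 0, 1, 0, 1],
})

# Class balance of the target
print(df["target"].value_counts(normalize=True))

# Spread of each feature, then the same broken down by target class
print(df[["amount", "hour"]].describe())
print(df.groupby("target")[["amount", "hour"]].agg(["mean", "std"]))

# Rescale each column to mean 0 and standard deviation 1.
# In a real pipeline, fit on the training split only and reuse it on test data.
scaler = StandardScaler()
scaled = scaler.fit_transform(df[["amount", "hour"]])
```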

Detect and remove outliers. Check the min and max values.
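Besides eyeballing the min and max, a common rule of thumb is Tukey's IQR fence. It is swapped in here as an example and is not specified above. A sketch on a toy series:

```python
import pandas as pd

# Hypothetical feature with one extreme value
s = pd.Series([12, 15, 14, 13, 16, 15, 14, 250], name="amount")

# First look: the raw min and max often expose impossible or extreme values
print(s.min(), s.max())

# Tukey's rule: flag values more than 1.5 * IQR outside the quartiles
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
mask = s.between(lower, upper)
cleaned = s[mask]
print(s[~mask].tolist())  # values flagged as outliers
```

The 1.5 multiplier is only a convention. Before dropping a flagged value, decide whether it is an error or a genuine rare case, such as fraud.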

Engineer features (e.g. rename them to something easy to understand, or convert them into a form that is easy to compute with, such as date_time)
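A sketch of both ideas on made-up columns. The names `V1`, `risk_score` and `ts` are hypothetical. The first step renames a cryptic column. The second parses a string timestamp with `pd.to_datetime`, so that its parts and differences can be computed.

```python
import pandas as pd

df = pd.DataFrame({
    "V1": [0.5, 1.2],
    "ts": ["2019-08-13 09:30:00", "2019-08-14 22:15:00"],
})

# Give cryptic columns readable names
df = df.rename(columns={"V1": "risk_score", "ts": "date_time"})

# Parse strings into datetimes so arithmetic and component access work
df["date_time"] = pd.to_datetime(df["date_time"])
df["hour"] = df["date_time"].dt.hour
df["weekday"] = df["date_time"].dt.dayofweek  # Monday = 0
df["days_since_first"] = (df["date_time"] - df["date_time"].min()).dt.days
```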

Decide carefully between numeric and string data types (a value that should be 1 may be stored as '1')
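A sketch of the `1` vs `'1'` problem, using a toy DataFrame:

- Numbers stored as text are converted with `pd.to_numeric`. With `errors="coerce"`, unparseable entries become NaN.
- Identifier-like numbers, such as a ZIP code, are kept as strings.

The real judgment call is whether a column is a quantity or a code.

```python
import pandas as pd

# Hypothetical columns: "qty" is numeric text, "zip" is a code stored as int
df = pd.DataFrame({"qty": ["1", "2", " 3", "n/a"], "zip": [6236, 4800, 1234, 5678]})

# As strings, "1" + "2" would give "12", so convert; "n/a" becomes NaN
df["qty"] = pd.to_numeric(df["qty"].str.strip(), errors="coerce")

# Codes are labels, not quantities: keep them as text and restore leading zeros
df["zip"] = df["zip"].astype(str).str.zfill(5)
```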

Encode categorical features using Label Encoding, One-hot Encoding, Mean Encoding, etc.
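A sketch of the three encodings on a toy `city` column with a binary `target`; both columns are hypothetical. Note that scikit-learn documents `LabelEncoder` for target labels. For input features, `OrdinalEncoder` is the usual equivalent.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "city": ["Seoul", "Busan", "Seoul", "Incheon", "Busan", "Seoul"],
    "target": [1, 0, 1, 0, 1, 0],
})

# Label encoding: each category becomes an integer (classes sorted alphabetically)
df["city_label"] = LabelEncoder().fit_transform(df["city"])

# One-hot encoding: one 0/1 column per category
one_hot = pd.get_dummies(df["city"], prefix="city", dtype=int)

# Mean (target) encoding: replace each category with its mean target value.
# In practice, compute the means on training folds only to avoid target leakage.
city_means = df.groupby("city")["target"].mean()
df["city_mean"] = df["city"].map(city_means)
```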

Choose a model
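One common way to choose is to score several candidate models with the same cross-validation and compare their mean scores. The sketch below does this on synthetic data from `make_classification`. The three candidates are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a prepared feature matrix X and target y
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Same 5-fold cross-validation for every candidate, mean accuracy per model
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```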

Decide whether to build an ensemble
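If an ensemble is on the table, one simple option is scikit-learn's `VotingClassifier`; it is an example choice, not the only one. With `voting="soft"`, it averages the base models' predicted probabilities. The ensemble is worth keeping only if its cross-validated score beats the best single model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Soft voting: average predict_proba across the base models
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=7)),
    ],
    voting="soft",
)
ensemble_score = cross_val_score(ensemble, X, y, cv=5).mean()
print(f"ensemble CV accuracy: {ensemble_score:.3f}")
```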

Also consider a grid search (GridSearch) over hyperparameters.
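A sketch of `GridSearchCV` on synthetic data; the random forest and the grid values are arbitrary examples. Every combination in `param_grid` is fit and scored by cross-validation. As a result, the cost grows multiplicatively with each parameter added to the grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 3 x 2 = 6 settings, each scored with 5-fold CV (30 fits in total)
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```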

Submit

===================

### Outline :

1. Understanding our data.

Gather a sense of our data

2. Preprocessing.

a) Scaling and Distributing

b) Splitting the Data

3. Random Undersampling and Oversampling

a) Distributing and Correlating

b) Anomaly Detection

c) Dimensionality Reduction and Clustering

d) Classifiers

e) A Deeper Look into Logistic Regression

f) Oversampling with SMOTE

4. Testing

a) Testing with Logistic Regression

b) Neural Network Testing (Undersampling vs Oversampling)
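The outline's resampling step can be sketched as follows, assuming a hypothetical imbalanced DataFrame with a 0/1 `Class` column. The point the sketch illustrates is ordering. Split first, with stratification, then undersample only the training split, so the test set keeps the real class ratio.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2_000
# Hypothetical imbalanced data: roughly 2% positives, like a fraud label
df = pd.DataFrame({
    "amount": rng.exponential(100.0, n),
    "hour": rng.integers(0, 24, n),
    "Class": (rng.random(n) < 0.02).astype(int),
})

# Split first, stratified, so the test set keeps the original imbalance
train, test = train_test_split(df, test_size=0.2, stratify=df["Class"], random_state=42)

# Random undersampling on the training split only:
# keep every positive and an equal-sized random sample of negatives
pos = train[train["Class"] == 1]
neg = train[train["Class"] == 0].sample(n=len(pos), random_state=42)
balanced = pd.concat([pos, neg]).sample(frac=1, random_state=42)  # shuffle rows

print(train["Class"].value_counts().to_dict(), "->", balanced["Class"].value_counts().to_dict())
```

For oversampling, the imbalanced-learn package provides `SMOTE`. It is applied the same way, to the training split only, via `SMOTE().fit_resample(X_train, y_train)`.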
