Kook's Notepad: [Machine Learning] 2. Pre-processing (Preparation)

Thursday, April 20, 2017

In Big Data Analytics, a large share of the time actually goes not into the analysis itself but into the preparation stage that precedes it (data collection + data cleaning).

Reference:

Looking at the market recently, there are not only tools such as Paxata, Datameer, and Alteryx, but also many solutions released by IT companies, each promoting its own value and experience.

e.g. ETL (AWS Glue), Data Preparation (Google DataPrep)

So what kind of work is Data Preparation?

Taken broadly as cleaning data and transforming it into a form suitable for analysis, it covers quite a wide range of tasks.

- remove missing values

- add data (e.g. a Timestamp field)

- check types (e.g. number, string, categories, etc.)

- replace values

- split values with delimiter

- data scaling : often required, depending on the analysis model.

(e.g. log(x), sqrt(x), 1/x, x^2, exp(x), standardization (x-μ)/σ)

- dummy variables (indicator variables)

. For categorical data, add one field per category and mark it as 0 or 1.
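
The steps listed above can be sketched in a few lines of pandas. This is only a rough illustration and not a fixed recipe: the sample table, the column names (name, income, grade), the "N/A" placeholder, and the prepare() helper are all made up for this example, and the load timestamp is hard-coded so that the result is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; names and values are invented for illustration.
raw = pd.DataFrame({
    "name": ["Kim,Minsu", "Lee,Jiwoo", "Park,Seoyeon", "Choi,Jun", "Jung,Hana"],
    "income": [3200.0, 4100.0, np.nan, 15000.0, 2800.0],
    "grade": ["A", "B", "A", "N/A", "B"],
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Apply each preparation step from the list once (hypothetical helper)."""
    out = df.copy()

    # replace values: turn the "N/A" placeholder string into a real missing value
    out["grade"] = out["grade"].replace({"N/A": np.nan})

    # remove missing values: drop every row that has at least one NaN
    out = out.dropna().reset_index(drop=True)

    # add data: a Timestamp field (fixed here; pd.Timestamp.now() is also an option)
    out["loaded_at"] = pd.Timestamp("2017-04-20")

    # split values with delimiter: "Last,First" becomes two columns
    out[["last_name", "first_name"]] = out["name"].str.split(",", expand=True)

    # check types: make the categorical column explicit
    out["grade"] = out["grade"].astype("category")

    # data scaling: log transform and standardization (x - mean) / std
    out["income_log"] = np.log(out["income"])
    mu = out["income"].mean()
    sigma = out["income"].std(ddof=0)  # population standard deviation
    out["income_std"] = (out["income"] - mu) / sigma

    # dummy variables: one 0/1 field per category
    dummies = pd.get_dummies(out["grade"], prefix="grade", dtype=int)
    return pd.concat([out, dummies], axis=1)


prepared = prepare(raw)
print(prepared.dtypes)   # type check after preparation
print(prepared)
```

Which transformation to use for scaling (log, sqrt, standardization, and so on) still depends on the model, so the two scaled columns here are just examples of the pattern.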

For the Machine Learning Series, let's cover only this much and move on.
