Data Mining Steps
Problem Definition
Market Analysis
Customer Profiling, Identifying Customer Requirements, Cross Market Analysis, Target Marketing, Determining Customer Purchasing Patterns
Corporate Analysis and Risk Management
Finance Planning and Asset Evaluation, Resource Planning, Competition
Fraud Detection
Customer Retention
Production Control
Science Exploration
> Data Preparation
Data preparation is about constructing a dataset from one or more data sources to be used for exploration and modeling. It is a solid practice to start with an initial dataset to get familiar with the data, to discover first insights into the data and to gain a good understanding of any possible data quality issues. The datasets you are given in these projects were obtained from kaggle.com.
Variable selection and description
Numerical – Ratio, Interval
Categorical – Ordinal, Nominal
Simplifying variables: From continuous to discrete
Formatting the data
Basic data integrity checks: missing data, outliers
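The preparation steps listed above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: missing values are represented as None, outliers are flagged with the common 1.5 x IQR rule (one possible choice, not prescribed by these notes), and the age data, bin edges and labels are made up for the example.

```python
import statistics

def count_missing(values):
    """Count entries recorded as None (our stand-in for a missing value)."""
    return sum(1 for v in values if v is None)

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]

def discretize(value, edges, labels):
    """Map a continuous value to a label; edges are the upper bounds
    of every bin except the last one."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

# Hypothetical sample data: ages with two missing entries and one suspicious value.
ages = [23, 35, None, 41, 29, 38, 120, 33, None, 27]

missing = count_missing(ages)
outliers = iqr_outliers(ages)
age_groups = [
    discretize(a, edges=[30, 40], labels=["young", "middle", "senior"])
    for a in ages
    if a is not None
]

print(missing, outliers, age_groups)
```

Whether a flagged value is really an error or just a rare observation is a judgment call that depends on the dataset.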
> Data Exploration
Data Exploration is about describing the data by means of statistical and visualization techniques.
· Data Visualization:
o Univariate analysis explores variables (attributes) one by one. Variables could be either categorical or numerical.
Univariate Analysis - Categorical
Statistics | Visualization | Description
Count | Bar Chart | The number of values of the specified variable.
Count% | Pie Chart | The percentage of values of the specified variable.
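As a small illustration of the two categorical statistics, the sketch below computes the count and the percentage for each category. The function name and the sample data are hypothetical.

```python
from collections import Counter

def count_and_percent(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

# Hypothetical categorical variable.
colors = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
summary = count_and_percent(colors)

for category, (count, percent) in summary.items():
    print(f"{category}: count={count}, percent={percent:.1f}")
```

The counts are what a bar chart would display, and the percentages are what a pie chart would display.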
Univariate Analysis - Numerical
Statistics | Visualization | Equation | Description
Count | Histogram | N | The number of values (observations) of the variable.
Minimum | Box Plot | Min | The smallest value of the variable.
Maximum | Box Plot | Max | The largest value of the variable.
Mean | Box Plot | | The sum of the values divided by the count.
Median | Box Plot | | The middle value. Below and above the median lies an equal number of values.
Mode | Histogram | | The most frequent value. There can be more than one mode.
Quantile | Box Plot | | A set of 'cut points' that divide a set of data into groups containing equal numbers of values (Quartile, Quintile, Percentile, ...).
Range | Box Plot | Max-Min | The difference between maximum and minimum.
Variance | Histogram | | A measure of data dispersion.
Standard Deviation | Histogram | | The square root of the variance.
Coefficient of Variation | Histogram | | A measure of data dispersion (the standard deviation) divided by the mean.
Skewness | Histogram | | A measure of symmetry or asymmetry in the distribution of data.
Kurtosis | Histogram | | A measure of whether the data are peaked or flat relative to a normal distribution.
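The numerical statistics above can be computed with Python's standard library, as in the following sketch. It makes several assumptions that these notes do not specify: population (not sample) variance is used, skewness and kurtosis are the simple moment-based versions, and kurtosis is reported as excess kurtosis (0 for a normal distribution). The sample values are made up.

```python
import statistics

def describe(values):
    """Univariate summary of a numerical variable (assumes a non-zero mean)."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)  # population variance (assumption)
    std = var ** 0.5
    # Third and fourth central moments, used for the shape statistics.
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": mean,
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
        "quartiles": statistics.quantiles(values, n=4),
        "variance": var,
        "std_dev": std,
        "coef_variation": std / mean,
        "skewness": m3 / std ** 3,
        "excess_kurtosis": m4 / std ** 4 - 3.0,
    }

stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name}: {value}")
```

A positive skewness indicates a longer right tail; a negative excess kurtosis indicates a flatter shape than a normal distribution.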
Note: There are two types of numerical variables, interval and ratio. An interval variable has values whose differences are interpretable, but it does not have a true zero. A good example is temperature in degrees Centigrade. Data on an interval scale can be added and subtracted but cannot be meaningfully multiplied or divided. For example, we cannot say that one day is twice as hot as another day. In contrast, a ratio variable has values with a true zero and can be added, subtracted, multiplied or divided (e.g., weight).
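One way to see the difference is to change units. In the sketch below (the temperatures and weights are made-up values), the ratio of two Centigrade temperatures changes when the same temperatures are expressed in Kelvin, while their difference does not. The ratio of two weights stays the same in kilograms and in pounds.

```python
def celsius_to_kelvin(c):
    """Shift the zero point: Kelvin has a true zero, Centigrade does not."""
    return c + 273.15

def kg_to_pounds(kg):
    """Pure rescaling: the zero point stays at zero."""
    return kg * 2.20462

# Interval scale: two hypothetical daily temperatures in Centigrade.
day_a, day_b = 10.0, 20.0
ratio_celsius = day_b / day_a
ratio_kelvin = celsius_to_kelvin(day_b) / celsius_to_kelvin(day_a)
diff_celsius = day_b - day_a
diff_kelvin = celsius_to_kelvin(day_b) - celsius_to_kelvin(day_a)

# Ratio scale: two hypothetical weights in kilograms.
w_a, w_b = 40.0, 80.0
ratio_kg = w_b / w_a
ratio_lb = kg_to_pounds(w_b) / kg_to_pounds(w_a)

print(ratio_celsius, ratio_kelvin)  # the "twice as hot" ratio does not survive
print(diff_celsius, diff_kelvin)    # the difference does
print(ratio_kg, ratio_lb)           # the weight ratio survives a unit change
```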
o Bivariate analysis is the simultaneous analysis of two variables (attributes). It explores the concept of a relationship between two variables, whether there exists an association and the strength of this association.
There are three types of bivariate analysis.
1. Numerical & Numerical
Scatter Plot, Linear Correlation …
2. Categorical & Categorical
Stacked Column Chart, Combination Chart, Chi-square Test
3. Numerical & Categorical
Line Chart with Error Bars, Combination Chart, Z-test and t-test
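Two of the techniques named above can be sketched without external libraries: the Pearson linear correlation for two numerical variables, and the chi-square statistic for two categorical variables given as a contingency table of counts. The sketch stops at the statistic itself; turning it into a p-value needs the chi-square distribution, which is left out here. All data values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two numerical variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def chi_square_statistic(table):
    """Chi-square statistic for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Numerical & numerical: a perfectly linear made-up relationship.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Categorical & categorical: a made-up 2x2 table of counts.
chi2 = chi_square_statistic([[10, 20], [20, 10]])

print(r, chi2)
```

A value of r near +1 or -1 indicates a strong linear association; a larger chi-square statistic indicates a larger departure from independence.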
> Modeling
· Predictive modeling is the process by which a model is created to predict an outcome.
o If the outcome is categorical it is called classification and if the outcome is numerical it is called regression.
· Descriptive modeling or clustering is the assignment of observations into clusters so that observations in the same cluster are similar.
· Finally, association rules can find interesting associations amongst observations.
Classification algorithms:
Frequency Table
ZeroR, OneR, Naive Bayesian, Decision Tree
Covariance Matrix
Linear Discriminant Analysis, Logistic Regression
Similarity Functions
K Nearest Neighbors
Others
Artificial Neural Network, Support Vector Machine
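To make the frequency-table family concrete, here is a minimal sketch of its two simplest members. ZeroR ignores all predictors and always predicts the most frequent class. OneR builds one rule per attribute from that attribute's frequency table and keeps the attribute with the fewest training errors. The toy weather-style data and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR: always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """OneR: return (attribute, rule, training_errors) for the best single attribute."""
    best = None
    for attr in rows[0]:
        # Frequency table: attribute value -> counts of each class.
        table = defaultdict(Counter)
        for row, label in zip(rows, labels):
            table[row[attr]][label] += 1
        # The rule maps each attribute value to its majority class.
        rule = {value: counts.most_common(1)[0][0] for value, counts in table.items()}
        errors = sum(rule[row[attr]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Hypothetical training data.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no", "yes", "yes"]

baseline = zero_r(labels)
attribute, rule, errors = one_r(rows, labels)
print(baseline, attribute, rule, errors)
```

ZeroR is mainly useful as a baseline: any other classifier should do at least as well on the same data.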
Regression algorithms:
Frequency Table
Decision Tree
Covariance Matrix
Multiple Linear Regression
Similarity Function
K Nearest Neighbors
Others
Artificial Neural Network, Support Vector Machine
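As an example of a similarity-based regression method, the following sketch predicts a numerical outcome as the average outcome of the k nearest training observations. It assumes Euclidean distance and an unweighted average, which are common choices but not the only ones. The training data is made up.

```python
import math

def knn_regress(train_x, train_y, query, k=3):
    """Predict the mean target of the k training points closest to the query."""
    # Pair each distance with its target, then keep the k smallest distances.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical one-dimensional training data.
train_x = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [1.0, 2.0, 3.0, 10.0]

prediction = knn_regress(train_x, train_y, query=(2.0,), k=3)
print(prediction)
```

The same distance-based idea gives a classifier if the average is replaced by a majority vote over the k nearest labels.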
Clustering algorithms:
Hierarchical
Agglomerative, Divisive
Partitive
K Means, Self-Organizing Map
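A bare-bones K Means can be sketched as follows. The assumptions here are worth stating: the first k points are used as initial centroids (real implementations usually pick them randomly or with a smarter scheme), a fixed number of iterations replaces a convergence test, and an empty cluster simply keeps its previous centroid. The points are hypothetical.

```python
import math

def k_means(points, k, iterations=10):
    """Return (centroids, clusters) after a fixed number of K Means iterations."""
    centroids = [points[i] for i in range(k)]  # naive initialization (assumption)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Two visually separate groups of made-up 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Because the result depends on the initial centroids, K Means is often run several times with different starting points.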
> Evaluation
· Evaluation helps to find the best model that represents our data and to estimate how well the chosen model will work in the future. Two common methods are Hold-Out and Cross-Validation.
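The two evaluation schemes differ only in how the data is split. The sketch below shows one possible way to produce the splits; the function names, the 30% test fraction, the fixed seed and the choice of 5 folds are illustrative assumptions. In both cases a model would be fitted on the training part and scored on the test part.

```python
import random

def hold_out_split(items, test_fraction=0.3, seed=0):
    """Hold-Out: shuffle once, then split into (train, test)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_splits(items, k=5):
    """Cross-Validation: yield k (train, test) pairs; each item is tested once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Stand-in for a dataset of ten observations.
items = list(range(10))

train, test = hold_out_split(items)
folds = list(k_fold_splits(items, k=5))

print(len(train), len(test))
print([(len(tr), len(te)) for tr, te in folds])
```

Cross-Validation uses every observation for testing exactly once, at the cost of fitting the model k times.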
> Deployment
The concept of deployment in predictive data mining refers to the application of a model for prediction to new data.