# DATA MINING

## Data Mining Steps

### Problem Definition

Common application areas include:

- Market analysis: customer profiling, identifying customer requirements, cross-market analysis, target marketing, determining customer purchasing patterns
- Corporate analysis and risk management: finance planning and asset evaluation, resource planning, competition
- Fraud detection
- Customer retention
- Production control
- Science exploration

### Data Preparation

Data preparation is about constructing a dataset from one or more data sources to be used for exploration and modeling. It is good practice to start with an initial dataset to get familiar with the data, to discover first insights into it, and to develop a good understanding of any possible data quality issues. The datasets you are given in these projects were obtained from kaggle.com.

- Variable selection and description:
  - Numerical: ratio, interval
  - Categorical: ordinal, nominal
- Simplifying variables: from continuous to discrete
- Formatting the data
- Basic data quality checks: missing data, outliers

### Data Exploration

Data exploration is about describing the data by means of statistical and visualization techniques. Data visualization covers univariate and bivariate analysis.

#### Univariate Analysis

Univariate analysis explores variables (attributes) one by one. Variables can be either categorical or numerical.

Categorical variables:

| Statistic | Visualization | Description |
|---|---|---|
| Count | Bar chart | The number of values of the specified variable. |
| Count% | Pie chart | The percentage of values of the specified variable. |

Numerical variables:

| Statistic | Visualization | Equation | Description |
|---|---|---|---|
| Count | Histogram | $N$ | The number of values (observations) of the variable. |
| Minimum | Box plot | $\min$ | The smallest value of the variable. |
| Maximum | Box plot | $\max$ | The largest value of the variable. |
| Mean | Box plot | $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ | The sum of the values divided by the count. |
| Median | Box plot | | The middle value. Below and above the median lie equal numbers of values. |
| Mode | Histogram | | The most frequent value. There can be more than one mode. |
| Quantile | Box plot | | A set of "cut points" that divide a set of data into groups containing equal numbers of values (quartile, quintile, percentile, ...). |
| Range | Box plot | $\max - \min$ | The difference between the maximum and the minimum. |
| Variance | Histogram | | A measure of data dispersion. |
| Standard deviation | Histogram | $\sqrt{\text{Variance}}$ | The square root of the variance. |
| Coefficient of variation | Histogram | $\text{Standard deviation} / \bar{x}$ | A measure of data dispersion divided by the mean. |
| Skewness | Histogram | | A measure of symmetry or asymmetry in the distribution of data. |
| Kurtosis | Histogram | | A measure of whether the data are peaked or flat relative to a normal distribution. |

Note: There are two types of numerical variables, interval and ratio. An interval variable has values whose differences are interpretable, but it does not have a true zero. A good example is temperature in degrees Celsius (centigrade). Data on an interval scale can be added and subtracted but cannot be meaningfully multiplied or divided; for example, we cannot say that one day is twice as hot as another. In contrast, a ratio variable has values with a true zero and can be added, subtracted, multiplied, or divided (e.g., weight).

#### Bivariate Analysis

Bivariate analysis is the simultaneous analysis of two variables (attributes). It explores the relationship between two variables: whether an association exists and how strong that association is. There are three types of bivariate analysis:
1. Numerical & Numerical: scatter plot, linear correlation
2. Categorical & Categorical: stacked column chart, combination chart, chi-square test
3. Numerical & Categorical: line chart with error bars, combination chart, Z-test and t-test

### Modeling

- Predictive modeling is the process by which a model is created to predict an outcome. If the outcome is categorical, it is called classification; if the outcome is numerical, it is called regression.
- Descriptive modeling, or clustering, is the assignment of observations into clusters so that observations in the same cluster are similar.
- Finally, association rules can find interesting associations amongst observations.

Classification algorithms:

| Approach | Algorithms |
|---|---|
| Frequency table | ZeroR, OneR, Naive Bayesian, Decision Tree |
| Covariance matrix | Linear Discriminant Analysis, Logistic Regression |
| Similarity functions | K Nearest Neighbors |
| Others | Artificial Neural Network, Support Vector Machine |

Regression algorithms:

| Approach | Algorithms |
|---|---|
| Frequency table | Decision Tree |
| Covariance matrix | Multiple Linear Regression |
| Similarity function | K Nearest Neighbors |
| Others | Artificial Neural Network, Support Vector Machine |

Clustering algorithms:

| Approach | Algorithms |
|---|---|
| Hierarchical | Agglomerative, Divisive |
| Partitive | K-Means, Self-Organizing Map |

### Evaluation

Model evaluation helps to find the model that best represents our data and to estimate how well the chosen model will perform in the future. The two common approaches are hold-out and cross-validation: hold-out reserves part of the data, unseen during training, for testing, while k-fold cross-validation splits the data into k folds and lets each fold serve once as the test set.

### Deployment

The concept of deployment in predictive data mining refers to the application of a model for prediction to new data.
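The modeling, evaluation, and deployment steps can be tied together in a small sketch. The Python below is a minimal, standard-library-only illustration under stated assumptions: the weather table is hypothetical toy data in the style of the familiar "play golf" example, accuracy is assumed as the evaluation metric, and `zero_r`, `one_r`, `accuracy`, `hold_out`, and `k_fold_cv` are hypothetical helper names, not a library API. It fits the two simplest frequency-table classifiers listed above (ZeroR and OneR), scores OneR with a hold-out split and with k-fold cross-validation, and then applies the fitted model to a new record.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data in the style of the classic "play golf" weather table.
COLUMNS = ("outlook", "temp", "humidity", "windy", "play")
RAW = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]
ROWS = [dict(zip(COLUMNS, r)) for r in RAW]
FEATURES = COLUMNS[:-1]
TARGET = "play"


def zero_r(train, target):
    """ZeroR: ignore all predictors and always predict the majority class."""
    majority = Counter(row[target] for row in train).most_common(1)[0][0]
    return lambda row: majority


def one_r(train, target, features):
    """OneR: build a frequency table per feature, map each value to its
    majority class, and keep the single feature whose rule makes the fewest
    training errors (ties go to the earlier feature)."""
    fallback = Counter(row[target] for row in train).most_common(1)[0][0]
    best = None
    for feature in features:
        table = defaultdict(Counter)  # feature value -> class counts
        for row in train:
            table[row[feature]][row[target]] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in table.items()}
        errors = sum(sum(counts.values()) - counts[rule[value]]
                     for value, counts in table.items())
        if best is None or errors < best[0]:
            best = (errors, feature, rule)
    _, feature, rule = best
    # Values never seen in training fall back to the majority class.
    return lambda row: rule.get(row[feature], fallback)


def accuracy(model, rows, target):
    """Fraction of rows whose prediction matches the true class."""
    return sum(model(row) == row[target] for row in rows) / len(rows)


def hold_out(rows, fit, target, test_fraction=0.3, seed=0):
    """Train on a shuffled portion of the data and test on the held-out rest."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    train, test = shuffled[:cut], shuffled[cut:]
    return accuracy(fit(train), test, target)


def k_fold_cv(rows, fit, target, k=5, seed=0):
    """Average test accuracy over k folds; each fold is the test set once.
    Assumes k <= len(rows) so that no fold is empty."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit(train), test, target))
    return sum(scores) / k


if __name__ == "__main__":
    fit_one_r = lambda train: one_r(train, TARGET, FEATURES)
    print("ZeroR training accuracy:", accuracy(zero_r(ROWS, TARGET), ROWS, TARGET))
    print("OneR hold-out accuracy:", hold_out(ROWS, fit_one_r, TARGET))
    print("OneR 5-fold CV accuracy:", k_fold_cv(ROWS, fit_one_r, TARGET))
    # Deployment: apply the model fitted on all data to a new, unlabeled record.
    model = fit_one_r(ROWS)
    new_day = {"outlook": "Overcast", "temp": "Cool", "humidity": "High", "windy": "True"}
    print("Prediction for new day:", model(new_day))
```

With all 14 rows as training data, OneR selects `outlook` (it ties with `humidity` at 4 training errors and the earlier feature wins), so the new overcast day is predicted `Yes`. ZeroR serves mainly as a baseline: a model that cannot beat it has learned nothing beyond the class frequencies. Shuffling with a fixed seed keeps both the hold-out split and the folds reproducible.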