Steps in Machine Learning Programming

1. Import the necessary libraries.
2. Import the CSV file needed for data processing.
3. Load the data and separate it into X (the features) and y (the column you want to predict).
4. Clean the data by handling missing values: either remove the rows that contain them, or replace them (e.g. with the column mean) using scikit-learn's SimpleImputer.
5. Encode the categorical data. Text columns must be turned into numbers, because most machine learning models only accept numerical input. In this case, use One-Hot Encoding (OneHotEncoder) from the scikit-learn library.
   - This encodes each text value as a vector with one position per category. For example, with four categories the first text value is encoded as 1000, the second as 0100, the third as 0010, and so on.
6. Now that you have encoded the features into One-Hot Encoded vectors, encode the labels, which are often "Yes" or "No" values. Import the LabelEncoder class from scikit-learn's preprocessing module (sklearn.preprocessing) to encode the labels as numbers (1 and 0).
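A minimal sketch of steps 1 to 6 with pandas and scikit-learn follows. It assumes a small hypothetical dataset (columns Country, Age, Salary, Purchased, invented for illustration) held in memory through io.StringIO, so the sketch runs without a file. It fills the missing value with the column mean using SimpleImputer rather than dropping the row. Removing rows instead would be pandas' dataset.dropna().

```python
# Step 1: import the necessary libraries
import io

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Step 2: hypothetical CSV content kept in memory so the sketch is self-contained;
# with a real file you would call pd.read_csv("your_file.csv") instead
CSV_TEXT = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
"""
dataset = pd.read_csv(io.StringIO(CSV_TEXT))

# Step 3: features are every column except the last; y is the last column
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Step 4: replace missing numeric values (Age, Salary) with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

# Step 5: one-hot encode the Country column (index 0), pass the rest through unchanged;
# the dummy columns come first in the output
ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
)
X = np.array(ct.fit_transform(X))

# Step 6: encode the "Yes"/"No" labels as 1/0 (classes are sorted: "No" -> 0, "Yes" -> 1)
le = LabelEncoder()
y = le.fit_transform(y)

print(X)
print(y)
```

The ColumnTransformer with remainder="passthrough" is one common way to one-hot encode a single column while keeping the numeric columns. The categories are sorted alphabetically, so the dummy columns are France, Germany, Spain.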
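7. Now that step 6 is complete, split your data into a training set and a test set. The split does not have to be half and half. A common choice is to keep about 80% of the rows for training and 20% for testing.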
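A minimal sketch of step 7 using scikit-learn's train_test_split follows. The 10-row arrays are hypothetical stand-ins for already-encoded features and labels, and the 80/20 ratio and random_state value are illustrative choices, not requirements.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical, already-encoded data: 10 rows, 2 features, binary labels
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1])

# Keep 80% of the rows for training and 20% for testing;
# random_state makes the shuffle reproducible between runs
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

print(X_train.shape, X_test.shape)
```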
Question: Do we apply Feature Scaling before or after splitting the data in Machine Learning?
Answer: After the split. The test set has to behave like brand-new data that the model has never seen, so that the evaluation of the model is accurate. If you scaled before splitting, the mean and standard deviation used for scaling would be computed partly from the test rows, leaking information about the test set into training.
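The sketch below illustrates the answer above: split first, then fit the scaler on the training rows only, so its statistics contain nothing from the test set. The single "age" column is hypothetical data, and StandardScaler is used as the example scaler.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical single numeric feature (e.g. age) for 10 rows
X = np.array([[25.0], [32.0], [47.0], [51.0], [62.0],
              [23.0], [56.0], [40.0], [36.0], [70.0]])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 0, 1])

# Split first ...
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# ... then fit the scaler on the training rows only
scaler = StandardScaler()
scaler.fit(X_train)

# The learned mean matches the training-set mean, not the full-data mean,
# so no information about the test rows entered the preprocessing
print(scaler.mean_, X_train.mean(axis=0), X.mean(axis=0))
```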
8. Now that step 7 is complete, apply Feature Scaling to your dataset, so that no feature dominates the others just because it is measured on a larger scale.
   - Choose either Standardization, x' = (x - mean) / standard deviation, or Normalization, x' = (x - min) / (max - min).
   - Many machine learning models rely on the Euclidean distance between two points P1(x1, y1) and P2(x2, y2), sqrt((x2 - x1)^2 + (y2 - y1)^2). A feature with large values (e.g. salary) would dominate that distance over a feature with small values (e.g. age).
   - Don't apply Feature Scaling to the dummy variables (the 1s and 0s of the One-Hot Encoded features). If you do, they lose their interpretation (which category a row belongs to). The point of Standardization is to bring the features into the same range, and the dummy variables already are.
   - Only apply Feature Scaling to the feature values that are not 1s and 0s (e.g. 45.8808), so that they land in a range comparable to the dummy variables. After Standardization, these values mostly fall between about -3 and +3.
   - Standardization computes the mean and the standard deviation of only the columns you choose to scale.
   - The fit method computes the mean and standard deviation used in the formula, and the transform method applies the formula with those values to the dataset.
   - Don't forget to scale the test set with the same scaler that was fitted on the training set: call only transform on the test set (do not fit again), because the test data must be preprocessed in exactly the same way as the training data.
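A minimal sketch of step 8 with scikit-learn's StandardScaler follows. It assumes the first three columns are one-hot dummy variables and the last two are numeric features. The data is hypothetical. Only the numeric columns are scaled, and the test set is transformed with the mean and standard deviation learned from the training set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical split output: columns 0-2 are one-hot dummy variables,
# columns 3-4 are numeric features (e.g. age and salary)
X_train = np.array([
    [1.0, 0.0, 0.0, 44.0, 72000.0],
    [0.0, 0.0, 1.0, 27.0, 48000.0],
    [0.0, 1.0, 0.0, 30.0, 54000.0],
    [0.0, 0.0, 1.0, 38.0, 61000.0],
    [1.0, 0.0, 0.0, 35.0, 58000.0],
])
X_test = np.array([
    [0.0, 1.0, 0.0, 40.0, 63000.0],
])

sc = StandardScaler()
# fit computes the mean and standard deviation of the numeric columns only;
# transform applies (x - mean) / std with those values
X_train[:, 3:] = sc.fit_transform(X_train[:, 3:])
# The test set is only transformed, reusing the training mean and std
X_test[:, 3:] = sc.transform(X_test[:, 3:])

print(X_train)
print(X_test)
```

The dummy columns are left untouched. After scaling, each numeric training column has mean 0 and standard deviation 1. StandardScaler uses the population standard deviation (ddof=0).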