Techniques For Data Analysis

Traditional Data Techniques

When working with traditional data, the first thing to note is that you cannot analyze it right away. Because this data is collected mainly through surveys or cookies, it can be incomplete or contain errors. That is why you first have to pre-process it: you can fix errors in the data, such as spelling mistakes, or invalidate responses that are simply false in your raw data. This step is called data cleansing, data cleaning, or data scrubbing.
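As a minimal sketch of data cleansing, the snippet below corrects known misspellings and drops clearly invalid responses. The survey rows, the spelling-correction table, and the validity rule (no negative ages) are all made-up examples, not a standard recipe.

```python
# Hypothetical raw survey data: one row per respondent.
raw_responses = [
    {"age": 34, "city": "Lodnon"},   # misspelled city
    {"age": 28, "city": "Paris"},
    {"age": -5, "city": "Berlin"},   # impossible age -> invalid response
]

# Assumed correction table mapping known misspellings to fixed values.
spelling_fixes = {"Lodnon": "London"}

def cleanse(rows):
    cleaned = []
    for row in rows:
        if row["age"] < 0:           # invalidate responses that are simply false
            continue
        row = dict(row)              # copy so the raw data is untouched
        row["city"] = spelling_fixes.get(row["city"], row["city"])
        cleaned.append(row)
    return cleaned

print(cleanse(raw_responses))
```

The raw data is kept unchanged so you can always go back and re-run the cleansing with different rules.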

Data can be either numerical, which means it can be manipulated with math, or categorical, which means it cannot.
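A quick illustration of the difference, using invented values: numerical data supports arithmetic such as a mean, while categorical data is summarized by counting instead.

```python
from collections import Counter

ages = [22, 35, 41, 29]                   # numerical: math applies
favorite_colors = ["red", "blue", "red"]  # categorical: math does not apply

mean_age = sum(ages) / len(ages)          # averaging numbers is meaningful
color_counts = Counter(favorite_colors)   # categories are counted, not averaged

print(mean_age)              # 31.75
print(color_counts["red"])   # 2
```

Trying to "average" colors would be meaningless, which is exactly why the two kinds of data get different treatments.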

Case-Specific Techniques:

One method commonly used with traditional data is balancing, which applies when your data is not distributed the way you want. For example, if you wanted to compare the shopping habits of different age groups but collected more data for younger people than for older people, the larger group would dominate your analysis. With balancing, you can sample the same number of people from each group so you can review your data on equal footing.
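The balancing idea above can be sketched by downsampling: every group is randomly sampled down to the size of the smallest one. The shopper groups are placeholder data, and the fixed seed is just there to make the sketch reproducible.

```python
import random

# Made-up shopper IDs per age group; "18-30" is over-represented.
shoppers_by_group = {
    "18-30": ["a", "b", "c", "d", "e"],
    "31-50": ["f", "g", "h"],
    "51+":   ["i", "j"],
}

def balance(groups, seed=0):
    rng = random.Random(seed)                  # seeded for reproducibility
    target = min(len(v) for v in groups.values())
    # Randomly sample the same number of members from each group.
    return {name: rng.sample(members, target) for name, members in groups.items()}

balanced = balance(shoppers_by_group)
print({name: len(members) for name, members in balanced.items()})
```

Downsampling throws data away; the alternative, upsampling the smaller groups, keeps everything but repeats records, so which direction you balance in is a judgment call.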

Another method is shuffling of data sets, which puts your data in random order to prevent unwanted patterns. It also improves prediction performance and helps you avoid misleading results.
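As a small sketch of why shuffling helps, the snippet below randomizes record order before splitting into train and test sets, so any ordering in the raw data (for example, records sorted by date) cannot leak a pattern into either split. The records and the 80/20 split are illustrative choices.

```python
import random

records = list(range(10))      # stand-in for an ordered data set
rng = random.Random(42)        # seeded so the shuffle is reproducible
rng.shuffle(records)           # in-place random reordering

split = int(len(records) * 0.8)
train, test = records[:split], records[split:]
print(len(train), len(test))   # 8 2
```

Without the shuffle, the test set would be exactly the last records in the original order, which is the kind of hidden pattern that produces misleading results.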

Big Data Techniques

Big data, like traditional data, has to be checked before you can really start the analysis. With big data this is a little more complex, since it can include video, image, or audio files, but it can still be checked with data cleansing. You will also need to deal with missing values, since the data is more complex and harder to fix completely.
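One common way to deal with missing values, sketched below on made-up sensor readings, is imputation: fill each gap with the mean of the values that were observed. This is just one of several strategies (you could also drop the incomplete records).

```python
# Hypothetical numerical field with gaps represented as None.
readings = [4.0, None, 6.0, None, 5.0]

observed = [x for x in readings if x is not None]
mean = sum(observed) / len(observed)           # mean of the known values
imputed = [mean if x is None else x for x in readings]

print(imputed)   # [4.0, 5.0, 6.0, 5.0, 5.0]
```

Mean imputation keeps the data set the same size, but it does shrink the apparent variation in the data, which is worth keeping in mind when you analyze the result.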

Case-Specific Techniques:

Text data mining is used to extract valuable information from a large body of text, for example, finding data relevant to your project in a large collection of academic papers by searching for topics of interest.
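A toy version of that idea: scan a set of abstracts (invented here) and keep only the ones that mention a topic of interest. Real text mining uses far more sophisticated matching, but the filtering principle is the same.

```python
# Hypothetical paper abstracts to search through.
abstracts = [
    "A study of neural networks for image recognition.",
    "Crop rotation effects on soil nitrogen.",
    "Neural networks applied to speech synthesis.",
]

def find_relevant(texts, keyword):
    # Case-insensitive substring match against the topic of interest.
    return [t for t in texts if keyword.lower() in t.lower()]

relevant = find_relevant(abstracts, "neural networks")
print(len(relevant))   # 2
```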

Data masking is a method used to extract and analyze data without compromising private information. This is achieved by concealing the original data with random, false values, which in turn allows you to analyze it without exposing anything private.
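A minimal masking sketch, on invented customer records: the identifying fields are replaced with random placeholder values while the field needed for analysis (the purchase total) is left untouched. The placeholder format and seed are assumptions for the example.

```python
import random

# Made-up records containing private fields plus an analytic field.
customers = [
    {"name": "Alice", "email": "alice@example.com", "total": 120.0},
    {"name": "Bob",   "email": "bob@example.com",   "total": 75.5},
]

def mask(rows, seed=0):
    rng = random.Random(seed)
    masked = []
    for row in rows:
        # Replace identifying fields with random false data.
        fake_id = f"user_{rng.randint(1000, 9999)}"
        masked.append({
            "name": fake_id,
            "email": f"{fake_id}@masked.invalid",
            "total": row["total"],   # keep the value you actually analyze
        })
    return masked

masked = mask(customers)
print([row["total"] for row in masked])   # analysis still works on masked data
```

The totals can still be summed or averaged after masking, while the names and emails no longer reveal anyone's identity.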