Looking for a Tutor Near You?

Post Learning Requirement »
x
x

Direction

x

Ask a Question

x

x
Hire a Tutor

BUSINESS ANALYTICSS

Loading...

BUSINESS ANALYTICS IS THE PROCESS OF USING TOOLS AND TECHNIQUES TO FIND SOLUTIONS TO BUSINESS PROBLEMS

Supriya / Abu Dhabi

3 years of teaching experience

Qualification: M-Tech and MBA

Teaches: Advanced Maths, Basic Computer, Computer Science, Education, Electronics, Physics, Statistics, Strategic Financial Management, Accounting, Engineering, Marketing Communication, Maths, Accounts, Business Studies, Economics, Management, Business Finance, English, Computer, Accountancy, Mathematics, Project Management

Contact this Tutor
  1. BUSINESS ANALYTICS
  2. No 2 Course Plan TlQics Business Analytics Business Analytics, evolution and scope, business analytics process; decision models. Spreadsheets, spreadsheet functions, spreadsheet modeling overview. Datilsets, data cleamng, missing data handling, outliers. Data Visualization - charts and graphs, data queries — sorting and filtering, summarinng data - frequency, relative frequency, histograms, percentages and quartiles, cross tabulation, pivots. Financial Analytics and Operations Analytics Understanding Risk, credit nsk analysis. fraud detection and prevention analyucs, analytics 111 banking and financial services • analytics in retail banking and wealth management Demand Planning - Forecasting Model building, Supply planning • Procurement and Strategic Sourcing, Inventory Modeling • Aggregate lanni and resource allocation decisions, decision First Internal Examination % of marks in Trimester exams 15 20 2
  3. 3 Human Resource Analytics Recruitment Analytics, On Boarding Analytics, Staffing Analytics, Performance & Skill Gap Analytics, Compensation & Benefit Analytics, Training & Learning Analytics, Promotion and Succession Planning Analytics, Compliance Analytics, Attition& Retention Analytics Marketing Analytics 4 Basics of marketing analytics, marketing decisions models, charactenstics and benefits of marketing decisions models, Sepnentation using factors analysis and cluster analysis, regrsion and choice based segmentation, positiomng • perceptual maps: developing perceptual map, multi dlmenslonal scaling Second Internal Examination 15 20
  4. Web Analytics Click stream analytics, engagement quantification frameworks, anonymous vs. registered users analysis, 5 Social Media Analytics • User generated content • Sentiment Analysis. Analytics in digital decoding consumer intent, decoding customer sentiments from comments, Text mining from opinion platforms Predictive Analytics and Application of Analytics in Business Logic driven predictnv models - single-period purchase decisions, multiple time period models, overbooking decisions, Data Driven predictive models retail pricing markdowns, modeling relationships and trends 6 Models involving uncertainty • what-if analysis, scenario manager, goal• Applications in other business areas • Analytics in telecom and location based intelligence marketing, analytics in consumer packaged goods (CPG), analytics in utilities, analytics in healthcare, analytics in online rul Final Examination 10 20
  5. 2. 3. 4. 5. 6. 7, 8. 9. 10. ll. References Evans, J.R. (2013), Business Analytics: Methods, Models, and Decisions, 3rd ed. Pearson India Malhotra, N .K and Dash, S. (2011 ), Marketing Research, ed. Pearson India Baesens, Vlasselaer, V.V and Verbeke, W. (2015), Fraud Analytics Usmg Descriptive, Predictive and Social Network Techniques, ed. Wiley India Prasad, R.N. and Acharya, S. (2011 Fundaments of Business Anah'tics, I g ed. Wiley India Davenport, TM. and Hams, J.G. (2007) Competing on Analytics: The New Science of "inning, ed. Harvard Business Review Press Damodharan, A (2007), Strategic Risk Takin , IA ed., Press Davenport, TH., Harris, J.CJ., and Morison, (2010), Analytics at Work: Smarter Decisions, Better Results, eW Raman, A & Fisher, (2010), How Analytics Are Transforming the Supply Chain and Improving Performance, ed. HBS Press Fitz-enz, J. (2010), lie New HR Analytics: Predicting the Economic ralue of Company's Human Capital Investments. I st ed., American Management Association Peterson, E. (2004), Web Analytics Demystified: A Marketer's Guide to Understanding How Your Web Sitc Affects Your Business, ed. Celilo Group Media & Café Press Lilien, GL and Rangaswamy, L (2004), Marketing Engineering: ComputerAssisted Marketing Analysis and Planning, 2nd d Revised, Trafford Publishing. 5
  6. SOIUIVNV ssarusna L elnpon
  7. What is Business Analytics (BA)? Business Analytics (BA) or simply analytics is the use of data, information technology, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers gain improved insight about their business operations and make better, fact TOOLS & Data TECHNIQUES BUSINESS INSIGHT
  8. Examples of BA Applications Pricinq: Setting prices for consumer and industrial goods, government contracts, and maintenance contracts Customer seqmentation: Identifying and targeting key customer groups in retail, insurance, and credit card industries Merchandisinq: Determining brands to buy, quantities, and allocations Location: Finding the best location for bank branches and ATMs, or where to service industrial equipment Social Media: Understand trends and customer perceptions; assist marketing managers and product designers
  9. Impacts and Challenges of BA Benefits: High Productivity High Profitability High Customer Satisfaction High Risk Management High Reputation Management Challenqes: Data often needs lots of cleaning Data is much more unstructured Data volumes are growing — Fast! Good analytics doesn't solve bad business process
  10. 1. 2. 3. 4. 5. Evolution of Business Analytics Business Intelligence (B') — collection, management, analysis and reporting of data. Statistical Methods helps to find relationship between data (e.g. regression, forecasting, data mining, interference) Operations Research (OR) — uses mathematical and computer based models Operations Research Management Science (OR/MS) — uses both optimization and modeling. Decision Support System (DSS) — Bl + OR/MS 10
  11. A Visual Perspective of BA Statistics Simulation Data Mining Visualization Business Intelligence/ Information Systems What If? and Risk Modeling and Optimization
  12. Scope of Business Analytics Business reports VVhat•s hivpening, why is it what wil HIM it be Business database or data 1 . Descriptive analytic analysis 2. Predictive analytic analysis 3. Prescriptive analytic analysis Outcome Of the entire BA analysis: Business computer do ud data storage Find osportunities. whch the firm can take ad.•antage. Albcate resajrces to take the coportuniti.æ. increase in business value 12
  13. Types of Business Analytics Business Analytics Descriptive What happened? What 'S • DaÜ&oarcE • Data We" dented busøwss and g:portwvoes Predictive INhat Will happen? Why wat it happen? • Data mining • Text man ing • Wewrnedia mining • Forecasting prescnptive What snouLd i do? Why ShoubO do it? • SirmAatjon Decision • Expert systems Accurate æst possö/e of the future states business a.'M conaroons transacoons 13
  14. Tools Supporting BA Database queries and analysis Dashboards to report key performance measures Data visualization Statistical methods Spreadsheets and predictive models Scenario and "what-if" analyses Simulation Forecasting Data and text mining Optimization Social media, web, and text analytics 14
  15. Example: Retail Markdown Decisions Most department stores clear seasonal inventory by reducing prices. Key question: When to reduce the price and by how much to maximize revenue? Potential application of analytics: Descriptive analytics: examine historical data for similar products (prices, units sold, advertising Predictive analytics: predict sales based on price Prescriptive analytics: find the best sets of pricing and advertising to maximize sales revenue 15
  16. Data for Business Analytics • Data: numerical or textual facts and figures that are collected through some type of measurement process. • Information: result of analyzing data; that is, extracting meaning from data to support evaluation and decision making. • Data set: a collection of data. Examples: Marketing survey responses, a table of historical stock prices • Database: a collection of related files containing records on people, places, or things. Biq Data: Big data to refer to massive amounts business data from a wide variety of sources, much which is available in real time, and much of which uncertain or unpredictable. of of is 16
  17. Examples of Data Sources Annual reports Accounting audits Financial profitability analysis Economic trends Marketing research Operations management performance Human resource measurements Web behavior 17
  18. Example: A Sales Transaction Database File 2 3 4 5 6 7 8 9 10 11 12 13 Sales Transacøons: My 14 cust ID Paypal 10002 west creat 10003 North Credi West 10005 South Credit 10006 west Pay" 10007 East _ßredt locu west Credi 10009 west Pay" 10010 South Source web 93816545 740834X) 64942368 70560957 3S208817 20978903 80103311 14132683 40128225 49073721 Amount $20_19 $17.85 S23.98 $23.51 SIS.33 $17.30 $177.72 S21.76 $15.92 Product ovo OVO DVD Book DVD Book Book DVD DVD Time Of 22:19 1327 14:27 15:38 IS:21 13: 11 21:59 4:04 19:35 13:28 Records Entities Fields or Attributes A database file is usually organized in a two- dimensional table, where the columns correspond to each individual element of data (called fields, or attributes), and the rows represent records of related data elements. 18
  19. Measurement Scales C"iNTERVALÄ ORDINAL Named Named Variables Ordered Proportionate Interval between Variables RATIO Proportionate Interval between Variables Can accommodate Absolute Zero 19
  20. Measurement Scales - Example 'ten 9 11 12 $ 05260 A41tN1 1122 4312 7258 30 30 30 {6 uwtt WW1' *'31/11 W2Y1t 40 W2S'11 W2Y1t 460 20
  21. Data Validity and Reliability Validity - data correctly measures what it is supposed to measure. Reliability - data are accurate and consistent. Reliable Not Valid Valid Not Reliable Neither Reliable Nor Valid Both Reliable And Valid 21
  22. Models in Business Analytics Model - an abstraction or representation of a real system, idea, or object. Captures the most important features Can be a written or verbal description, a visual representation, a mathematical formula, or a spreadsheet.
  23. 1. 2. Three Forms of a Model Verbal descri tion: The rate of sales starts small as ear y a opters egln to evaluate a new product and then begins to grow at an increasing rate over time as positive customer feedback spreads. Eventually, the market begins to become saturated and the rate of sales begins to decrease. Visual model: A sketch of sales as an S-shaped curv al model: —S = aebect where S is sales, t is time, e is the base of natural logarithms, and a, b and c are constants.
  24. Decision Models A logical or mathematical representation of a problem or business situation that can be used to understand, analyze, or facilitate making a decision Inputs: Data which are assumed to be constant for purposes of the model Uncontrollable variables which are quantities that can change but cannot be directly controlled by the decision maker Inputs Data, Uncontrollable Variables, and Decision Variables Decision Outputs Measures of Performance or Behavior 24
  25. Break-Even Decision Model Saks ($,j POir• 3 total Reve-ue total Cost 6 Break-even point can be described as a point where there is no net profit or loss Break-even point is the number of units (N) produced which make zero profit. Revenue — Total costs = 0 Total costs = Variable costs * N + Fixed costs Revenue = Price per unit * N Price per unit * N — (Variable costs * N + Fixed costs) So, break-even point (N) is equal N = Fixed costs / (Price per unit - Variable costs) 25
  26. Example: A Break-Even Decision Model TC(manufacturing) = $50,000 + $125*Q TC(outsourcing) = $175*Q Breakeven Point: TC(manufacturing) = TC(outsourcing) $50,000 + $125 x $50,000 = 50 Q 1,000 General Formula OgtswrciN ssaæo 26
  27. Demand Prediction Models A Linear Demand Prediction Model D-20,OOO-10P $90 51m As price increases, demand falls. A Nonlinear Demand Prediction Model $19,020 SL920 514900 90 S" 90 sno 5120 ssumes price elasticity is constant 27
  28. 1. 2. 3. 4. 5. 6. Problem Solving with BA Recognizing a problem Defining the problem Structuring the problem Analyzing the problem Interpreting results and making a decision Implementing the solution 28
  29. Problem Solving with BA (Cont..) 1. Recoqnizinq a problem: Problems exist when there is a gap between what is happening and what we think should be happening. E.g. costs are too high with competitors 2. Defininq the problem: Clearly defining the problem is not a trivial task Complexity increases when the following occurs: Large number of courses of action The problem belongs to a group and not to an individual The problem solver has many competing objectives External groups or individuals are affected by the problem Problem owner and problem solver are not the same person Time limitations are important 29
  30. Problem Solving with BA (Cont..) 3. Structurinq the problem: Stating goals and objectives Characterizing the possible decisions • Identifying any constraints or decisions 4. Analyzing the problem: Analytics plays a major role • Involves some sort of experimentation or solution process • Evaluates different scenarios Analyzing risk associated with various decision alternatives Finding a solution that meets certain goals, or
  31. Problem Solving with BA (Cont..) 5. Interpretinq results and makinq a decision: • Models cannot capture every detail of the real problem • Managers may understand the limitations of models and their underlying assumptions and often incorporate judgment into making a decision. 6. Implementina the solution: Translate the results of the model back to the real world • Requires providing adequate resources, motivating employees, eliminating resistance to change, modifying organizational policies, and developing trust. 31
  32. Module 1 - Part 2 SPREADSHEET
  33. SPREADSHEETS Many commercial software packages can be used for Business Analytics. Spreadsheet software, such as Microsoft Excel, is widely available and used across all areas of business. • Spreadsheets provide a flexible modeling environment for manipulating data and developing and solving models.
  34. 14 ? 11 ? 11 SPREADSHEET , EXAMPLE in , 03 SimoneCM"W"'1.9.e«n Mary • ~ , , , 0 , ! t2-cÜn D•ect 3 6-c•m
  35. Spreadsheets Lotus k 2.1 Spreadsheet OS/2 Ability Office Excel PlanMaker 35
  36. PARTS OF THE SPREADSHEET WINDOW Active cell 2 3 4 5 6 Row 5 Column C cells 36
  37. RIBBON e Insert Calibri View General Paste Clipboar Page Layout 11 • A Font Formulas Data Review Alignment Al Tabs - The seven tabs are Home, Insert, Page Layout, Formulas, Da Homt Fotmui.M Oita Groups - These show related items together. For example, there is the Font group (within the HOI Commands A command is a button, a box to enter information, or a menu. For example, there is the B button (in the Font group in the Home tab) to B (e text bold. 37
  38. EXCEL FORMULAS • Formulas are used to make a spreadsheet more versatile. When writing formulas in a cell an equal sign = is first Mathematical-Qpga!ions Multiplication Division Addition Subtraction
  39. COPYING FORMULAS Formulas in cells can be copied in many ways. Use the Copy button in the Home tab, Paste button Use Ctrl-C, then Ctrl-V Home Insert cut copy Paste Format Painter Clipboard Al 1 2 Page Layout Calibri Form Lila s then use the Data 39
  40. OTHER USEFUL EXCEL TIPS Split Screen — horizontally or vertically • Paste Special - Paste Values will paste the results of the formulas • Column and Row Widths Displaying Formulas in Worksheets - Show Formulas in Formula Auditing group Displaying Grid Lines and Column Headers for Printing Filling a Range with a Series of Numbers
  41. EXCEL FUNCTIONS —S UNfCrange) Integer Sum Function 2 Decimal Sum Function 3 Sum Function 4 Error Sum Function —S UNf@ange) 2 5 9 *NAME? 7 10 c 41
  42. EXCEL FUNCTIONS The MIN function can be used to return the smallest value from a set of data. c 1 2 3 S 6 7 8 9 First Sarah Justin Manfred Aubrey Gen Renee Duncan Gatt HOI lis Johnson Sinclair Tanaka Zwick 69 88 74 42
  43. EXCEL FUNCTIONS The MAX function can be used to return the largest value from a set of data. 3 to First Justin Manfred Troy Aubrey Renee Gatt Hollis Sinclair T a naka Z wick 74
  44. • EXCEL FUNCTIONS To get the average of a set of numbers, use the AVERAGE function. = AVERAGE(range) average numbers File Home Insert Page Layout Formulas c Average numbers Calculate the average of a set of numbers - Microsoft Excel Review View 1 2 3 4 5 6 7 8 9 Test 1 Test 2 Test 3 Avera e Notes 60 83 blanks are ignored zero is included text is ignored 44
  45. EXCEL FUNCTIONS The Excel COUNT function retums the count of values that are numbers, generally cells that contain numbers • -COUNT [value2],... Home Insert Page Layout Formulas Oata COUNT (valuei, value2 Review 10 Value puppy apple 100 20% 0.5 Numbers
  46. • • IF-TYPE FUNCTIONS COUNTIF is a function to count cells that meet a single criteria —COUNTIF (range, criteria) range - The range of cells to count, criteria - The criteria that controls which cells should be counted. ">tøø") // Count Sales over 100 // count name = "jim" // count state = -co" "*100") 2 3 6 8 9 12 COUNTW MN ' Sales $100.00 $200.00 SSO.oo s •s.oo SISO.OO $200.00 Sales Jim 46
  47. IF-TYPE FUNCTIONS The Excel SUMIF function returns the sum of cells that supplied criteria. =SUMIF (range, criteria, = // values S løø = // Rep = Jim = // state = CA SUMIF function criteria. sum _ range) Sara h 10 Sales S 125 srso Sales Rep Jim State CA S47S Sloo 47
  48. LOGICAL FUNCTIONS B AND function D B OR function c D Return TRUE if all conditions are TRUE Score Return TRUE if any condition is TRUE 4 5 6 7 8 9 10 11 76 81 78 90 85 100 TRUE TRUE TRUE FALSE TRUE FALSE 4 7 10 Color Red Blue Green Red Blue Green Green OR Red TRUE FALSE TRUE TRUE FALSE TRUE 48
  49. LOOKUP FUNCTION The Excel LOOKUP function performs an approximate match lookup in a one-column or one-row range, and returns the corresponding value from another one-column or one-row range. • —LOOKUP (lookup_value, lookup_vector, [result_vector]) lookup_value - The value to search for. lookup_vector - The one-row, or one-column range to search. result_vector - [optional] The one-row, or one-column range of Insert page Layout Formulas Data Review View c 1 2 3 5 6 7 8 9 LOOKUP function Points Level 1000 Bronze 2000 Silver 3000 Gold 4000 Platinum 5000 Diamond Points Match Level 2350 2 Silver 49
  50. INDEX FUNCTION The Excel INDEX function returns the value at a given position in a range or array. • =INDEX (array, row_num, [col_num], [area_num]) array - A range of cells, or an array constant. row_num - The row position in the reference or array. col num array. G7 - [optional] The column position in the reference or lld be I N DEX(a r ray, Planet M e rcury 2 Venus 3 Earth 4 Ma rs 5 Jupiter Saturn 7 Ura n us 8 Neptune 2 3 4 Position Diameter Sate lites 2 7 5 km 12 km 13 km 7 km 143 km 121 km 51 km 0 27 Jupiter's diavvveter is at row S, colur."n 3 143 km Jupiter's diam eter 50
  51. MATCH FUNCTION MATCH is an Excel function used to locate the position of a lookup value in a row, column, or table • =MATCH (lookup_value, lookup_array, [match_type]) lookup_value - The value to match in lookup_array. lookup_array - A range of cells or an array reference. match_type - [optional] How to match, specified as -1, 0 or fruits - cs:C11 1. Default i 1 2 3 4 5 6 7 8 9 10 11 MATCH (lookup_value, lookup_array, match_type) 1 2 3 5 6 Fru it Apple Pear Peach Grape Lemon Lime Kiwi Value Pear Grape Kiwi Result 2 4 7 51
  52. • 1. 2. 3. Cell References in Excel A worksheet in Excel is made up of cells. These cells can be referenced by specifying the row value and the column value. For example, Al would refer to the first row (specified as 1) and first column (specified as A). Similarly, B3 would be third row and second column. The power of Excel lies in the fact that you can use these cell reference in other cells when creating formulas. Now there are three kinds of cell references that you can use in Excel: Relative Cell References Absolute Cell References Mixed Cell References
  53. Relative Cell References To calculate the total for each item, we need to multiply the price of each item with the quantity of that item. 2 3 4 5 6 7 8 Item Item A Item B Item C Item D Item E Item F Item G Price 15 20 12 18 8 10 20 c Quantity 15 20 18 8 10 20 10 Total 53
  54. Relative Cell References For the first item, the formula in cell D2 would be B2* C2 (as shown below) 2 3 4 5 6 7 8 Now, instead of Item Item A ytem B Item C Item D Item E Item F Item G Price 15 20 12 18 8 10 20 c Quantity 15 20 18 8 10 20 10 Total 225 entering the formula for all the cells one by one, you can simply copy cell D2 and paste it into all the other cells (D3:D8). When you do it, you will notice that the cell reference automatically adjust to refer to the corresponding row. For example, the formula in cell D3 becomes B3*C3 and the formula in D4 becomes B4*C4. 54
  55. Relative Cell References price Qua tv 4 5 6 7 8 I tern I tern I ten. 4 00 216 These cell references that adjust itself when the cell is copied are called relative cell references in Excel. SUM 1 2 3 4 5 6 7 8 I tern 'tern A 'tern 'tern I tern I tern 'tern 'tern Price 15 20 12 18 8 10 20 Qua ntity 15 20 18 8 10 20 10 Total 225 400 144 200 200 55
  56. Absolute Cell References Unlike relative cell references, absolute cell references don't change when you copy the formula to other cells. For example, suppose you have the data set as shown below where you have to calculate the commission for each item's total sales. The commission is 20% and is listed in cell Gl . G 2 3 4 5 6 7 8 Item Item A Item B Item C Item D Item E Item F Item G Price 15 20 12 18 8 10 20 c Quantity 15 20 18 8 10 20 10 D E F Total Commission 225 216 144 80 200 200 20% 56
  57. Absolute Cell References To get the commission amount for each item sale, the following formula in cell E2 and copy for all cells: use Quantity 15 20 18 10 20 10 Tota I Commission 2 3 4 5 6 7 8 Item Item A Item B Item C Item D Item E Item F Item G price 20 12 18 10 20 20% 225 400 216 144 80 200 200 45 80 43.2 28.8 16 40 40 Note that there are two dollar signs ($) in the cell reference that has the commission — $G$2. 57
  58. What does the Dollar ($) sign do A dollar symbol, when added in front of the row and column number, makes it absolute (i.e., stops the row and column number from changing when copied to other cells). • For example, in the above case, when I copy the formula from cell E2 to E3, it changes from =D2*$G$1 to • Note that while D2 changes to D3, $G$I doesn't change. Since we have added a dollar symbol in front of 'G' and '1' in Gl, it wouldn't let the cell reference change when it's copied. • Hence this makes the cell reference absolute. 58
  59. SPREADSHEET MODELING Spreadsheet Inputs modeling basics ormul Outputs Elements of a spreadsheet/mathematical models Inputs -Known. estimated. uncertain Decision variables - Values must be determined using the model Outputs - Values of interest, Computed from the inputs and the decision variables. Spreadsheet Modeling Uses the models like: Optimization Sensitivity analysis Prepare reports 59
  60. CRITERIA'S FOR SPREADSHEET MODELING Clear and logical layout Separation ofdifferentparts ofa model Clear headings for all inputs, decision variables, and outputs Range names Formatting features— bold, italics, coloring, etc. Cell comments Text boxes for assumptions and explanations The formulas and logic must be correct Breakdown computations to small chunks and build Model must be flexible for modification • Plan ahead, revise your plan as needed 60
  61. Advantages Of Spreadsheet 'What-if' questions can be asked without rebuilding a model from scratch each time a test is run. Automatic re-calculation Graphs can be produced Variables and constants can be used The model can be saved and backed up The model can be shared between different people in different locations. No additional software is required It might be quicker and cheaper to build a computer model than a physical model. Only one model needs to be build which can then be changed It's a lot safer to run a simulation under extreme conditions than to build the actual model and test it. Computer-based models can be speeded up or slowed down to see effects that are difficult to see n real life.
  62. Disadvantages Of Spreadsheet The model may not be an accurate representation of the real world. The real world is complex. • If the model relates to people then an accurate result may not be given. For example, if a model is constructed to show the time taken to evacuate a building it might not take into account the fact that people panic. • Many variables may need to be considered and it is easy to miss things out! This may lead to misleading results. Producing an effective model may be time consuming and running the model may need expensive hardware and software.
  63. Module 1 - Part 3 +FTS 63
  64. Datasets A collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer. A dataset can be accessed individually or in combination or managed as a whole entity. A dataset is organized into some type of data structure.
  65. Data Cleaning • Data cleaning is the process of identifying and removing inaccurate records from a dataset, table, or database and refers to recognizing unfinished, unreliable, inaccurate or non-relevant parts of the data and then restoring, remodeling, or removing the dirty or crude data. • Data cleaning techniques may be performed as batch processing. After cleaning, a dataset should be uniform with other related datasets in the operation. With the rise of big data, data cleaning methods has become more important than ever before. 65
  66. 1. 2. 3. 4. 5. 6. 7. 8. Data Cleaning Techniques Get Rid of Extra Spaces Select and Treat All Blank Cells Convert Numbers Stored as Text into Numbers Remove Duplicates Highlight Errors Change Text to Lower/Upper/Proper Case Spell Check Delete all Formatting 66
  67. Missing Data • Dealing with missing data, has been always a challenge in data analysis context. We need methods in missing data analysis that: Minimize the bias Maximize use of available information, and Get good estimates of uncertainty e.g., p-value, confidence interval, etc. 67
  68. 1. 2. 3. Types of Missing Data MISSING COMPLETELY AT RANDOM (MCAR) - Values in a data set can miss completely at random (MCAR) if the events that lead to any particular data-item missing are independent both of observable variables and of unobservable parameters of interest, and occur entirely at random. MISSING AT RANDOM (MAR) - it is an alternative, and occurs when the missing-ness is related to a particular variable, but it is not related to the value of the variable that has missing data. An example of this is accidentally omitting an answer on a questionnaire. MISSING NOT AT RANDOM (MNAR) - This is data that is missing for a specific reason (i.e. the value of the variable that is missing is related to the reason it is missing). An example of this is if certain question on a questionnaire tend to be skipped deliberately by participants with certalå
  69. Methods of Handling Missing Data Deletion rnethods Imputation methoc% Listwise methods Pairwise deletion Variable deletion Single imputation Mean imputation Conditional imputation Case mean imputation Regression imputation Last observation carried forward Worst case imputation Interpolation Best case imputation EM imputation Multiple imputation 69
  70. List wise deletion • In this case, rows containing missing variables are dele Gend er Ma n power 33 29 32 Sales 343 332 326 237 70
  71. Pair wise deletion In this case, only the missing observations are ignored and analysis is done on variables present Gender Manpower 33 29 26 32 Sa les 343 280 332 272 326 253 297 71
  72. Variable deletion Variable deletion involves dropping variables missing values on an case - by-case basis Su bje ct 1 2 3 4 7 8 10 Age 22 34 20 22 Missing 21 33 G rider rvl r•vl r•vl I acorn e M issir•g SIS Zoo S38vaoa ssavaaa S24waoo aaa 22 aaa with 72
  73. Mean imputation Replace missing values with the mean of that variable Missing values replaced Variables by means Respondent 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 73
  74. Last Observation Carried Forward Imputes the missing value as a value on the same outcome the most recent time it was observed 10
  75. Interpolation Use interpolation to fill in missing values Useful for longitudinal datasets 75
  76. Worst case and Best case imputation Worst case replaces a missing value with the worst case scenario for a categorical outcome • Best case replaces a missing value with the best case scenario for a categorical outcome 76
  77. Multiple imputation Multiple imputation is quickly becoming the standard" approach to handling missing values • Computationally complex 'gold Incomplete Data Imputed Data Analysis Results Pooled Results
  78. Noisy Data Noise: Random error, Data Present but not correct. Data Transmission error Data Entry problem Removing noise Data Smoothing (rounding, averaging within a window). Clustering/merging and Detecting outliers. • Data Smoothing First sort the data and partition it into (equi-depth) bins. Then the values in each bin using Smooth by Bin Means, Smooth by Bin Median, Smooth by Bin Boundaries, etc. 78
  79. Noisy Data (Binning Methods) Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34 * Partition into (equi-depth) bins: - Bin 1: 4, 8, 9, 15 - Bin 2: 21, 21, 24, 25 - Bin 3: 26, 28, 29, 34 * Smoothing by bin means: - Bin 1: - Bin 2: 23, 23, 23, 23 - Bin 3: 29, 29, 29, 29 * Smoothing by bin boundaries: - Bin 1: 4, 4, 4, 15 - Bin 2: 21, 21, 25, 25 - Bin 3: 26, 26, 26, 34
  80. Noisy Data (Clustering) Outliers may be detected by clustering, where similar values are organized into groups or "clusters". • Values which falls outside of the set of clusters may be considered outliers.
  81. Outliers Outliers are data points that don't fit the pattern of rest of the numbers. They are the extremely high or extremely low values in the data set. Outliers are different from the noise data. Noise is random error or variance in a measured variable Noise should be removed before outlier detection An outlier may indicate an experimental error, or it may be due to variability in the measurement. 81
  82. 1. Types of Outliers Global Outlier (or point outliers): If an individual data point can be considered anomalous with respect to the rest of the data, then the datum is termed as a point outlier. For example, Intrusion detection in computer networks 10000 8000 6000 4000 2000 20 40 60 80 100 82
  83. 2. Types of Outliers Contextual outliers: If an individual data instance is anomalous in a specific context or condition (but not otherwise), then it is termed as a contextual outlier. Attributes of data objects should be divided into two groups Contextual attributes: defines the context, e.g., time & location Behavioral attributes: characteristics of the object, used Monthly Temp
  84. 3. Types of Outliers Collective outliers: If a collection of data points is anomalous with respect to the entire data set, it is termed as a collective outlier. There are three approaches for outlier detection:
  85. MODULE 1 - PART 4 DATA VISUALIZAfr10N
  86. DATA VISUALIZATION • Data visualization - the process of displaying data (often in large quantities) in a meaningful fashion to provide insights that will support better decisions. • Data visualization improves decision-making, provides managers with better analysis capabilities that reduce reliance on IT professionals, and improves collaboration and information sharing.
  87. • Benefits of Data Visualization Improved decision making Better ad-hoc data analysis Improved collaboration and information sharing Provide self-service capabilities to end users Increased ROI (Return On Investment) Time savings Reduced Burden on IT
  88. Creating Charts in Microsoft Excel Select the Insert tab. Highlight the data. Click on chart type, then subtype. Use Chart Tools to customize. Page Layout Home J Insert PivotTable Table Picture Clip Shapes Formulas Data View SmartArt Name Maths 49 35 Column Line Area Scatter Other Charts Charts 1 2 3 4 5 6 7 Tables Name Physics Amal Be ena 35 Ravi Sa m Tom Vani Illustrations c Chemistry Biolo 47 47 49 English 33 22 32 Malayalam 32 37 41
  89. Column Chart A column chart is a graphic representation of data. Column charts display vertical bars going across the chart horizontally, with the values axis being displayed on the left side of the chart. Column charts can be used to display a wide range of data, n Sales and Expenses $ 140,000 $120,000 S Ico„ooo 580, coo $60 mo $40 mo Jan Feb Mar Apr Mav Sa les Expenses Jun 89
  90. Line Charts A line chart is a graphic representation of data that is plotted using a series of lines. Line charts display lines going across the chart horizontally, with the values axis being displayed on the left side of the chart. Wildlife Population 200 180 160 140 120 100 20 2017 2018 Bears 2019 Do b hins 2021 Whaks 2022
  91. Pie Charts A pie chart is a circular chart that is sliced into sections each section represents a percentage. Accounting Rent 10% Food Utilities Fun Clothes Phone 91
  92. Area Charts An area chart combines the features of a pie chart with those of line charts. Area charts present more information than pie or line charts alone but may clutter the observer's mind with too many details if too many data series are used; thus, they should be used with care. Energy Consumption
  93. Scatter Charts Scatter charts show the relationship between two variables. To construct a scatter chart, we need observations that consist of pairs of variables. House Size vs. Market Value S zx)ooooo S IN.ooooo s 110.00000 SICD.cxx.oo 59000000 $8Q@OOO I.mo 1.200 1.400 1.600 1.800 2.0Ø 2,200 2.400 2.600 Feet 93
  94. Bubble Charts A bubble chart is a type of scatter chart in which the size of the data marker corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions. Stock Comparisons 100 70 so •F itch 150 350
  95. Miscellaneous Excel Charts Chet Sur tace Chart Radar 95
  96. Excel Data Visualization Tools Data bars Color scales • Icon sets 96
  97. Data bars Data bars display colored bars that are scaled to the magnitude of the data values (similar to a bar chart) but placed directly within the cells of a range. Highlight the data in each column, click the Conditional Formatting button in the Styles group within the Home tab, select Data Bars, and choose the fill option and color. Horn* Page Layout Formu12v Data Review View Copv Conditionsl Cell painter Name Ph its Chemist Amal Beena Ravi Sam Tom Vani 44 15 44 50 46 245 Biolo 47 49 50 MathsE ish Mala alam 50 40 22 50 Highhght CeWS RWE • lop,'80ttom RukS • New Rule... Lag Slur Rules Manage Rules.„
  98. Color scales Color scales shade cells based on their numerical value using a color palette. Color-coding of quantitative data is commonly called a heatmap. 8 1 2 3 4 5 6 7 Name Amal Beena Revi Sam Veni Physics Chemistry Biology Math: English Malayalam 44 44 50 50 46 45 34 46 48 50 35 47 26 47 43 50 28 50 22 50 43 35 33 44 22 50 44 32 44 32 37 50 41 45 98
  99. Icon sets • Icon sets provide similar information using various symbols such as arrows or stoplight colors. 44 50 41 45 3 c 0 Name Physics Chemistry Biology Math; English Malayalam 1 2 4 5 6 7 Amal Been; Ravi Sam Tom Vani 45 46 48 50 +47 047 +43 0 28 022 050 32 99
  100. Sorting Data in Excel The sort buttons in Excel can be found under the Data tab in the Sort & Filter group. Select a single cell in the column you want to sort on and click the "AZ down arroW' button to sort from smallest to largest or the "AZ up arrow" button to sort from largest to smallest. You may also click the Sort button to specify criteria for Home From From Access Web Insert Page Layout Review View Formulas Data From From Other Text Sources Existing Connections Connections A I -Y Properties Z Refresh All • Edit Link} Sort Reappty Filter Advanced 100
  101. Filtering Data For large data files, finding a particular subset of records that meet certain characteristics by sorting can be tedious. Excel provides two filtering tools: AutoFilter for simple criteria, and Advanced Filter for more complex criteria. Horne Insen Page Layout Formulas Data Review View From From From From Other Access Web Text Sources ' Get External Data Existing Connections Properties Refresh Edit Connections Clear Reapply Filter Text to Rer Advanced columns Dupl ort & Filter 101
  102. AutoFilter From From From From Other Access Web Text Sources ' Get External Data Existing Connections Properties Refresh All' V) Edit Links Connections Sott Clear Reapply Filter b Advanced Sort & Filter 31 I Name 3 Beena 7 Vani Gender. Female fr Gender D G Physics Chemistry Biology Maths English Malayalam 35 34 26 22 44 28 35 32 32 45 102
  103. Advanced Filter Advanced Filter Action @ tilter the list, in-place O CAP}' to another location List range: Criteria range: Copy to: C] Unique records only 103
  104. Summarizing Data Methods of summarizing data 1. 2. 3. 4. 5. 6. 7. Frequency Relative Frequency Histograms Percentages Quartiles Pivot Tables Cross Tabulation 104
  105. Frequency Distribution A frequency distribution is a summary of counting how often values occur within a range of values. Scores: 11222223333445 Score Frequency 1 2 3 4 5 2 5 4 2 1 105
  106. Constructing a Frequency Distribution in Excel Column B contains the score numbers, D2:E6 is the score bands, and the Frequency column get the value occurs based on the bands. N a Ine Ruby Luc Nicol 4 Neptune 5 James 6 Tom 8 Richard Vann 9 10 Gerald Ellen 11 Robbie 12 13 Maril 14 15 Charley 16 Score 92 34 100 88 76 52 91 30 69 79 82 88 91 73 F rorn 70 so To F 59 69 79 89 100 u en 106
  107. Constructing a Frequency Distribution in Excel Step 3 - Then press Ctrl + Shift + Enter keys together, and all the once. 2 7 9 10 13 frequency distributions for each band are calculated at f. '-FREQCENCMSBS2 SBS16.SES2 Luc Nicol N tune James Tom Richard Vann Gerald EUen Robbie Susan Score 92 76 52 91 30 91 From To 79 89 100 107
  108. Relative Frequency Distributions • Relative frequency is the fraction, or proportion, of the total. • If a data set has n observations, the relative frequency of frequency of category i relative frequency ofcategory i = n We often multiply the relative frequencies by 100 to express them as percentages. A relative frequency distribution is a tabular summary of the relative frequencies of all categories. 108
  109. IC3 Constructing a Relative Frequency Distribution Paint Defect Frequencies a Defect Too Thin 4 Too Thick 5 Runs 6 Orange Peel 7 Wro Color 8 Frequency Relative Frequency Formula used in column C 19.1% :B3/$BS8 11.5% :B4/$BS8 15.3% :B5/$B$8 36.6% :B6/SBS8 17.6% :B7/$BS8 Total 25 15 20 48 23 131 100.0% 1 og
  110. Excel Histogram Tool A histogram is a common data analysis tool in the business world. • It's a column chart that shows the frequency of the occurrence of a variable in the specified range. • a Histogram is a graphical representation, similar to a bar chart in structure, that organizes a group of data points into user-specified ranges. The histogram condenses a data series into an easily interpreted visual by taking many data points and grouping them into logical ranges or bins. 110
  111. How to Make a Histogram in Excel (2018) Bin More Frequency 10 20 Histogram 25 35 30 Bin Frequency 40 More 111
  112. Percentage • ASSIGNMENT 112
  113. Quartiles Assignment 113
  114. PivotTables PivotTable report is an Interactive Excel report which is used to Summarize, Analyze, and Explore data Pivot Tables are great Tools for Comparing Data using Cross-tabulation Home PivotTable able les Inse rt Picture Page Layout Clip Shap es SmartArt S Illustrations 114
  115. 1. 2. 3. 4. 5. 6. Steps to create PivotTable Click any single cell inside the data set On the Insert tab, click PivotTable Create PivotTable dialog box appears Excel automatically selects the data for you. The default location for a new PivotTable is New Worksheet Click OK Drag Product Field to the Row Labels area Drag Amount Field to the Values area Drag Country Field to the Report Filter area MS Excel: PivotTable & PivotChart Reports 115
  116. Cross Tabulation Assignment 116