Data analysis

Use Excel where appropriate, with the relevant formulas visible, so that the supervisor can verify that I followed the proper procedures rather than entering the results directly.

2022-2

Program: EMIHM

Course name and No.: S M 9115 Data Analytics for Decision Making

Assessment title: Assessment One (30%)

Type: Multiple Choice Questions

Faculty: Dr. Ahmed Bakri

Deadline: April 30, 2023 23:59

Dear Students,

Please choose the best answer, then submit your responses in an Excel file.

Show your answers in the Excel file (where needed).

Please rename the file with your name.

Good Luck.

Problem 1 (5%)

Please indicate whether the following statements are true or false:

1. A sample size should not exceed 100 observations, otherwise it will be called a

population.

a. True

b. False

2. The difference between the midpoints of two consecutive classes is equal to the number

of classes.

a. True

b. False

3. The line segments in a cumulative frequency polygon can be either increasing or

decreasing depending on the given data.

a. True

b. False

4. The variance is considered the most accurate measure of dispersion for distribution

comparison because it is calculated using the squared values.

a. True

b. False

5. In a group of 70 scores, if the largest score is increased by 20 points the mean of the

scores will increase by 3.5 points.

a. True

b. False

Problem 2 (15%)

Choose the best answer:

6. Which of the following represents a sample?

a. Number of cups of coffee served at Starbucks Marbella

b. Total registered voters in Spain

c. All the Colombians working abroad

d. None of the above

7. Fifty mice were chosen from a shelter containing 500 animals to test a new vaccine. What is the sample?

a. The 50 selected mice

b. The 500 animals in the shelter

c. The 550 animals

d. All the mice in the shelter

8. Which of the following is a discrete variable?

a. Depth of the pool measured in meters

b. Numbers of newborn kittens

c. Number of hours spent on social media

d. None of the above

9. The amount of “dollars” stuck in non-US banks is a:

a. Quantitative discrete variable

b. Qualitative discrete variable

c. Quantitative continuous variable

d. Qualitative continuous variable

10. Identify the scale of measurement for the following categorization of clothing: hat,

shirt, shoes, pants.

a. Nominal level of data

b. Ordinal level of data

c. Ratio level of data

d. Interval level of data

11. As part of a test preparation course, students are asked to take a practice version of the

Graduate Record Examination (GRE). This is a standardized test, and scores can range

from 200 to 800. The appropriate scale of measurement is:

a. Nominal

b. Ordinal

c. Interval

d. Ratio

12. Children in elementary school are evaluated and classified as non-readers (0), beginning

readers (1), grade level readers (2), or advanced readers (3). The classification is done to

place them in reading groups.

a. Ratio

b. Nominal

c. Interval

d. Ordinal

Problem 3 (25%)

A sample of 20 women was asked about the symptoms they felt after taking the COVID-19 vaccine. Below are their responses:

Headaches Stroke Fever Nausea Tiredness Nausea

Headaches Tiredness Cough Fever Tiredness Cough

Skin Rash Tiredness Cough Fever Nausea Tiredness

Cough Headaches

13. “Symptoms” is a ___________ variable; thus it should be organized into a ___________.

a. Qualitative, frequency distribution

b. Qualitative, frequency table

c. Quantitative, frequency distribution

d. Quantitative, frequency table

14. Based on the above data, the relative frequency of “tiredness” is:

a. 4

b. 5

c. 0.2

d. 0.25

15. If two more women were added to the survey and if they both had a stroke after taking

the vaccine, the relative frequency of this symptom would be:

a. 0.1

b. 0.15

c. 0.136

d. 0.09

16. Based on the above data, the angle that corresponds to the “Fever” category is:

a. 0.15

b. 54

c. 10.8

d. 58

17. The best graphical presentation for this data is:

a. Bar Graph

b. Histogram

c. Frequency polygon

d. Cumulative histogram or cumulative frequency polygon
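
The frequency, relative frequency, and pie-chart angle asked about in questions 14 to 16 all follow from one tally of the responses. The sketch below is a minimal illustration in Python, not part of the assessment; the function name `frequency_table` is hypothetical, and the data is transcribed from the list above. In Excel the same tally could be produced with `COUNTIF`, dividing each count by the total and multiplying the relative frequency by 360 for the angle.

```python
from collections import Counter

# Responses transcribed from the problem statement (20 women).
symptoms = [
    "Headaches", "Stroke", "Fever", "Nausea", "Tiredness", "Nausea",
    "Headaches", "Tiredness", "Cough", "Fever", "Tiredness", "Cough",
    "Skin Rash", "Tiredness", "Cough", "Fever", "Nausea", "Tiredness",
    "Cough", "Headaches",
]


def frequency_table(values):
    """Return {category: (frequency, relative frequency, pie angle in degrees)}.

    Hypothetical helper for illustration only.
    """
    n = len(values)
    counts = Counter(values)
    # relative frequency = f / n; pie angle = relative frequency * 360
    return {cat: (f, f / n, f * 360 / n) for cat, f in counts.items()}


table = frequency_table(symptoms)
for category, (freq, rel, angle) in sorted(table.items()):
    print(f"{category:<10} {freq:>2} {rel:>5.2f} {angle:>6.1f}")
```

Adding observations (as in question 15) only requires extending the list and rerunning the tally, since every relative frequency is recomputed against the new total.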

Problem 4 (25%)

The raw data below represents the rate per hour of a sample of doctors in Paris. This data

needs to be represented in a frequency distribution.

113 189 186 174 103 125 41 81 47 156 37 89

90 141 126 28 58 172 75 61

18. What interval for each class do you suggest?

a. 5

b. 30

c. 33

d. 32

19. The relative frequency of doctors who earn between 160 USD and 193 USD per hour

is:

a. 0.2

b. 20%

c. 0.1

d. 0.25

20. The percentage of doctors who earn less than 127 USD per hour is:

a. 10%

b. 20%

c. 70%

d. 80%

21. The percentage of doctors who earn more than 160 USD per hour is:

a. 80%

b. 20%

c. 10%

d. 16

22. The first point of a cumulative frequency polygon that represents this data is:

a. X = 61 and Y = 5

b. X = 28 and Y = 5

c. X = 28 and Y = 0

d. X = 44.5 and Y = 0
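
One common textbook procedure for building the frequency distribution is the "2 to the k" rule: choose the smallest number of classes k with 2^k >= n, then take a whole-number class width just above (max - min) / k. The sketch below assumes that rule and half-open classes [lower, upper); neither is stated in the problem, and the function names are hypothetical.

```python
# Hourly rates transcribed from the problem statement (20 doctors).
rates = [113, 189, 186, 174, 103, 125, 41, 81, 47, 156, 37, 89,
         90, 141, 126, 28, 58, 172, 75, 61]


def build_classes(data):
    """Return k half-open classes (lower, upper) covering the data.

    Assumption: k is the smallest integer with 2**k >= n, and the width
    is the next whole number above range / k.
    """
    n = len(data)
    k = 1
    while 2 ** k < n:
        k += 1
    low, high = min(data), max(data)
    width = (high - low) // k + 1
    return [(low + i * width, low + (i + 1) * width) for i in range(k)]


def class_frequencies(data, classes):
    """Count the values that fall in each class [lower, upper)."""
    return [sum(lower <= x < upper for x in data) for lower, upper in classes]


classes = build_classes(rates)
freqs = class_frequencies(rates, classes)
n = len(rates)
for (lower, upper), f in zip(classes, freqs):
    print(f"{lower:>3} up to {upper:<3}  f = {f}  relative = {f / n:.2f}")
```

With these assumptions the classes start at 28 and are 33 wide, which lines up with the 127, 160, and 193 boundaries used in questions 19 to 21.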

Problem 5 (30%)

The numbers that follow represent the number of paint gallons (in thousands) produced

each month by a sample of 10 companies.

7 20 10 4 18 12 7 14 6 22

23. The mean number of paint gallons is:

a. 7

b. 12

c. 120

d. 13.33

24. The mode of this distribution is:

a. 15

b. 2

c. 7

d. There is no mode.

25. The median of this distribution is:

a. 10

b. 11

c. 12

d. 15

26. The distribution of data for the number of paint gallons produced is:

a. Positively skewed.

b. Negatively skewed.

c. Symmetrical

d. Cannot be determined.

27. The range is:

a. 26

b. 18

c. 15

d. 29

28. The variance of this distribution is:

a. 35.8

b. 5.98

c. 39.78

d. 6.31

29. The standard deviation of this distribution is:

a. 35.8

b. 5.98

c. 39.78

d. 6.31

30. Which of the dispersion measures is considered the most accurate for distribution

comparison?

a. The range because it is the simplest one.

b. The standard deviation because it includes all variables.

c. The variance because it is calculated using the squared values.

d. All measures are equally accurate.
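
The measures in questions 23 to 29 can be computed step by step so that the procedure, not only the result, is visible. The sketch below is illustrative and treats the ten companies as a sample (the problem calls them one), so the variance divides by n - 1; in Excel the corresponding functions would be `AVERAGE`, `MEDIAN`, `MODE.SNGL`, `VAR.S`, and `STDEV.S`.

```python
import math
import statistics

# Monthly production in thousands of gallons, from the problem statement.
gallons = [7, 20, 10, 4, 18, 12, 7, 14, 6, 22]

n = len(gallons)
mean = sum(gallons) / n                      # sum of values / number of values
median = statistics.median(gallons)          # average of the two middle values
modes = statistics.multimode(gallons)        # most frequent value(s)
data_range = max(gallons) - min(gallons)     # largest minus smallest

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = sum((x - mean) ** 2 for x in gallons)
sample_variance = squared_deviations / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"mean={mean}, median={median}, modes={modes}, range={data_range}")
print(f"sample variance={sample_variance:.2f}, sample sd={sample_sd:.2f}")
```

Comparing the mean with the median gives a quick indication of skewness: a mean above the median suggests a longer right tail.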



Assessment title: Assessment Two (35%)

Type: Practical

Faculty: Dr. Ahmed Bakri

Deadline: April 30, 2023 23:59

Dear Students,

Please solve the following problems in Excel (where needed) and kindly upload the Excel file to Moodle when done.

Please rename the file with your name.

Good Luck.

Problem 1 (12 pts)

State whether each variable in the statements below is quantitative or qualitative, then specify the level of measurement (Nominal, Ordinal, Interval, or Ratio) for each variable:

Variable | Type of Variable | Level of Measurement

Room temperature (°F) in manager’s office | ______ | ______

The number of hours spent on your laptop while studying for the Data Analytics exam | ______ | ______

The color of walls in Al Campo Hotel rooms | ______ | ______

Student’s Rating (Evaluation) of the Data Analytics professor | ______ | ______

The most popular Resort in Marbella is Puente Romano | ______ | ______

Number of employees working at Moch restaurant in Marbella | ______ | ______

Problem 2 (18 pts)

A restaurant has collected data on the preferred cuisine types of its customers. The data

collected from a sample of 30 customers is as follows:

Italian, Mexican, Chinese, Indian, Italian, Mexican, Italian, Italian, Indian, Chinese,

Mexican, Italian, Italian, Chinese, Indian, Mexican, Italian, Mexican, Chinese, Indian,

Chinese, Italian, Indian, Mexican, Chinese, Italian, Italian, Indian, Chinese, Mexican

1. Construct a frequency table showing the categories, frequencies, and relative

frequencies.

2. Is the data qualitative or quantitative?

3. Based on the given data, which cuisine type is most preferred among the restaurant

customers? Can you justify your answer using percentage relative frequency?

4. Represent your data using a pie or bar chart.

Problem 3 (35 pts)

The hotel manager of a luxury hotel is interested in analyzing the room occupancy rate for a

given month. The manager has collected data on the number of rooms occupied per day and

wants to represent this data in a frequency distribution.

1. What is the number of classes?

2. What is the class interval?

3. Construct a frequency distribution showing classes, class frequency & cumulative

frequency.

4. Show your data using a graphical chart.

Data:

36, 41, 45, 52, 55, 58, 62, 66, 69, 70, 71, 72, 74, 76, 78, 79, 81, 83, 86, 88, 92, 95, 98, 101,

104, 107, 110, 112, 115, 118, 121.
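
A possible way to work through the number of classes, the class interval, and the cumulative frequencies is sketched below. It assumes the "2 to the k" rule for the number of classes and half-open classes [lower, upper), neither of which the problem prescribes. Note that (121 - 36) / 5 is exactly 17, so a width of 17 would leave the maximum on the open upper boundary of the last class; the sketch therefore takes the next whole number, 18.

```python
from itertools import accumulate

# Rooms occupied per day, transcribed from the problem statement.
occupied = [36, 41, 45, 52, 55, 58, 62, 66, 69, 70, 71, 72, 74, 76, 78, 79,
            81, 83, 86, 88, 92, 95, 98, 101, 104, 107, 110, 112, 115, 118, 121]

n = len(occupied)

# Assumption: smallest k with 2**k >= n.
k = 1
while 2 ** k < n:
    k += 1

low, high = min(occupied), max(occupied)
width = (high - low) // k + 1            # next whole number above range / k

lower_limits = [low + i * width for i in range(k)]
frequencies = [sum(lo <= x < lo + width for x in occupied) for lo in lower_limits]
cumulative = list(accumulate(frequencies))  # running total of class frequencies

for lo, f, cf in zip(lower_limits, frequencies, cumulative):
    print(f"{lo:>3} up to {lo + width:<3}  f = {f:>2}  cumulative = {cf:>2}")
```

The last cumulative frequency should always equal the number of observations, which is a quick check that no value fell outside the classes.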

Problem 4 (35 pts)

A hotel manager wants to analyze the revenue generated by the hotel’s different room types

during the last quarter. The revenue data for 10 different room types are given below:

Room Type A: $100,000

Room Type B: $120,000

Room Type C: $90,000

Room Type D: $80,000

Room Type E: $130,000

Room Type F: $110,000

Room Type G: $85,000

Room Type H: $95,000

Room Type I: $115,000

Room Type J: $125,000

1. Calculate the mean revenue generated by the hotel’s different room types during the

last quarter.

2. Calculate the median revenue generated by the hotel’s different room types during the

last quarter.

3. Identify the mode of the revenue data.

4. Compare the mean, median, and mode of the revenue data. Comment on the skewness

of the distribution.

5. Calculate two measures of dispersion of your choice for this population or sample.



Assessment title: Final Assessment (35%)

Type: Practical

Faculty: Dr. Ahmed Bakri

Deadline: April 30, 2023 23:59

Dear Students,

Please solve the following problems in Excel (where needed) and kindly upload the Excel file to Moodle when done.

Please rename the file with your name.

Good Luck.

Problem 1 (35 pts)

A hotel manager wants to analyze the nationality of the guests who stayed at the hotel during

the last month. The data shows that approximately 20% of the guests were from Germany.

Suppose that you randomly choose 50 guests:

1. Find the probability that exactly 10 guests are from Germany.

2. Find the probability that at least 15 guests are from Germany.

3. Find the probability that at most 5 guests are from Germany.

4. Find the probability that more than 30 guests are from Germany.

5. What is the expected number of guests from Germany who stayed at the hotel during

the last month?

6. Calculate the variance and the standard deviation of this binomial distribution.
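
The six parts above all follow from the binomial model with n = 50 trials and success probability p = 0.20. The sketch below is a minimal illustration using only the standard library; the helper names `binom_pmf` and `binom_cdf` are hypothetical. In Excel, `BINOM.DIST(k, 50, 0.2, FALSE)` gives the exact probability and `BINOM.DIST(k, 50, 0.2, TRUE)` the cumulative one.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k), summing the pmf from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n, p = 50, 0.20

p_exactly_10 = binom_pmf(10, n, p)
p_at_least_15 = 1 - binom_cdf(14, n, p)      # complement of "at most 14"
p_at_most_5 = binom_cdf(5, n, p)
# The upper tail is summed directly: it is so small that 1 - cdf would
# lose most of its precision to floating-point rounding.
p_more_than_30 = sum(binom_pmf(i, n, p) for i in range(31, n + 1))

expected = n * p                              # mean of a binomial: n * p
variance = n * p * (1 - p)                    # variance: n * p * (1 - p)
std_dev = math.sqrt(variance)

print(f"P(X = 10)  = {p_exactly_10:.4f}")
print(f"P(X >= 15) = {p_at_least_15:.4f}")
print(f"P(X <= 5)  = {p_at_most_5:.4f}")
print(f"P(X > 30)  = {p_more_than_30:.3e}")
print(f"mean = {expected:.1f}, variance = {variance:.1f}, sd = {std_dev:.3f}")
```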

Problem 2 (35 pts)

The manager of a hotel chain is interested in knowing whether there is a relationship between

the number of positive reviews a hotel receives online and its occupancy rate. The manager

collected data from 10 hotels and recorded the number of positive reviews each hotel received

on a popular travel website and the corresponding occupancy rate (in percentage) for the

same period. The data is shown below:

Hotel Positive Reviews Occupancy Rate%

45 72

65 82

55 78

80 89

75 86

70 81

90 92

50 76

60 77

85 88

1. Determine the dependent and independent variables.

2. Calculate the correlation coefficient and comment.

3. Determine the regression equation Ŷ = a + bX.

4. Give a brief interpretation for the values of “a” and “b”.

5. If the hotel had no positive reviews, what would the occupancy rate be?
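
The correlation coefficient and the least-squares line can both be built from three sums of deviations: Sxx, Syy, and Sxy. The sketch below assumes positive reviews as X and occupancy rate as Y, following the way the problem is framed; the function name `least_squares` is hypothetical. In Excel the same values would come from `CORREL`, `SLOPE`, and `INTERCEPT`.

```python
import math

# Data transcribed from the table above, one pair per hotel.
reviews = [45, 65, 55, 80, 75, 70, 90, 50, 60, 85]      # X (assumed independent)
occupancy = [72, 82, 78, 89, 86, 81, 92, 76, 77, 88]    # Y (assumed dependent), in %


def least_squares(x, y):
    """Return (a, b, r) for the fitted line y_hat = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation coefficient
    return a, b, r


a, b, r = least_squares(reviews, occupancy)
print(f"r = {r:.3f}")
print(f"Y_hat = {a:.2f} + {b:.3f} * X")
print(f"Predicted occupancy at X = 0: {a:.2f}%")
```

The prediction at X = 0 is simply the intercept a. Since the smallest observed number of reviews is 45, that value is an extrapolation outside the data and should be read with caution.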

Problem 3 (30 pts)

The revenue manager of a hotel chain wants to analyze the distribution of room rates for a

particular hotel location. Based on historical data, she estimates the following probability

distribution for the daily room rates:

Probability Room Rate

30% $100

40% $150

20% $200

5% $250

5% $300

Based on this distribution, what is the coefficient of variation for the room rates at this hotel

location?
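
The coefficient of variation is the standard deviation divided by the mean, so the calculation needs the expected value and variance of the discrete distribution first. The sketch below is illustrative only, with the probabilities and rates transcribed from the table above. In Excel, `SUMPRODUCT` over the probability and rate columns gives the expected value in the same way.

```python
import math

# Probability distribution transcribed from the table above.
probabilities = [0.30, 0.40, 0.20, 0.05, 0.05]
room_rates = [100, 150, 200, 250, 300]          # in dollars

# Expected value: sum of rate * probability.
mean = sum(p * x for p, x in zip(probabilities, room_rates))

# Variance: sum of probability * squared deviation from the mean.
variance = sum(p * (x - mean) ** 2 for p, x in zip(probabilities, room_rates))
std_dev = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std_dev / mean

print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {std_dev:.2f}")
print(f"coefficient of variation = {cv:.1%}")
```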