
Data Management and Visualization

Task Description

 

Datasets vary from one domain to another. In this coursework, you will select a dataset related to a real-world problem that best suits your area of interest. Many websites provide publicly available datasets. A categorised list of datasets on GitHub can be found at https://github.com/caesar0301/awesome-public-datasets. The UCI Data Repository at https://archive.ics.uci.edu/datasets is another long-standing source of benchmark datasets for data analysis research. Kaggle (https://www.kaggle.com/datasets) hosts interesting real-world problems and datasets.

 

 

You can select a dataset from the above sources or another one that is available online. The dataset must be publicly available. The chosen dataset must NOT be any of the built-in training datasets that ship with R/RStudio (e.g. Iris, ChickWeight, Mtcars, Airquality, etc.); otherwise, that assessment component will receive a mark of zero.

 

The link to the chosen openly accessible dataset must be included in your submission for both components. Otherwise, 10% of the total marks available for each component (out of 100%) will be deducted from the assessment mark.

 

Please note that both components of this module are designed to assess the learning outcomes of data management and visualisation skills using R. Predictive analytics methods and machine learning models are out of the scope of this module.

 

You have to complete the following stages in this assignment:

 

  1. Import a real-life data set.
  2. Identify the insights that the data set can potentially provide.
  3. Data exploration and preparation: the nature of the dataset may dictate the data exploration and preparation needed to inform your decisions.
  4. Perform the necessary data manipulation.
  5. Perform basic exploratory data analysis.
  6. Use appropriate visualisations for the results.
  7. Critically evaluate and interpret the results, and explain how they can support business decision making.
  8. Reflect on professional, ethical and legal issues in relation to the problem and the data set.

 

 

Component 1 Deliverable – Contributes 40% to the Module Mark

 

Component 1 will assess learning outcomes LO 2, 3, and 4

 

Deadline: An electronic copy of your presentation must be submitted on Blackboard before 12:00 noon on Monday 16/12/2024; the group presentation will be delivered in the seminar sessions in the 12th teaching week (w/c 16th December 2024).

 

What to Hand In

 

  • Online – Each member of the group must submit an electronic copy of the presentation in Microsoft PowerPoint format. The slides must include screenshots of visual outputs ALONG WITH screenshots of code excerpts from your experiments, appropriately labelled and commented. A submission in any other format (e.g. PDF) will be awarded a mark of zero.

 

 

You need to present your group work and demonstrate your code and results in the seminar sessions in the 12th teaching week (w/c 16th December 2024).

 

Please note that if you do not attend and participate in the Group Presentation on the date of the Group Presentation Assessment without approval, 20% of the total marks available for the Group Presentation assessment (out of 100%) will be deducted from your Group Presentation assessment mark.

 

 

Component 2 – Contributes 60% to the Module Mark

 

Component 2 will assess learning outcomes LO 1, 2, 3, 4 and 5

 

Deadline: TBC

 

What to Hand In

 

  • An individual case study report of a maximum of 3000 words that documents the entire case study process, including the data set, problems, data preparation, transformation, visualisation, analysis, and critical evaluation and justification of the findings.
  • Online – a file in Microsoft Word format, submitted via Turnitin on Blackboard, that includes screenshots of visual outputs ALONG WITH screenshots of code excerpts from your experiments in the text, appropriately labelled and commented. A submission in any other format (e.g. PDF) will be awarded a mark of zero.

 

 

Please note that for the Component 2 Individual Report, you need to choose a data set that is different from the data set you used for the Component 1 Group Presentation. Otherwise, the Component 2 Individual Report will receive a mark of zero.

 

Submissions are made electronically via Blackboard. All deliverables must be labelled with the project name, your student name and your university number.

 

The report will be assessed on:

  • understanding of different tools in R
  • review of relevant literature
  • development methodology
  • justification of design decisions
  • consideration of professional, ethical and legal issues

 

The report could broadly include the following sections:

 

  • Abstract
  • Introduction (introduce the data set and the significance of the insights it contains)
  • Literature review of related work
  • Data exploration
  • Experiments (data preparation, manipulation, analysis, visualization)
  • Results
  • Discussion, Conclusions and Future Work
  • References
  • Appendix (attach all the scripts in the appendix section)

These are generic section titles, which you may adapt appropriately to the application/problem that is investigated. You may include sections describing modifications of algorithms or developments that are novel and specific to your work.

 