University | University of London (UOL) |
Subject | DSM110: R for Data Science Coursework 2 |
DSM110: R for Data Science, Coursework 2, UOL, Singapore
Introduction
In the first coursework assignment you were tasked with finding an appropriate and interesting dataset and preparing it for a statistical modelling task. In this coursework assignment you should return to this dataset and perform an appropriate investigation into building a statistical model, predictor or classifier for it. For the purposes of this coursework assignment, we are interpreting ‘model’ liberally to include statistical models, predictors and classifiers. You should aim to build the best model you can and generate sufficient results to show how well it performs. You should write appropriate code to handle the statistical modelling and to generate any results, figures and values needed for tables. You should also prepare a short report in the style of an academic paper to explain your data modelling, results and conclusions.
This assignment is worth 70% of the total mark for this module.
Coding in R for modelling
We would expect the code you submit to handle the following:
1. Loading the data into R.
2. Preprocessing the data for the planned modelling. This may include rescaling variables or creating dummy or aggregate variables.
3. The statistical modelling, model optimisation and benchmarking.
4. All code that produces supporting material for the report such as tables of numbers and figures.
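The four steps above can be sketched as a single script. This is only a minimal illustration in base R: the built-in `mtcars` data stands in for your own dataset, and the choice of a linear model, a mean baseline and RMSE as the benchmark are assumptions made for the example, not requirements of the brief.

```r
# Illustrative sketch only: mtcars stands in for the coursework dataset.
set.seed(42)

# 1. Load the data (a real script would typically read the submitted file,
#    for example with read.csv()).
dat <- mtcars

# 2. Preprocess: rescale numeric predictors and create a dummy variable.
num_cols <- c("hp", "wt")
dat[num_cols] <- lapply(dat[num_cols], function(x) as.numeric(scale(x)))
dat$am <- factor(dat$am, labels = c("automatic", "manual"))
X <- model.matrix(mpg ~ hp + wt + am, data = dat)  # 'ammanual' is the dummy column

# 3. Model and benchmark: hold out a test set, compare against a mean baseline.
test_idx <- sample(nrow(dat), size = 10)
train <- dat[-test_idx, ]
test <- dat[test_idx, ]
fit <- lm(mpg ~ hp + wt + am, data = train)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# 4. Produce the numbers needed for a results table in the report.
results <- data.frame(
  model = c("mean baseline", "linear model"),
  rmse = c(
    rmse(test$mpg, mean(train$mpg)),
    rmse(test$mpg, predict(fit, newdata = test))
  )
)
print(results)
```

Keeping the steps in this order in one script makes it easier for a marker to run the code from start to finish and match each output to the report.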
Importantly, your code should run in a base install of RStudio. If you have chosen to use libraries or packages that are not installed by default, you should make sure to include the appropriate install.packages() commands at the top of the code. You may make use of the modules that were introduced during the course, though these should be installed as needed. The exception is the Keras package. If you are working with neural networks, you may assume that Keras is available for R and that your code does not need to handle installing this. You should not make use of GPU acceleration with Keras.
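One possible way to meet the install.packages() requirement is to install only the packages that are missing before loading them. The helper names `find_missing()` and `ensure_packages()` below are hypothetical, and the CRAN mirror URL is an assumption; per the brief, Keras would be left out of any such list.

```r
# Hypothetical helpers: the names are illustrative, not part of any package.

# Return the subset of 'pkgs' that is not currently installed.
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install anything missing, then attach every package in 'pkgs'.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- find_missing(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Example call using packages that ship with R; in a real script you would
# pass your own list, such as c("ggplot2", "caret").
ensure_packages(c("stats", "utils"))
```

Placing a call like this at the very top of the script means a marker on a fresh install does not hit a missing-package error partway through a run.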
We require you to submit a dataset with which we can test your code. We ask that this dataset be no more than 100 megabytes and that your code, run on the submitted dataset, have a runtime of no more than 15 minutes. You may have to submit only a sample subset of your dataset, but your code must run correctly to completion on this sample. Your report should be based on the full dataset you used, and your personal work can have any run time you deem appropriate to complete the most appropriate model and model optimisation. If the data in your report and submitted data are different, you must explain in your report how the dataset you used for modelling differs from the sample you submitted. We accept that a sampled dataset may not produce the same results and figures as the data used in your report.
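A rough way to check a sample against these limits is to write it to disk, measure the file size and time the modelling code. The sketch below uses simulated data as a stand-in for a full dataset; the 10% sampling fraction, the file name and the toy model are assumptions chosen for illustration.

```r
set.seed(1)

# Simulated stand-in for the full dataset used in the report.
full <- data.frame(id = seq_len(50000), x = rnorm(50000), y = rnorm(50000))

# Draw a simple random sample (10% here, an arbitrary choice).
sample_frac <- 0.1
sample_idx <- sample(nrow(full), size = floor(sample_frac * nrow(full)))
subset_df <- full[sample_idx, ]

# Write the sample and check its size against the 100 megabyte limit.
out_file <- file.path(tempdir(), "sample_dataset.csv")
write.csv(subset_df, out_file, row.names = FALSE)
size_mb <- file.size(out_file) / 1e6
stopifnot(size_mb <= 100)

# Time the modelling step and check it against the 15 minute limit.
timing <- system.time({
  fit <- lm(y ~ x, data = subset_df)
})
elapsed_min <- timing[["elapsed"]] / 60
stopifnot(elapsed_min <= 15)

cat(sprintf("Sample: %d rows, %.2f MB, %.4f minutes\n",
            nrow(subset_df), size_mb, elapsed_min))
```

Recording the sampling fraction and seed also gives you the details needed to explain in the report how the submitted sample differs from the full dataset.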
Report style
Your report should be formatted as per a scientific paper that deals with statistical modelling or machine learning. Your report should have the following sections:
Abstract
A brief 200-word summary of your report highlighting the main result or conclusion.
Introduction
Discuss the context of your work. Use this section to write a brief literature review that describes any prior academic work that has been completed with your dataset or the modelling techniques you use in your study. You should also summarise the statistical methods or algorithms used in your modelling. This will demonstrate that you understand the methods you are using. Ensure you use appropriate references in this section.
Methods
A clear description of how to complete your analysis and modelling. You should briefly summarise where the data is from and how it was prepared (i.e. briefly summarise in one paragraph how any cleaning and missing values were handled; there is no need to repeat the content of your first coursework).
The bulk of this section should be concerned with describing all the steps in your modelling and any related data experiments. There should be sufficient detail here that a motivated reader and R programmer can replicate your modelling. Feel free to use code snippets if they will aid the reader’s understanding. You should also ensure you clearly articulate and justify all the choices in your modelling, from dataset selection to the type of modelling and benchmarking you have chosen.
Results
Use figures and tables to present the outcome(s) of your modelling, benchmarking and any related investigations and data experiments. For instance, you may choose to compare how changes in hyperparameters affect the performance of your modelling, you might compare the performance of one or more different types of model/predictor, or you may compare how the size or make-up of the dataset affects the performance of the model. Your dataset, modelling or project may suggest or lead you to perform other types of analysis.
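As one example of a hyperparameter comparison, the sketch below uses 5-fold cross-validation in base R to compare polynomial degrees for a simple regression, then produces a table and a figure. The `mtcars` data, the choice of polynomial degree as the hyperparameter, the number of folds and the output file name are all assumptions made for illustration.

```r
set.seed(123)

dat <- mtcars            # stand-in for the coursework dataset
k <- 5                   # number of cross-validation folds
degrees <- 1:4           # hyperparameter values to compare

# Assign each row to one of k folds at random.
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

# For each degree, average the held-out RMSE over the k folds.
cv_rmse <- sapply(degrees, function(d) {
  fold_rmse <- sapply(seq_len(k), function(f) {
    train <- dat[fold_id != f, ]
    test <- dat[fold_id == f, ]
    fit <- lm(mpg ~ poly(hp, d), data = train)
    sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
  })
  mean(fold_rmse)
})

# Table of results for the report.
results <- data.frame(degree = degrees, cv_rmse = cv_rmse)
print(results)

# Figure for the report, written to a PDF file.
fig_file <- file.path(tempdir(), "cv_rmse_by_degree.pdf")
pdf(fig_file, width = 6, height = 4)
plot(results$degree, results$cv_rmse, type = "b",
     xlab = "Polynomial degree", ylab = "Cross-validated RMSE",
     main = "Effect of polynomial degree on held-out error")
dev.off()
```

The same pattern (loop over settings, evaluate on held-out data, collect into a data frame) extends to comparing different model types or different dataset sizes.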
Discussion & Conclusions
In this section you should summarise your results. The purpose of this section is to synthesise the information in the results section into the new knowledge your study has discovered. You should also comment on the benefits and limitations of your dataset, methods and modelling. You should additionally comment on what future direction you might take your modelling.