Assignment Briefs
12-25-2024
You will carry out a tiny data science project from start to finish and write this up as a research paper with a (PDF file of a) supporting Notebook. The aim is to answer analytical questions using appropriate methods. Don't just use methods for the sake of it!
Approach
We suggest you approach the project in the following way.
Identify the application domain and datasets
You are free to choose the application domain (e.g. tourism, finance, transport) and the datasets that you wish to work with. We recommend that you choose your application domain based on your experience and interest. You will need to ensure that data and questions are sufficiently complex, but also achievable within the time frame of the project. Combining one or more datasets may help ensure the project is adequately complex.
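As a minimal sketch of what combining datasets can look like, the example below joins two small tables on a shared key using only the Python standard library. The tables, column names and values are invented for illustration, and the join assumes each key appears at most once in the right-hand table; in a notebook you would probably use a dataframe library for the same step.

```python
# Hypothetical example data: both tables and all values are invented.
visitors = [
    {"borough": "Camden", "visitors_k": 120},
    {"borough": "Hackney", "visitors_k": 85},
    {"borough": "Newham", "visitors_k": 60},
]
restaurants = [
    {"borough": "Camden", "restaurants": 410},
    {"borough": "Hackney", "restaurants": 305},
    {"borough": "Sutton", "restaurants": 90},
]


def inner_join(left, right, key):
    """Return rows whose key value appears in both tables, merged into one dict.

    Assumes key values are unique within `right`.
    """
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


combined = inner_join(visitors, restaurants, "borough")

# List the keys found in only one table, since dropped rows can bias later analysis.
left_keys = {row["borough"] for row in visitors}
right_keys = {row["borough"] for row in restaurants}
unmatched = sorted(left_keys ^ right_keys)

for row in combined:
    print(row)
print("Unmatched keys:", unmatched)
```

Checking which keys fail to match is worth doing before any analysis, because a join that silently drops rows changes what the combined data can tell you.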
Identify well-motivated analytical questions
Identify some well-motivated analytical questions for your data. Ask yourself what you want to find out from the data. Don’t just predict something - your goal is to do data analysis and identify findings of relevance to your application domain. This will help inform and justify your analysis strategy and findings.
Plan
Explore your data and develop an analysis strategy. Your strategy should be designed to address the tasks you identified and to fulfil your stated objectives in your selected problem domain. This should be informed by an initial investigation of the data sources and the characteristics of the data. Plan which data processing steps you will need to perform, how you will transform the data to make it usable, which data analysis algorithms you could use and what sorts of observations these may lead to. Remember that the data analysis process is highly agile; you will find yourself iterating through several different methods and changing your initial plans frequently.
Analysis
This is the core of your project, which will include preparing the data for the analysis, carrying out your analysis/modelling, validating your results and communicating observations. It will involve many iterative steps. We expect to see evidence of:
Data preparation to the extent necessary to prepare useful and robust data to work with.
Data derivation to support your analysis. This may include feature engineering. Be creative.
Construction of models that help explain the underlying phenomena. These will help you to explain the data better and/or perform predictions. Models include those using regression, classification or clustering methods.
Validation of results using formal methods.
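The sketch below illustrates, under stated assumptions, how preparation, modelling and validation can fit together. The data are synthetic (a made-up fare that grows with distance, plus noise), the field names are hypothetical, and the model is a plain least-squares line written with the standard library. Validation uses k-fold cross-validation and compares the model with a baseline that always predicts the training mean. It is one possible shape for this workflow, not a required method.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical data: fare = 3 + 2 * distance plus noise, with two missing values.
rows = []
for _ in range(60):
    distance = rng.uniform(1, 20)
    fare = 3 + 2 * distance + rng.gauss(0, 1.5)
    rows.append({"distance_km": distance, "fare": fare})
rows[5]["fare"] = None
rows[17]["distance_km"] = None

# Data preparation: drop rows with missing values.
clean = [r for r in rows if None not in r.values()]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def k_fold_mae(data, k=5):
    """Return (model MAE, baseline MAE) over all held-out rows."""
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    model_errors, baseline_errors = [], []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        xs = [r["distance_km"] for r in train]
        ys = [r["fare"] for r in train]
        intercept, slope = fit_line(xs, ys)
        baseline = mean(ys)  # baseline predicts the training mean every time
        for r in test:
            prediction = intercept + slope * r["distance_km"]
            model_errors.append(abs(r["fare"] - prediction))
            baseline_errors.append(abs(r["fare"] - baseline))
    return mean(model_errors), mean(baseline_errors)


# Modelling on all clean rows, then validation on held-out folds.
intercept, slope = fit_line(
    [r["distance_km"] for r in clean], [r["fare"] for r in clean]
)
model_mae, baseline_mae = k_fold_mae(clean)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"cross-validated MAE: model={model_mae:.2f}, baseline={baseline_mae:.2f}")
```

Reporting the baseline alongside the model makes it easier to judge whether the model explains anything beyond the average.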
Findings and reflections
Consider the extent to which your results answer your analytical questions. You may need to go back and do some more investigation, but don’t worry if you’re not able to or the results are inconclusive. Reflect on the significance of your findings and ideas for further work.
Formative feedback on your plan
We will give you early feedback on your plan if you submit it by 3rd November. We suggest you provide the following:
Title
Application domain and datasets, including a preliminary look at the variables and an assessment of how suitable they are.
Well-motivated analytical questions. As indicated above, give a motivation that relates to your application domain (don't just predict something for the sake of it).
What is your plan? Think about the likely steps and methods you will need to use throughout the whole analytical process from data to findings. Show how these are appropriate for your analytical goals.
Specific questions. Think about any specific questions you might have on your plan.
Optionally, you may upload a PDF of an initial exploratory notebook, but only if it will help us give feedback.
Introduction (max 300 words; 10% of the mark): Provide the context, what you will investigate and the motivation. Marked on the clarity of the context and how convincing the motivation is.
To pass: There must be some reference to an application that the analysis would help with.
45%: The context, motivation and aims align to some degree.
55%: This work has potential to tell us something relevant for the context and motivation.
65%: This work has potential to tell us something useful for the context and motivation.
75%: This work clearly has potential to tell us something useful and new for the context and motivation.
85%: This work clearly has potential to tell us something useful, new, and actionable for the context and motivation.
Analytical questions and data (max 300 words; 15% of the mark): Describe the analytical questions that you want to answer and why they might be interesting. Marked on how well research questions align with the motivation and on the suitability of the data.
To pass: The research questions must be relevant to what is written in the introduction and the data must be relevant to these.
45%: The research questions are answerable, aspects of the data are suitable and it is all somewhat related to the aims of the work.
55%: The research questions are relevant, are answerable, and the data are at least partially suitable.
65%: The research questions align well to the introduction, are answerable, and the data are (on the whole) suitable.
75%: The research questions are likely to realise the aims, go beyond simple descriptive questions, are clearly answerable, the data are suitable, and some potential limitations are expressed.
85%: The research questions are highly likely to realise the aims, go beyond descriptive questions, are clearly answerable, the data are highly suitable, and good potential limitations are expressed.
Analysis (max 1000 words; 60% of the mark): Describe and justify the highlights of your analysis including all stages (data preparation/derivation, applying analytical methods/modelling, interpretation, and validation), focussing on the more sophisticated aspects. We will expect to see these evidenced in your notebook. Justify why you have used the methods. Marked on coverage of the data science process, appropriateness of analytical decisions, the alignment of the approach and justification with the goals, and the alignment with the supporting notebook.
To pass: Analysis at all or most stages (data preparation/derivation, applying analytical methods/modelling, interpretation, and validation) must be covered in the notebook to some degree.
55%: All stages are described and on-the-whole suitable, with perhaps some poor decisions and unclear details. The notebook implements this to some degree.
65%: All stages are described, justified, mostly suitable and have good interpretations, but some details are unclear and some of the justifications are poor. The notebook implements this well.
75%: All stages are described, justified, suitable and interpreted with good reasoning. The notebook implements this very clearly.
85%: All stages are described, justified and interpreted correctly and extremely well. The design goes beyond standard approaches and is informed by the work. The notebook implements this very clearly and with a clear structure.
Findings, reflections and further work (max 600 words; 15% of the mark): State your findings, reflect on the extent they answer your original research questions and reflect on how useful and actionable they may be. You may want to support your arguments with figures and charts. Reflect on the suitability of the data and analytical steps for answering your questions, limitations and possible further lines of enquiry. Marked on how well you apply your analytical results to the original analytical questions and application domain.
To pass: Findings must follow from the described analysis.
55%: Findings are identified and described though they may not all follow from the analysis section and may not be so relevant to the aims. There is some brief reflection on the suitability and limitations of the data and/or analytical methods for achieving the goals.
65%: Findings are identified and clearly described, generally follow from the analytics and help achieve the work's aims. There is some reflection on the suitability and limitations of the data and methods for achieving the goals. There are ideas about further lines of enquiry.
75%: Findings are clearly described, clearly address the aims of the work, and are clearly derived from the analysis. There is good reflection on the suitability and limitations of the data and methods for achieving the goals. There are ideas about further lines of enquiry.
85%: The findings are sophisticated, clearly addressing the aims of the work, and clearly derived from the analysis. There is reflection on the suitability and limitations of the data and methods for achieving the goals, and generalisability to the application domain. There are ideas about further lines of enquiry.
References. Include all the references for your work.
Word counts. Include a list of the word counts for each section. You may be penalised if you exceed the word counts.
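One way to produce the list of word counts is sketched below. The limits come from the section descriptions in this brief; the draft text is a placeholder, and counting whitespace-separated tokens is an assumption, so the totals may differ slightly from those reported by a word processor.

```python
# Section limits taken from the section descriptions in this brief.
LIMITS = {
    "Introduction": 300,
    "Analytical questions and data": 300,
    "Analysis": 1000,
    "Findings, reflections and further work": 600,
}


def word_count_report(sections):
    """Map each section title to (word count, limit, within-limit flag)."""
    report = {}
    for title, limit in LIMITS.items():
        # Assumption: a "word" is any whitespace-separated token.
        count = len(sections.get(title, "").split())
        report[title] = (count, limit, count <= limit)
    return report


# Placeholder text standing in for the real report sections.
draft = {
    "Introduction": "word " * 280,
    "Analytical questions and data": "word " * 310,
    "Analysis": "word " * 950,
    "Findings, reflections and further work": "word " * 600,
}

for title, (count, limit, ok) in word_count_report(draft).items():
    status = "ok" if ok else "over limit"
    print(f"{title}: {count}/{limit} ({status})")
```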
Ideas for projects
A challenging option is the Yelp Data Challenge, which provides datasets on business reviews, social network activities, check-ins, etc.
For a financial focus, here is a good resource on industries and companies. You can enrich your analysis by adding data and new perspectives from the data repositories of the World Bank and IMF.
data.gov.uk is a great place to grab datasets. There's a range in quality: some datasets are too simple (highly aggregated tables) and some are rather unstructured, so they might take time to prepare. But generally the datasets are relevant and up to date.
For a more local focus, London Datastore is a great place to start. You might even link some of these datasets to the Yelp ones.
The Airline Data Project is an interesting initiative by MIT that gathers several sources of information related to the aviation industry. The datasets can be accessed here. There is even a glossary to introduce you to the domain.
Visualizing.org has a nice collection of data resources which might help you to find relevant domains and datasets.
Kaggle competitions are a great resource to find relevant data sources and problem descriptions. However, if sample solutions for a competition are released, you need to clearly cite any ideas you borrow from the solutions and justify how your solution differs from theirs.
For biomedical domains, you can check https://grand-challenge.org/challenges/ where many biomedical data sets, some with labels and associated questions, can be obtained. Also check the resources from the US National Library of Medicine.
The British Library provides access to a large number of databases.
Python and Python notebook. We expect you to present your analysis in a Python/Jupyter notebook. However, feel free to use supplementary tools, computational methods, or software to enrich your analysis.
Analysis not yielding good findings. Not all analysis leads to the observations and findings you expect or wish for. If this happens, do not panic! As long as the methods and plan you have followed are reasonable and you can offer explanations as to why the analysis did not give the results you expected, you will not lose marks. Reasons for poor findings may include issues with the data, the analysis methods and false expectations. Also, consider that showing a lack of relationship is also a finding!
Reference your sources! Clearly reference any resources you use in your project. This should include URLs of websites that have helped you. Remember that passing off other people’s work as your own is considered plagiarism. The university and the general academic community take this seriously and have a range of sanctions in cases of plagiarism.
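To make the point about a lack of relationship concrete, here is a minimal sketch of a permutation test for correlation, written with the standard library. The data are synthetic and the function names are hypothetical. A large p-value is consistent with no linear relationship between the two variables, although it does not prove that none exists.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic data: y_related depends on x, y_unrelated is independent noise.
data_rng = random.Random(1)
x = [data_rng.uniform(0, 10) for _ in range(40)]
y_related = [2 * v + data_rng.gauss(0, 1) for v in x]
y_unrelated = [data_rng.gauss(0, 1) for _ in x]

p_related = permutation_p_value(x, y_related)
p_unrelated = permutation_p_value(x, y_unrelated)
print("related:  ", p_related)
print("unrelated:", p_unrelated)
```

Reporting the test and its p-value alongside a plot gives the reader evidence for the absence of an effect, which is more convincing than simply stating that nothing was found.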
Communicate your findings effectively. It is in your interest to ensure the marker understands your work and its implications easily.
Describe the implications of your findings to your application domain - i.e. its value.
Align the stages of your project. Ensure that the goals, analysis and findings follow from each other. Link your findings to the results of your analytical steps and your objectives.
General grading criteria
PG: Distinction UG: First class
85-100
A
A+
Outstanding
Introduction: The benefits of this project are clear and could be actionable for the application area. Analytical questions: Analytical questions are sophisticated, demonstrating a strong understanding of how their answers may apply to the application area. Analysis: the approach shows sophisticated and creative flair, combining techniques in a well-justified and insightful way. Findings/reflections: The findings and their reflections are sophisticated and comprehensive. Overall: An excellent piece of work that shows high levels of novelty, sophistication and understanding and may be of publishable quality.
80-84
A
Excellent
75-79
Very Good
Introduction: There is strong motivation for studying this application area, with good examples as to why. Analytical questions: Analytical questions are sophisticated and go beyond the obvious. Analysis: the approach shows some novelty and sophistication that may involve deriving new relevant features and clearly justified analytical decisions based on the data and questions. Findings/reflections: The findings (whether conclusive or not) clearly follow from the analysis with sophisticated reflections on assumptions made and its limitations. Overall: A very good piece of work that shows some novelty in appropriately adapting the data and methods to meet very good analytical objectives.
70-74
A-
PG: Merit UG: Upper second class
67-69
B
B+
Good
Introduction: There is clear motivation as to why this application area is worth studying which may include some limited examples as to why. Analytical questions: Analytical questions include some that consider associations between variables that relate to the application area. Analysis: the approach is sound, is clearly influenced by the data and questions and is well-justified. Findings/reflections: The findings clearly follow from the analysis and there is good critical reflection on the data, methods used and what it means for the application area. Overall: A good piece of work that demonstrates ability to carry out analytics and apply them.
64-66
B
60-63
B-
PG: Credit UG: Lower second class
57-59
C
C+
Satisfactory
Introduction: An application area and motivation are identified, but the motivation is weak. Analytical questions: Analytical questions are provided, but are rather simple and generic descriptive questions that only weakly relate to the application area. Analysis: the approach is generic and may have some flaws but is basically sound and all four aspects are mentioned in the report. Findings/reflections: the findings are based on the analysis and there is some critical reflection that is mostly quite generic. Overall: A satisfactory piece of work that meets the module's learning objectives, but contains omissions and evidence of a lack of understanding.
54-56
C
50-53
C-
PG: Fail UG: Third-class
47-49
D
D+
Poor
Introduction: An application area is provided, but the motivation is very weak. Analytical questions: Analytical questions are provided, but these are generic and do not relate to the application area well. Analysis: the approach is generic and unsuitable in places or lacks logical structure and justification. Findings/reflections: the findings do not follow from the analysis and the discussion is largely descriptive and uncritical. Overall: A poor piece of work, showing limited skills and understanding, and/or not following the format required.
44-46
D
40-43
D-
Very Poor
Introduction: Application area or motivation is missing. Analytical questions: Analytical questions are provided, but are generic and do not relate to the application area well. Analysis: the approach lacks logical structure and justification. Findings/reflections: the findings do not follow from the analysis and the text is descriptive and uncritical. Overall: A very poor piece of work with many omissions, showing very limited skills and understanding and/or not following the format required.
PG: Fail UG: Fail
20-40
E
E
0-20