# Which bridges should you install the sensors on to get the best prediction of overall traffic?

Learning Goal: I’m working on a Python discussion question and need an explanation and answer to help me learn.

Up until now, we have given you fairly detailed instructions for how to design data analyses to answer specific questions about data, in particular how to set up a particular analysis and what steps to take to run it. In this project, you will put that knowledge to use!

Put yourself in the shoes of a data scientist who is given a data set and asked to draw conclusions from it. Your job will be to understand what the data is showing you, design the analyses you need, justify those choices, draw conclusions from running the analyses, and explain why they do (or do not) make sense.

We are deliberately not giving you detailed directions on how to solve these problems, but feel free to come to office hours to brainstorm.

## Objectives

There are two possible paths through this project:

You may use data set #1, which captures information about bike usage in New York City. See below for the analysis questions we want you to answer.
You may use data set #2, which captures information about student behavior and performance in an online course. See below for the analysis questions we want you to answer.
After reading the questions for the data set you have chosen to work with, provide a summary statistics table of the variables you will use. If you need to transform a variable (e.g., Precipitation into a Raining or not raining variable), this variable must be included in the table. You can use any appropriate summary statistics (e.g., mean, standard deviation, mode).
Provide a histogram and explain the resulting plot for at least one variable in your dataset.
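As a rough illustration of the descriptive-statistics step, the sketch below builds a summary table that includes a derived binary `Raining` variable and computes histogram bin counts for one variable. It runs on synthetic data: the column names (`High Temp`, `Low Temp`, `Precipitation`, `Total`) and every number in it are assumptions made for the example, not values taken from the real file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data set; column names are assumptions.
rng = np.random.default_rng(0)
n_days = 200
df = pd.DataFrame({
    "High Temp": rng.normal(70, 12, n_days),
    "Low Temp": rng.normal(55, 10, n_days),
    "Precipitation": np.where(
        rng.random(n_days) < 0.3, rng.exponential(0.3, n_days), 0.0
    ),
})
df["Total"] = (
    15000
    + 120 * (df["High Temp"] - 70)
    - 6000 * df["Precipitation"]
    + rng.normal(0, 1500, n_days)
).clip(lower=0)

# Derived binary variable: 1 if any precipitation was recorded, else 0.
df["Raining"] = (df["Precipitation"] > 0).astype(int)

# Summary statistics table, one row per variable (including the derived one).
summary = df.agg(["mean", "median", "std", "min", "max"]).T
print(summary.round(2))

# Histogram of total ridership as bin counts and bin edges.
counts, bin_edges = np.histogram(df["Total"], bins=10)
print(counts)
```

To draw the histogram rather than print the counts, the same column could be passed to `matplotlib.pyplot.hist`.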
Descriptive statistics should be included in both paths.

## Path 1: Bike traffic

The NYC_Bicycle_Counts_2016_Corrected.csv file gives information on bike traffic across a number of bridges in New York City. In this path, the analysis questions we would like you to answer are as follows:

You want to install sensors on the bridges to estimate overall traffic across all the bridges. But you only have enough budget to install sensors on three of the four bridges. Which bridges should you install the sensors on to get the best prediction of overall traffic?
The city administration is cracking down on helmet laws, and wants to deploy police officers on days with high traffic to hand out citations. Can they use the next day’s weather forecast (low/high temperature and precipitation) to predict the total number of bicyclists that day?
Can you use this data to predict whether it is raining based on the number of bicyclists on the bridges (hint: The variable raining or not raining is binary)?
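One possible way to approach the sensor question is to fit a linear regression of total traffic on each three-bridge subset and compare how well each model predicts held-out days. The sketch below is only an illustration under stated assumptions: the bridge names (`bridge_a` to `bridge_d`) are hypothetical, the counts are synthetic, and ordinary least squares with a single random train/test split is one choice among many, not the required method.

```python
from itertools import combinations

import numpy as np

# Synthetic daily counts for four hypothetical bridges (names are assumptions).
rng = np.random.default_rng(1)
n_days = 214
bridges = ["bridge_a", "bridge_b", "bridge_c", "bridge_d"]
daily_factor = rng.normal(1.0, 0.3, n_days)  # shared day-to-day demand
counts = np.column_stack([
    scale * daily_factor + rng.normal(0, noise, n_days)
    for scale, noise in [(3000, 300), (5000, 400), (6000, 350), (4000, 900)]
])
total = counts.sum(axis=1)

def held_out_r2(X_train, y_train, X_test, y_test):
    """Fit least squares with an intercept on the training rows, return test R^2."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    resid = y_test - A_test @ coef
    return 1.0 - (resid @ resid) / np.sum((y_test - y_test.mean()) ** 2)

# Random 70/30 split of the days into training and test sets.
order = rng.permutation(n_days)
split = int(0.7 * n_days)
train, test = order[:split], order[split:]

# Score every way of choosing three of the four bridges.
scores = {}
for subset in combinations(range(len(bridges)), 3):
    cols = list(subset)
    names = tuple(bridges[i] for i in cols)
    scores[names] = held_out_r2(
        counts[train][:, cols], total[train], counts[test][:, cols], total[test]
    )

best = max(scores, key=scores.get)
for names, r2 in scores.items():
    print(names, round(r2, 4))
print("Best subset on this split:", best)
```

Because a single split can be noisy, repeating the comparison over several splits (or using cross-validation) would give a more trustworthy ranking before drawing a conclusion.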
## Path 2: Student performance related to video-watching behavior

behavior-performance.txt contains data for an online course on how students watched videos (e.g., how much time they spent watching, how often they paused the video, etc.) and how they performed on in-video quizzes. readme.pdf details the information contained in the data fields. There might be some extra data fields beyond the ones mentioned here. Feel free to ignore or include them in your analysis. In this path, the analysis questions we would like you to answer are as follows:

(For Q2 and Q3: You will run prediction algorithm(s) for ALL students for ONE video, and repeat this process for all videos. The function get_by_VidID in the helper file MiniProjectPath2 will help you in this process.)

How well can the students be naturally grouped or clustered by their video-watching behavior (fracSpent, fracComp, fracPaused, numPauses, avgPBR, numRWs, and numFFs)? You should use all students that complete at least five of the videos in your analysis. Hints: Would KMeans or Gaussian Mixture Models be more appropriate? Consider using both and comparing.
Can a student’s video-watching behavior be used to predict that student’s performance (i.e., average score s across all quizzes)? (Hint: Just choose 1 – 4 data fields to create your model. We are looking at your approach rather than model performance.)
Taking this a step further, how well can you predict a student’s performance on a particular in-video quiz question (i.e., whether they will be correct or incorrect) based on their video-watching behaviors while watching the corresponding video? You should use all student-video pairs in your analysis.
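For the clustering question, a minimal sketch of comparing KMeans with a Gaussian mixture model might look like the following. It assumes one row per student holding that student's averaged behavior features, uses synthetic data in place of behavior-performance.txt, and picks the silhouette score and BIC as comparison metrics. All of these are illustrative assumptions rather than requirements of the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

features = ["fracSpent", "fracComp", "fracPaused", "numPauses",
            "avgPBR", "numRWs", "numFFs"]

# Synthetic stand-in: three hypothetical groups of 60 students each.
rng = np.random.default_rng(2)
centers = rng.normal(0, 3, size=(3, len(features)))
X_raw = np.vstack([
    center + rng.normal(0, 1, size=(60, len(features))) for center in centers
])

# Standardize so that features on large scales do not dominate the distances.
X = StandardScaler().fit_transform(X_raw)

results = {}
for k in range(2, 5):
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    gmm_labels = gmm.predict(X)
    results[k] = {
        "kmeans_silhouette": silhouette_score(X, km_labels),
        "gmm_silhouette": silhouette_score(X, gmm_labels),
        "gmm_bic": gmm.bic(X),  # lower BIC suggests a better-fitting mixture
    }

for k, metrics in results.items():
    print(k, {name: round(value, 3) for name, value in metrics.items()})
```

On the real data, the rows would first need to be restricted to students who completed at least five videos before the features are standardized.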
## What to turn in

You must turn in two sets of files by pushing them to your team’s GitHub repository:

report.pdf: A project report, which should consist of:

A section with the names of the team members (maximum of two), your Purdue username(s), and the path (1 or 2) you have taken. Use the heading “Project team information”.
A section stating and describing the dataset you are working with. Use the heading “Descriptive Statistics”.
A section describing the methods of data analysis you chose to use for each analysis question (with a paragraph or two justifying why you chose that method and what you expect the analysis to tell you). Use the heading “Approach”.
All Python .py code files you wrote to complete the analysis steps. In addition, the report must be a PDF file. See the template provided to guide yourself on the format of the project. Not complying with the instructions might result in a deduction of points.
