A new taxi company hired an advertising agency to advertise its services on screens at Times Square in New York, NY. The marketing company was tasked with identifying the five best screens for its client. To reach the maximum number of potential clients for the new taxi company, the criterion they decided to use was the average number of taxi pickups in close proximity to an advertising screen.
The marketing company found two public datasets that they are going to use:
The illustration above (which was created by importing a given dataset to Google Maps) visualizes the locations of the screens.
Large datasets like this one usually consist of a) a data dictionary, a table that lists all the fields in the dataset; and b) the actual dataset in a variety of formats (Excel-compatible comma-separated values (.csv), XML, or JSON).
Familiarize yourself with both datasets. Note that the files of the second dataset are very large (up to 1 GB).
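Because files of this size may not open comfortably in a spreadsheet, one way to explore them is to read the records one at a time instead of loading the whole file. The snippet below is a minimal sketch of that idea in Python, using the standard csv module. The column names and the two sample rows are made up for illustration and are not taken from the actual data dictionary.

```python
import csv
import io


def iter_rows(csv_file):
    """Yield one dict per data row, reading lazily from an open text file."""
    reader = csv.DictReader(csv_file)  # the first line is treated as the header
    for row in reader:
        yield row


# Tiny in-memory stand-in for a large file. The column names are hypothetical;
# check the data dictionary for the real field names.
sample = io.StringIO(
    "pickup_latitude,pickup_longitude\n"
    "40.7580,-73.9855\n"
    "40.7590,-73.9845\n"
)

# Rows are consumed one at a time, so memory use stays small.
row_count = sum(1 for _ in iter_rows(sample))
print(row_count)
```

For a real file, the same function could be given the object returned by open(path, newline=""), where path points to the downloaded file.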
The ridership data is also given in separate files grouped by taxi company (e.g., Yellow). Pick a dataset for any company that services the Times Square area.
With the dataset structures (field names, or dictionaries) in mind, use Word to design a flowchart of the algorithm that identifies the top five screens that would be seen most often by taxi riders.
Note that you don’t need to provide code or calculate the top screens; just provide pseudocode for the algorithm that would perform that task.
Pseudocode is a somewhat structured description of the steps of an algorithm written in plain English. You may also use variable names to refer to the same data multiple times if needed.
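To make the idea of structured steps concrete, here is a rough sketch of how such an algorithm might look once translated from pseudocode into Python. It is only one possible reading of the task, under several assumptions: "close proximity" is taken to mean within a fixed radius (100 m here, an arbitrary choice), "average" is taken to mean pickups per day, and the record layouts, function names, and sample values are all hypothetical rather than taken from the actual datasets.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def top_screens(screens, pickups, radius_m=100.0, top_n=5):
    """Rank screens by average daily pickups within radius_m of each screen.

    screens: list of (screen_id, lat, lon) tuples  (assumed layout)
    pickups: iterable of (day, lat, lon) tuples    (assumed layout)
    """
    counts = Counter()
    days = set()
    for day, lat, lon in pickups:
        days.add(day)
        for screen_id, s_lat, s_lon in screens:
            if haversine_m(lat, lon, s_lat, s_lon) <= radius_m:
                counts[screen_id] += 1
    n_days = max(len(days), 1)  # avoid dividing by zero when there are no pickups
    averages = [(screen_id, counts[screen_id] / n_days) for screen_id, _, _ in screens]
    averages.sort(key=lambda pair: pair[1], reverse=True)
    return averages[:top_n]


# Made-up sample data, for illustration only.
screens = [("A", 40.7580, -73.9855), ("B", 40.7600, -73.9800)]
pickups = [
    ("2016-01-01", 40.7580, -73.9855),
    ("2016-01-01", 40.75805, -73.98545),
    ("2016-01-02", 40.7600, -73.9800),
]
print(top_screens(screens, pickups))
```

In this sketch every pickup is compared against every screen, which is simple to describe in a flowchart but may be slow on very large files; a real solution might first discard pickups that fall outside the Times Square area.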