Improving the P-Median model with capacity and near-far cost allocation in Spopt
Spopt is an open-source Python library for solving optimization problems with spatial data (Feng et al., 2022). Originating from the region module in PySAL (Python Spatial Analysis Library), it is actively evolving to incorporate recently proposed models and techniques aimed at solving problems related to regionalization, facility location, and transportation.
Following the development of the facility location modelling module for spopt in 2021 by Germano, Rongbo Xu participated in the Google Summer of Code programme in 2023, and contributed valuable enhancements to the specific location-allocation model in spopt. These enhancements can be divided into two main components: the implementation of capacity constraints in the current p-median models and the creation of a novel model incorporating near-far cost allocation principles. Notably, the project team comprises Rongbo and five mentors.
In this blog, the project team explains the two improvements on the p-median model developed by Rongbo and shares the future plans for this package.
Part 1: Enhancement - The P-median Model with Capacity Constraints
Background to the project
After years of continuous development, the 'locate' component of the spopt package has successfully delivered numerous essential and classical coverage and location-allocation models. These models include the Location Set Covering Problem (LSCP), Maximal Coverage Location Problem (MCLP), p-median Problem, p-dispersion Problem, and p-center Problem. We also have developed more advanced models such as the backup coverage location Problem with LSCP and included other relevant parameters such as facility capacities and predefined facilities for the coverage models. Nevertheless, the real world presents diverse scenarios and challenges that demand specific requirements and constraints.
The genesis of this project can be traced back to the Institute of Education (IOE) at University College London (UCL), where there was a pressing need to enhance their student placement process. In this particular instance, the existing 'p-median' module within spopt proved inadequate. The challenge at hand necessitated the creation of a capacitated variant of the traditional P-median Problem. This variation explicitly takes into account not only the locations and weights of clients but also the capacity constraints of the facilities poised to serve these clients.
In accordance with the specifications provided by UCL IOE, Rongbu initially devised the capacitated p-median model. UCL IOE faces the task of placing approximately 800 students twice a year, and they have a pool of roughly 500 schools available for placements. The primary goal is to minimize the cumulative travel time for all students by using public transport. This objective is contingent upon ensuring that each student is assigned to a school offering instruction in their respective subject, all while accounting for the capacity and priority level of each school.
We have received support for this project from AGILE initiative, and you can find the corresponding initiative (https://agile-online.org/agile-actions/initiatives/development-of-open-source-python-modelling-package) and paper (Bearman et al., 2023) here.
Results and Conclusions
Once we had implemented the capacitated p-median model, we conducted a series of tests utilizing sample data before applying it to the actual dataset provided by UCL IOE. The outcomes generated by the initial version of the capacitated p-median location-allocation model demonstrated a significant enhancement in the total public transport travel time for students. Specifically, there was a reduction of 1194 minutes in overall public transport travel time for an initial sample of 106 students, translating to an average reduction of 11.3 minutes per student.
As depicted in Figure 1, the analysis reveals that the majority of students experienced a reduction in their travel time when compared to the travel times determined manually by the UCL IOE staff. Some students have a slight increase of their travel time (mostly less than 20 mins), and all the results have been shared with the UCL IOE for further examination. The development of the capacitated p-median problem has been submitted and merged into the spopt package.
Figure 1. The histogram of the difference between optimised travel time and previous travel time
Part 2: Development - The P-median Model with Near-far Cost Allocation
Background to the project
The introduction of the P-median Model with Near-far Cost Allocation represents a notable enhancement in model efficiency, particularly addressing scenarios involving large volumes of data, which have become increasingly prevalent in today's context. Traditional location allocation models typically involve processing extensive information between all the clients and facilities, calculating distances, and then evaluating the optimal solution from a vast pool of potential solutions, sometimes numbering in the thousands or even millions. This often results in excessively lengthy computation times.
The k nearest p-median problem extends the concept of facility location allocation by considering both nearest and non-nearest facilities. In the article by Church (2018), it proposed this new p-median model which can distinguish between near and far facilities, using both explicit and implicit variables for capacity allocations. By integrating this model, the spopt package becomes equipped to deliver heightened precision and efficiency in solving spatial optimization problems, especially those involving extensive datasets and substantial demand volumes.
The notebook introduces the formulation of this new model, the technical details of its implements in spopt package, and a guide on how to use it.
Model and solution approach
This model introduces a clever approach by differentiating between nearby and distant facilities. For instance, at the beginning, only a given number of the closest facilities (set as 5) to each client are considered, which are termed as "near" facilities. If all client demands and constraints can be satisfied, the problem will have an optimal solution right away. However, if the initial facilities can't meet all the demands, we can supplement them with additional nearby facilities.
Mathematically, we create a variable called the placeholder facility decision variable, which is 𝑔𝑖 in the formula, presenting the distant facilities (Church, 2018).
In terms of the solution approach:
- For each client, identify their five closest facilities.
- Run the p-median model only for the selected facilities from Step 1 (referred to as "explicitly assigned" in the paper).
- Check the model outcome. If all 𝑔𝑖 values for clients are equal to 0 and an optimal solution exists, we accept this solution.
- If some 𝑔𝑖 values are more than 0, we need to increase the value 𝑘 for specific clients. Then, rerun the model. Ensure that all 𝑔𝑖 values become 0, indicating an optimal solution.
The default starting value for 𝑘 is 5, or the total number of facilities if it is fewer than 5. Nevertheless, users have the flexibility to customize this value according to their particular requirements.
This model allows us to avoid incorporating the complete set of distances and facilities into the model. Instead, we can focus on integrating smaller segments of these elements and utilize iteration to identify the optimal solution, thereby enhancing computational speed and efficiency.
Model evaluation and validation
To validate the k nearest p-median model, we constructed a synthetic example specifically for testing purposes. Figure 2 provides a visual representation of the spatial distribution of demand and facility points within this synthetic example. We initialized the attributes as follows:
- Each demand point possesses a demand volume of 1.
- The capacity of each facility is set to 1.
- The initial value of 'k' is established at 1.
In this scenario, it becomes evident that the model with 'k' equal to 1 will not yield an optimal solution. This is because the two closely located demand points share the same facility as their nearest one, yet each facility can serve only one client. Consequently, the solution process necessitates proceeding to Step 4, and it becomes necessary to increase the 'k' value for certain demand points to 2.
Figure 2. The scatter plot of the demand points and facilities points in the synthetic example.
The outcomes align with our expectations, and you can further observe them in the following notebook (https://github.com/rongboxu/P-Median-Model-with-Near-Far-Cost-Allocation/blob/main/notebooks/k_nearest_p_median_model.ipynb). We successfully obtained the optimal solution, including the appropriate 'k' value for each demand point. In Figure 3, you can observe the outcome of the allocation process. The green arrows clearly indicate which facility point is assigned to serve each demand point. In the case of the two closely located demand points, one of them is served by the nearest facility (represented by the central red point), while the other is served by its second nearest facility (indicated by the red point with 'y' coordinates of 2).
Figure 3. The allocation results of the synthetic example.
Part 3: Future Plans
Future development on spopt will provide additional models and novel variations of existing models in both region and locate. The capacity constraint and the solution idea of near-far cost allocation may also be implemented on other existing models.
Final thoughts on the project from Rongbo:
“Participating in GSoC has been incredibly rewarding, especially in achieving my goal of introducing new functionality. I find the collaborative nature of open-source coding fascinating and enjoy working with fellow team members to make meaningful contributions! Reflecting on my journey, I remember feeling uncertain at the beginning and gradually overcoming challenges. GSoC has been instrumental in boosting my programming skills, fostering a deep understanding of software development practices, and preparing me for future coding challenges. I'm also truly thankful to my mentors for guiding me in the right direction. As these months have passed, I've successfully implemented the changes I aimed for. Completing this project leaves me feeling happy, satisfied, and confident in both the project's outcome and my personal growth.”
Project Team
GSOC 2023
● Rongbo Xu, Individual, UK
● Dr Levi John Wolf, School of Geographical Sciences, University of Bristol, UK
● Dr James Gaboardi, Geospatial Science and Human Security Division, Oak Ridge National Laboratory, USA
● Prof Sergio Rey, Center for Open Geographical Science, San Diego State University, USA
● Dr Qunshan Zhao, UBDC and Urban Studies, University of Glasgow, UK
● Germano Barcelos, NESPeD-LAB, Universidade Federal de Viçosa, Brazil
AGILE initiative
● Rongbo Xu, University of Glasgow, UK
● Dr Qunshan Zhao, UBDC and Urban Studies, University of Glasgow, UK
● Dr Huanfa Chen, Centre for Advanced Spatial Analysis, University College London, UK
● Dr Nick Bearman, Geospatial Training Solutions and University College London, UK
● Dr Levi John Wolf, School of Geographical Sciences, University of Bristol, UK
● Dr James Gaboardi, Geospatial Science and Human Security Division, Oak Ridge National Laboratory, USA
Reference
Bearman, N., Xu, R., Roddy, P. J., Gaboardi, J. D., Zhao, Q., Chen, H., & Wolf, L. (2023). Developing capacitated p-median location-allocation model in the spopt library to allow UCL student teacher placements using public transport. AGILE: GIScience Series, 4, 1–7.
Church, R. L. (2018). Tobler’s Law and Spatial Optimization: Why Bakersfield? International Regional Science Review, 41(3), 287–310. https://doi.org/10.1177/0160017616650612
Feng, X., Barcelos, G., Gaboardi, J. D., Knaap, E., Wei, R., Wolf, L. J., Zhao, Q., & Rey, S. J. (2022). spopt: A python package for solving spatial optimization problems in PySAL. Journal of Open Source Software, 7(74), 3330.
Files
Latest news
Indoor environment and energy use study
UBDC - in collaboration with Understanding Society, the UK Household Longitudinal Study - is launching a new study to provide linked-data on energy usage and indoor environment.
How has LEZ in Glasgow affected Traffic Flow and Air Quality?
New research exploring the effects of the Low Emission Zone in Glasgow has found that while traffic flow has remained largely unchanged since the Zone was enforced, air quality has significantly improved.
New high-resolution traffic flow dataset from UBDC
New UBDC paper introduces a long-term traffic flow dataset for a concise urban area combined with detailed space and time data