Module: hybrid_vector_model.hybrid_vector_model

Note

This program was created to model the traffic of boaters potentially carrying aquatic invasive species. Nonetheless, the tools are applicable to assess and control any vector road traffic. Due to the initial motivation, however, the wording within this file may in some places still be specific to the motivating scenario:

  • We may refer to vectors as boaters or agents.

  • We may refer to the origins of vectors as origin, source, or simply jurisdiction.

  • We may refer to the destinations of vectors as destination, sink, or lake.

Classes:

BaseTrafficFactorModel(sourceData, sinkData, …)

Base class for traffic factor models.

HybridVectorModel(fileName[, …])

Class for the hybrid vector model.

TransportNetwork(fileNameEdges, fileNameVertices)

A graph representation of the road network.

Data:

CPU_COUNT

Number of processes for parallel processing.

IDTYPE

Type of IDs of origins and destinations.

Functions:

create_distribution_plot(X, observed[, …])

Creates a plot of a given discrete distribution.

create_observed_predicted_mean_error_plot(…)

Create an observed vs.

create_observed_predicted_mean_error_plot_from_files(…)

Creates an observed vs.

mean_relative_absolute_error(prediction, …)

Compute the mean relative absolute error between predictions and observations.

nbinom_fit(data)

Fits a negative binomial distribution to given data.

redraw_predicted_observed(fileName1, fileName2)

Redraws predicted versus observed plots generated earlier.

safe_delattr(obj, attrname)

Deletes an attribute if it exists.

class BaseTrafficFactorModel(sourceData, sinkData, postalCodeAreaData, distances, postalCodeDistances)[source]

Bases: object

Base class for traffic factor models.

The traffic factor model is a gravity model yielding factors proportional to the vector flow between different origins and destinations. Admissible models should inherit this class and overwrite/implement its variables and methods.

Objects of this class save the given covariate data as object variables that can later be used to compute a factor proportional to the mean traffic flow between sources and sinks.

Parameters
  • sourceData (Struct[]) – Array containing source covariates

  • sinkData (Struct[]) – Array containing sink covariates

  • postalCodeAreaData (Struct[]) – Array containing population counts of postal code areas

  • distances (double[]) – Shortest distances between sources and sinks

  • postalCodeDistances (double[]) – Shortest distances between postal code areas and sinks

Attributes:

BOUNDS

((float, float)[]) – Reasonable bounds for the parameters (before conversion).

DESTINATION_COVARIATES

((str, type=double)[]) – The names and types of the covariates for the sinks.

LABELS

(str[]) – The names of the parameters in the implemented model.

ORIGIN_COVARIATES

((str, type)[]) – The names and types of the covariates for the sources.

PERMUTATIONS

(bool[][]) – Parameter combinations to be considered when selecting the optimal model.

SIZE

(int) – Maximal number of parameters in the implemented model.

Methods:

convert_parameters(dynamicParameters, …)

Converts an array of given parameters to an array of standard (maximal) length and in the parameter domain of the model.

get_mean_factor(parameters, parametersConsidered)

Returns a factor proportional to the mean traveller flow between the source-sink pair given by pair, or between all sources and sinks (if pair is None).

get_mean_factor_autograd(parameters, …)

Same as get_mean_factor(), but must use autograd’s functions instead of numpy.

process_sink_covariates(covariates)

Process sink covariates before saving them.

process_source_covariates(covariates)

Process source covariates before saving them.

BOUNDS = array([], dtype=float64)

((float, float)[]) – Reasonable bounds for the parameters (before conversion).

The length must match the maximal number of parameters of the model (see SIZE).

DESTINATION_COVARIATES = []

((str, type=double)[]) – The names and types of the covariates for the sinks.

LABELS = array([], dtype=object)

(str[]) – The names of the parameters in the implemented model.

The length must match the maximal number of parameters of the model (see SIZE).

ORIGIN_COVARIATES = []

((str, type)[]) – The names and types of the covariates for the sources. If the type is not specified, it will default to float.

PERMUTATIONS = None

(bool[][]) – Parameter combinations to be considered when selecting the optimal model.

The number of columns must match the maximal number of parameters of the model (see SIZE).

SIZE = 0

(int) – Maximal number of parameters in the implemented model.
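The length requirements on these attributes can be illustrated with a small sketch. The class below is a hypothetical stand-in (a real model would inherit from BaseTrafficFactorModel), and the covariate names, labels, and bounds are made-up values chosen only to show consistent shapes.

```python
import numpy as np

class ExampleFactorModelSpec:
    """Hypothetical attribute set; a real model would subclass BaseTrafficFactorModel."""
    SIZE = 3
    LABELS = np.array(["source exponent", "sink exponent", "distance exponent"],
                      dtype=object)
    BOUNDS = np.array([(0, 3), (0, 3), (-5, 0)], dtype=float)
    ORIGIN_COVARIATES = [("population", float)]
    DESTINATION_COVARIATES = [("area", float)]
    # Each row is one candidate model; each column refers to one parameter.
    PERMUTATIONS = np.array([[True, True, True],
                             [True, False, True]])

def check_spec(spec):
    """Verify the length requirements stated for LABELS, BOUNDS and PERMUTATIONS."""
    return (len(spec.LABELS) == spec.SIZE
            and len(spec.BOUNDS) == spec.SIZE
            and spec.PERMUTATIONS.shape[1] == spec.SIZE)

specIsConsistent = check_spec(ExampleFactorModelSpec)
```

A check of this kind can catch a mismatch between SIZE and the other attributes before a fit is started.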

convert_parameters(dynamicParameters, parametersConsidered)[source]

Converts an array of given parameters to an array of standard (maximal) length and in the parameter domain of the model.

Not all parameters may be considered in the model (to avoid overfitting). Furthermore, some parameters must be constrained to be positive or within a certain interval. In this method, the parameter vector (containing only the values of the free parameters) is transformed to a vector in the parameter space of the model.

Parameters
  • dynamicParameters (float[]) – Free parameters. The parameters that are not held constant.

  • parametersConsidered (bool[]) – Which parameters are free? Is True at the entries corresponding to the parameters that are free. parametersConsidered must have exactly as many True entries as the length of dynamicParameters
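A minimal sketch of such a conversion, assuming a hypothetical three-parameter model in which the first parameter must be positive and parameters that are not considered take neutral default values. The defaults, the exp transformation, and the function name are illustrative assumptions, not the library's implementation.

```python
import numpy as np

# Hypothetical neutral values used for parameters that are held constant.
DEFAULTS = np.array([1.0, 0.0, 0.0])

def convert_parameters_sketch(dynamicParameters, parametersConsidered):
    """Expand the free parameters to full length and map them to the model domain."""
    dynamicParameters = np.asarray(dynamicParameters, dtype=float)
    parametersConsidered = np.asarray(parametersConsidered, dtype=bool)
    if parametersConsidered.sum() != dynamicParameters.size:
        raise ValueError("need one value per True entry of parametersConsidered")

    full = DEFAULTS.copy()
    full[parametersConsidered] = dynamicParameters
    # Assumed constraint: the first parameter must be positive, so the
    # optimizer works on its logarithm and we map it back with exp.
    if parametersConsidered[0]:
        full[0] = np.exp(full[0])
    return full

result = convert_parameters_sketch([0.0, 2.5], [True, False, True])
```

Here the second parameter is not considered, so it keeps its default, while the two free values are placed at the first and third positions.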

get_mean_factor(parameters, parametersConsidered, pair=None)[source]

Returns a factor proportional to the mean traveller flow between the source-sink pair given by pair, or between all sources and sinks (if pair is None).

Note

This method MUST be overwritten. Otherwise the model will raise an error.

Parameters
  • parameters (double[]) – Contains the free model parameters.

  • parametersConsidered (bool[]) – Which parameters are free? Is True at the entries corresponding to the parameters that are free. parametersConsidered must have exactly as many True entries as the length of dynamicParameters.

  • pair ((int, int)) – Source-sink pair for which the factor shall be determined. This is the source-sink pair of interest (the indices of the source and the sink, NOT their IDs). If None, the factors for all source-sink combinations are computed.
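As an illustration of how pair switches between a single factor and the full matrix, here is a sketch of a simple gravity-type factor. The class, the covariates (population, area), and the functional form are assumptions made for this example; the parameters are taken to be all free, so no conversion step is shown.

```python
import numpy as np

class GravitySketch:
    """Stand-in for a traffic factor model; not derived from the real base class."""

    def __init__(self, sourcePopulation, sinkArea, distances):
        self.sourcePopulation = np.asarray(sourcePopulation, dtype=float)
        self.sinkArea = np.asarray(sinkArea, dtype=float)
        self.distances = np.asarray(distances, dtype=float)  # shape (sources, sinks)

    def get_mean_factor(self, parameters, parametersConsidered, pair=None):
        # Assumption: all parameters are free, so parametersConsidered is unused.
        a, b, c = parameters
        if pair is None:
            # Broadcast sources over rows and sinks over columns.
            return (self.sourcePopulation[:, None] ** a
                    * self.sinkArea[None, :] ** b
                    / self.distances ** c)
        i, j = pair  # indices into the data arrays, not IDs
        return (self.sourcePopulation[i] ** a * self.sinkArea[j] ** b
                / self.distances[i, j] ** c)

model = GravitySketch([100.0, 400.0], [1.0, 4.0, 9.0],
                      [[1.0, 2.0, 4.0], [2.0, 1.0, 2.0]])
allFactors = model.get_mean_factor([0.5, 1.0, 2.0], [True, True, True])
oneFactor = model.get_mean_factor([0.5, 1.0, 2.0], [True, True, True], pair=(1, 2))
```

The single-pair result equals the corresponding entry of the full matrix, which is a useful consistency check when writing a model.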

get_mean_factor_autograd(parameters, parametersConsidered)[source]

Same as get_mean_factor(), but must use autograd’s functions instead of numpy.

This function is necessary to compute derivatives with automatic differentiation.

To write this function, copy the content of get_mean_factor() and exchange np.[...] with ag.[...]

Note

Autograd functions do not support in-place operations. Therefore, an autograd-compatible implementation may be less efficient. If efficiency is not a major concern, just use the autograd functions in get_mean_factor() already and leave this method untouched.

Parameters
  • parameters (double[]) – Contains the free model parameters.

  • parametersConsidered (bool[]) – Which parameters are free? Is True at the entries corresponding to the parameters that are free. parametersConsidered must have exactly as many True entries as the length of dynamicParameters.
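One way to avoid maintaining two copies of the same formula is to write the computation once against a module handle. The sketch below assumes a hypothetical two-parameter factor and a keyword xp that defaults to numpy; passing autograd.numpy instead would require autograd to be installed and is not exercised here.

```python
import numpy as np

def mean_factor(parameters, population, distances, xp=np):
    """Gravity-type factor written without in-place operations.

    xp is the numerical module; pass autograd.numpy instead of numpy to make
    the same body differentiable (assumes autograd is installed).
    """
    a, c = parameters
    # Build new arrays instead of modifying existing ones in place
    # (no `result *= ...` and no `result[i] = ...`).
    attraction = xp.power(population, a)
    decay = xp.exp(-c * distances)
    return attraction[:, None] * decay

factors = mean_factor((1.0, 0.5), np.array([10.0, 20.0]),
                      np.array([[0.0, 2.0], [2.0, 0.0]]))
```

The body only creates new arrays, which is the property the note above asks for.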

static process_sink_covariates(covariates)[source]

Process sink covariates before saving them.

This method is applied to the sink covariates before they are saved. The method can be used to compute derived covariates.

Parameters

covariates (float[]) – Covariates describing the attractiveness of sinks

static process_source_covariates(covariates)[source]

Process source covariates before saving them.

This method is applied to the source covariates before they are saved. The method can be used to compute derived covariates.

Parameters

covariates (float[]) – Covariates describing the repulsiveness of sources

CPU_COUNT = 8

Number of processes for parallel processing.

class HybridVectorModel(fileName, trafficFactorModel_class=None, destinationToDestination=False, **printerArgs)[source]

Bases: vemomoto_core.tools.hrprint.HierarchichalPrinter, vemomoto_core.tools.saveobject.SeparatelySaveable

Class for the hybrid vector model.

Brings the model components together and provides functionality to fit, analyze, and apply the model.

Parameters
  • fileName (str) – Name (without extension) of the file to which the model shall be saved.

  • trafficFactorModel_class (class) – Class representing the gravity model; must be inherited from BaseTrafficFactorModel.

  • destinationToDestination (bool) – If True, the given origins will be ignored and routes will be sought from destinations to destinations. Note that the destination-to-destination model is not yet implemented to an extent that allows the fit of the gravity model.

  • printerArgs (tuple) – Arguments for the hierarchical printer. Can be ignored in general.

Methods:

check_count_distributions_NB([…])

Checks whether the observed data may be negative binomially distributed.

compare_distributions(stationID, fromID, toID)

Compares distributions of agent observations via Anderson-Darling tests and comparative plots of the observed and predicted cumulative mass functions.

compare_travel_time_distributions([saveFileName])

Compares agents’ travel time distributions at different survey locations.

create_budget_plots(minBudget, maxBudget[, …])

Creates plots of the inspection success and price per inspected agent dependent on the invested budget.

create_caracteristic_plot(characteristic, values)

Creates a plot of the characteristics of the optimal inspection policy.

create_quality_plots([worstLabelNo, …])

Creates predicted vs. observed plots.

create_road_network([fileNameEdges, …])

Creates and preprocesses a route network.

create_route_choice_model([redo])

Creates and fits the route choice model.

create_travel_time_model([parameters, fileName])

Create and fit the travel time model.

find_potential_routes([stretchConstant, …])

Searches potential routes of boaters.

find_shortest_distances()

Determines the shortest distances between all considered origins and destinations, and between destinations and postal code area centres.

fit_flow_model([permutations, refit, …])

Fits the traffic flow (gravity) model.

fit_route_choice_model([refit, guess, …])

Fits the route choice model.

get_PMF_observation_prediction(stationID, …)

Determines the observed and predicted probability mass function for agent counts between specific origin-destination pairs.

get_normalized_observation_prediction([…])

Returns the mean observed and predicted agent counts at survey locations, thereby scaling the values so that they should come from a normal distribution.

get_pair_distribution_property([…])

Computes a property of the distribution of the agent flow between origin-destination pairs.

get_pair_observation_prediction([predictions])

Returns predicted and observed agent counts by origin-destination pair.

get_station_mean_variance([stationIndices, …])

Returns the mean agent traffic that could be observed at survey locations and the respective variances.

get_station_observation_prediction([predictions])

Returns observed and predicted agent counts for the inspection locations for which data are available.

investigate_profile_likelihood(x0, …[, …])

Searches the profile likelihood confidence interval for a given parameter.

maximize_log_likelihood([…])

Maximizes the likelihood of the hybrid model.

maximize_log_likelihood_static(…[, …])

Maximizes the likelihood of the hybrid model.

new(fileNameBackup[, …])

Constructs a new HybridVectorModel, thereby reusing saved previous results if possible.

optimize_inspection_station_operation(…[, …])

Computes the optimal locations for agent inspections.

prepare_traffic_factor_model()

Prepares the traffic factor model.

preprocess_survey_data([redo])

Takes the raw survey data and preprocesses them for the model fit.

read_destination_data(fileNameDestinations)

Reads and saves data that can be used to determine the attractiveness of destinations in the vector traffic model.

read_origin_data(fileNameOrigins)

Reads and saves data that can be used to determine the repulsiveness of origins in the vector traffic model.

read_postal_code_area_data(…)

Reads and saves data on postal code regions.

read_survey_data(fileNameObservations[, …])

Reads the survey observation data.

save([fileName])

Saves the model to the file fileName.vmm

save_model_predictions([fileName])

Computes and saves model predictions by origin, destination, origin-destination pair, and inspection location.

save_simulated_observations([parameters, …])

Simulate observation data that would be obtained if the model were true.

set_compliance_rate(complianceRate)

Sets the boaters’ compliance rate (for stopping at inspection/survey locations).

set_infested(originID[, infested])

Changes the infestation state of an origin with the given ID.

set_origins_considered([considered, infested])

Determines which origins are considered in the model fit.

set_traffic_factor_model_class([…])

Sets the class representing the traffic factor (gravity) model.

simulate_count_data(stationTimes, day, …)

Simulate observation data that would be obtained on one day if the model were true.

test_1_1_regression([minSampleSize, …])

Tests whether the model results are biased.

check_count_distributions_NB(minDataSetSize=20, fileName=None)[source]

Checks whether the observed data may be negative binomially distributed.

Computes p-values for the test with null hypothesis H0: Data are negative binomially distributed. If the p-values are high, we cannot reject the hypothesis and may conclude that the negative binomial distribution is modelling the data appropriately.

Parameters
  • minDataSetSize (int) – It is necessary that parts of the considered data are identically distributed and that the samples are large enough. minDataSetSize sets the minimal size of such a data set.

  • fileName (str) – Names of the files to which plots of the resulting p-values will be saved.
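Before any goodness-of-fit test can be applied, a negative binomial distribution has to be fitted to each data set. The sketch below uses a method-of-moments fit, which is an assumption made for illustration and not necessarily the estimator used by nbinom_fit() or by this method; the count data are made up.

```python
from math import exp, lgamma, log
from statistics import mean, pvariance

def fit_nbinom_moments(counts):
    """Method-of-moments estimate (n, p); requires variance > mean."""
    m = mean(counts)
    v = pvariance(counts)
    if v <= m:
        raise ValueError("data are not overdispersed; the fit is undefined")
    p = m / v
    n = m * m / (v - m)
    return n, p

def nbinom_pmf(k, n, p):
    """P(X = k) in the same (n, p) parametrization as scipy.stats.nbinom."""
    return exp(lgamma(k + n) - lgamma(n) - lgamma(k + 1)
               + n * log(p) + k * log(1 - p))

counts = [0, 0, 1, 1, 2, 3, 5, 8, 0, 2]   # hypothetical survey counts
n, p = fit_nbinom_moments(counts)
fittedMean = n * (1 - p) / p
totalMass = sum(nbinom_pmf(k, n, p) for k in range(500))
```

By construction the fitted distribution reproduces the sample mean. Small data sets make such fits unreliable, which is why a minimal data set size is enforced.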

compare_distributions(stationID, fromID, toID, xMax=None, saveFileName=None)[source]

Compares distributions of agent observations via Anderson-Darling tests and comparative plots of the observed and predicted cumulative mass functions.

Parameters
  • stationID (IDTYPE) – ID of the survey location where the considered data have been collected. Can be an array.

  • fromID (IDTYPE) – ID of the origin of the considered agents. Can be an array.

  • toID (IDTYPE) – ID of the destination of the considered agents. Can be an array.

  • xMax (int) – Count value up to which the probability mass function is plotted at least. If not given, the probability mass function will be computed up to the maximal observed count value.

  • saveFileName (str) – File name for plots. If None, no plots will be saved.

compare_travel_time_distributions(saveFileName=None)[source]

Compares agents’ travel time distributions at different survey locations.

Conducts likelihood ratio tests evaluating whether multiple distributions are equal and plots the respective best-fitting time distributions at different locations.

Compares not only travel time distributions at different locations but also travel time distributions of local and long-distance travellers.

Parameters

saveFileName (str) – File name for plots.

create_budget_plots(minBudget, maxBudget, nSteps=10, **optim_kwargs)[source]

Creates plots of the inspection success and price per inspected agent dependent on the invested budget.

Parameters
  • minBudget (float) – Minimal budget to be considered.

  • maxBudget (float) – Maximal budget to be considered.

  • nSteps (int) – Number of budget values to be considered.

  • **optim_kwargs (kwargs) – Keyword arguments passed to optimize_inspection_station_operation.

create_caracteristic_plot(characteristic, values, characteristicName=None, valueNames=None, **optim_kwargs)[source]

Creates a plot of the characteristics of the optimal inspection policy.

The probability that an agent chooses a time while the inspection station is operated is plotted against the expected number of infested agents at the inspection stations for all used inspection stations.

Parameters
  • characteristic (callable/str) – Property or argument whose impact on the results shall be studied. If of type callable, for each entry val of values, callable(self, val) will be executed. If of type str, it will be interpreted as a keyword argument of optimize_inspection_station_operation, which will be set to val.

  • values (arr) – Array of argument values.

  • characteristicName (str) – Name of the property/argument that is studied. Used as axis label in the plot and to generate file names.

  • valueNames (str) – Names of the values of the characteristic (used for the legend).

  • **optim_kwargs (kwargs) – Keyword arguments passed to optimize_inspection_station_operation.
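The two ways of specifying characteristic can be pictured with the following sketch. The helper, the stand-in model class, and the keyword name "budget" are hypothetical and serve only to show the dispatch between a callable and a keyword name.

```python
def apply_characteristic(model, characteristic, val, optim_kwargs):
    """Return the keyword arguments to use for one entry of `values`."""
    kwargs = dict(optim_kwargs)          # do not mutate the caller's dict
    if callable(characteristic):
        characteristic(model, val)       # e.g. a setter applied to the model
    else:
        kwargs[characteristic] = val     # treated as an optimizer keyword
    return kwargs

class FakeModel:
    """Hypothetical stand-in with a single settable property."""
    complianceRate = 1.0

def set_rate(model, val):
    model.complianceRate = val

model = FakeModel()
kwargsA = apply_characteristic(model, set_rate, 0.8, {"budget": 50})
kwargsB = apply_characteristic(model, "budget", 100, {"budget": 50})
```

With a callable, the model itself is changed and the keyword arguments stay as given; with a string, only the named keyword argument is replaced.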

create_quality_plots(worstLabelNo=5, saveFileName=None, comparisonFileName=None)[source]

Creates predicted vs. observed plots.

Parameters
  • worstLabelNo (int) – Number of data points that shall be labelled with their respective IDs. The data points will be labelled in order of the largest deviance between predictions and observations.

  • saveFileName (str) – Name of the file where the plot and the data used to generate it will be saved.

  • comparisonFileName (str) – Name of the file where alternative results are saved. These results will be loaded and plotted for comparison.

Todo

Compute mean at stations only: incorporate timing later. This could speed up the procedure significantly.

create_road_network(fileNameEdges=None, fileNameVertices=None, preprocessingArgs=None, edgeLengthRandomization=0.001)[source]

Creates and preprocesses a route network.

Parameters
  • fileNameEdges (str) –

    Name of a csv file containing the road network. The file must be a csv with header and the following columns, separated by ,:

    Field                                             Type              Description
    Road ID                                           IDTYPE            ID of the road section
    Vertex from-ID                                    IDTYPE            Starting vertex of the road section
    Vertex to-ID                                      IDTYPE            End vertex of the road section
    Length                                            float             Length (or travel time) of the road section
    Survey location for forward traffic               IDTYPE, optional  ID of the location where forward traffic can be surveyed
    Survey location for backward traffic              IDTYPE, optional  ID of the station where backward traffic can be surveyed
    Survey location for forward and backward traffic  IDTYPE, optional  ID of the station where forward and backward traffic can be surveyed
    Destination ID                                    IDTYPE, optional  ID of the destination that can be accessed via this road section

  • fileNameVertices (str) –

    Name of a csv file stating which vertices are origins and destinations. The file must be a csv with header and the following columns, separated by ,:

    Field                 Type    Description
    Vertex ID             IDTYPE  ID of the vertex
    Potential via vertex  bool    Whether the vertex could be a potential intermediate destination for boaters (should be True by default, but can be set to False for many vertices to reduce computational complexity)
    Vertex type           int     Type identifier for the vertex; see below

    • 1: origin

    • 2: destination

    • 3: postal code area center (used to determine whether destinations are located in populated areas)

    • other: no specific role for the vertex

  • preprocessingArgs (tuple) – Arguments for preprocessing of the road network. Refer to lopaths.graph.FlowPointGraph for further documentation.

  • edgeLengthRandomization (float) – Maximum random perturbation to road lengths. Needed to make it likely that distinct paths have distinct lengths.
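The two input files described above could look as follows. This is a sketch under assumptions: the exact header labels, the ID format (strings here), and the way booleans and empty optional fields are parsed are not specified in this section, and all values are invented.

```python
import csv
import io

edgeHeader = ["Road ID", "Vertex from-ID", "Vertex to-ID", "Length",
              "Survey location for forward traffic",
              "Survey location for backward traffic",
              "Survey location for forward and backward traffic",
              "Destination ID"]
edgeRows = [
    ["r1", "v1", "v2", 12.5, "", "", "s1", ""],     # surveyed in both directions
    ["r2", "v2", "v3", 3.0, "", "", "", "lake7"],   # gives access to a destination
]

vertexHeader = ["Vertex ID", "Potential via vertex", "Vertex type"]
vertexRows = [
    ["v1", True, 1],    # origin
    ["v2", True, 0],    # no specific role
    ["v3", False, 2],   # destination
]

def to_csv(header, rows):
    """Serialize a header and rows as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

edgesCsv = to_csv(edgeHeader, edgeRows)
verticesCsv = to_csv(vertexHeader, vertexRows)
```

Writing the strings to disk and passing the file names as fileNameEdges and fileNameVertices would be the next step; optional fields are simply left empty in this sketch.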

create_route_choice_model(redo=False)[source]

Creates and fits the route choice model.

Parameters

redo (bool) – Whether the route choice model shall be refitted if it has been fitted already. If set to True, the previous fit will be ignored.

create_travel_time_model(parameters=None, fileName=None)[source]

Create and fit the travel time model.

Parameters
  • parameters (float[]) – If given, this will be the parameters of the travel time distribution. If not given, the optimal parameters will be determined via a maximum likelihood fit. See traveltime_model.TrafficDensityVonMises.

  • fileName (str) – If given, a plot with the density function of the distribution will be saved under the given name as pdf and png. Do not include the file name extension.

find_potential_routes(stretchConstant=1.5, localOptimalityConstant=0.2, acceptionFactor=0.667, rejectionFactor=1.333)[source]

Searches potential routes of boaters.

For detailed documentation on the arguments, refer to lopaths.graph.FlowPointGraph.find_locally_optimal_paths.

Parameters
  • stretchConstant (>=1) – Maximal length of the admissible paths relative to the shortest path. Controls the length of the admissible paths. 1 refers to shortest paths only, infinity to all paths.

  • localOptimalityConstant ([0, 1]) – Fraction of the path that must be optimal. Controls how optimal the admissible paths shall be. 0 refers to all paths, 1 refers to shortest paths only.

  • acceptionFactor ((0,1]) – Relaxation factor for local optimality constraint. Approximation factor to speed up the local optimality check. 0 accepts all paths, 1 performs an exact check. Choose the largest feasible value. 1 is often possible.

  • rejectionFactor ([1,2]) – False rejection factor for local optimality constraint. Approximation factor to speed up the local optimality check. 1 performs exact checks, 2 may reject paths that are admissible but not locally optimal twice as much as required. Choose the smallest feasible value. 1 is often possible.
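The role of stretchConstant can be pictured with a simple length filter. This sketch covers only the length criterion for one origin-destination pair; the local optimality check and its approximation factors are not reproduced, and the route names and lengths are invented.

```python
def admissible_by_stretch(pathLengths, stretchConstant=1.5):
    """Keep the paths that are at most stretchConstant times the shortest one."""
    if stretchConstant < 1:
        raise ValueError("stretchConstant must be >= 1")
    shortest = min(pathLengths.values())
    return {name: length for name, length in pathLengths.items()
            if length <= stretchConstant * shortest}

candidates = {"highway": 100.0, "scenic": 140.0, "detour": 160.0}
kept = admissible_by_stretch(candidates)
shortestOnly = admissible_by_stretch(candidates, stretchConstant=1.0)
```

With the default value 1.5, routes up to 150 length units pass the filter; with 1.0, only the shortest route remains.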

find_shortest_distances()[source]

Determines the shortest distances between all considered origins and destinations, and between destinations and postal code area centres.

See TransportNetwork.find_shortest_distances().

fit_flow_model(permutations=None, refit=False, flowParameters=None, continueFlowOptimization=False, get_CI=True)[source]

Fits the traffic flow (gravity) model.

Fits one or multiple candidates for the traffic flow model and selects the model with minimal AIC value.

Parameters
  • permutations (bool[][]) – Each row corresponds to a parameter combination of a model that is to be considered. For each parameter that could potentially be included, the row must contain a boolean value. Include only parameters that are part of the traffic factor model. If None, the PERMUTATIONS given in the traffic factor model class will be considered. If this attribute is not implemented, only the full model will be considered.

  • refit (bool) – Whether to repeat the fitting procedure if the model has been fitted earlier.

  • flowParameters (dict) – Dictionary with the keys "parametersConsidered" and "parameters" that provides an initial guess for the optimization or the corresponding solution. "parametersConsidered" contains a bool[] with the considered parameter combination (see permutations); "parameters" contains a float[] with the values for the parameters where flowParameters["parametersConsidered"] is True.

  • continueFlowOptimization (bool) – If True, the flowParameters will be used as initial guess. Otherwise, they will be considered as the optimal parameters.

  • get_CI (bool) – Whether confidence intervals shall be computed after the model has been fitted. Note that no confidence intervals will be computed if continueFlowOptimization is False.
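Selecting the candidate with minimal AIC can be sketched as follows, using AIC = 2k - 2 log L with k the number of free parameters. The log-likelihood values are hypothetical, and how the library counts parameters that are always included is not shown in this section.

```python
def select_by_aic(candidates):
    """candidates: list of (parametersConsidered, maximized log-likelihood)."""
    scored = []
    for parametersConsidered, logLikelihood in candidates:
        k = sum(parametersConsidered)           # number of free parameters
        aic = 2 * k - 2 * logLikelihood
        scored.append((aic, parametersConsidered))
    return min(scored, key=lambda item: item[0])

# Hypothetical fit results for three rows of `permutations`.
candidates = [
    ([True, True, False, False], -520.0),
    ([True, True, True, False], -512.0),
    ([True, True, True, True], -511.5),
]
bestAic, bestPermutation = select_by_aic(candidates)
```

In this example the full model fits slightly better than the three-parameter model, but not by enough to justify the additional parameter.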

fit_route_choice_model(refit=False, guess=None, improveGuess=False, disp=True, get_CI=True)[source]

Fits the route choice model.

Parameters
  • refit (bool) – Whether the model shall be refitted if it has already been fitted earlier.

  • guess (float[]) – Guess for the maximum likelihood estimate.

  • improveGuess (bool) – If True, guess will be used as initial guess for the model fit. Otherwise, it will be used as the maximum likelihood estimate.

  • disp (bool) – Whether partial results shall be printed.

  • get_CI (bool) – Whether confidence intervals for the parameters shall be computed after the model has been fitted.

get_PMF_observation_prediction(stationID, fromID, toID, xMax=None, getBestPMF=True, getPureObservations=False)[source]

Determines the observed and predicted probability mass function for agent counts between specific origin-destination pairs.

Parameters
  • stationID (IDTYPE) – ID of the survey location where the considered data have been collected. Can be an array.

  • fromID (IDTYPE) – ID of the origin of the considered agents. Can be an array.

  • toID (IDTYPE) – ID of the destination of the considered agents. Can be an array.

  • xMax (int) – Count value up to which the probability mass function is plotted at least. If not given, the probability mass function will be computed up to the maximal observed count value.

  • getBestPMF (bool) – If True, a negative binomial distribution will be fitted directly to the observed data. This can be helpful if it is of interest whether the observations come from a negative binomial distribution.

  • getPureObservations (bool) – Whether the pure observed count data shall be returned in addition to the other results.
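The observed probability mass function and the effect of xMax can be sketched as below. The helper name and the count data are assumptions; the predicted distribution and the fitted negative binomial are not reproduced here.

```python
from collections import Counter

def observed_pmf(counts, xMax=None):
    """Relative frequency of each count value from 0 up to the largest needed."""
    upper = max(counts)
    if xMax is not None:
        upper = max(upper, xMax)     # xMax is a lower limit for the range
    frequency = Counter(counts)
    return [frequency[x] / len(counts) for x in range(upper + 1)]

pmf = observed_pmf([0, 0, 1, 3], xMax=5)
pmfDefault = observed_pmf([0, 0, 1, 3])
```

Without xMax the range ends at the largest observed count; with xMax=5 it is extended with zero-probability entries.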

get_normalized_observation_prediction(minSampleSize=20)[source]

Returns the mean observed and predicted agent counts at survey locations, thereby scaling the values so that they should come from a normal distribution.

Only data obtained within a specific daytime interval will be considered to ensure individual observations are identically distributed and will yield a normal distribution when added together.

Parameters

minSampleSize (int) – The minimal number of survey shifts that must be available before a survey location can be considered. If this value is too low, the resulting values will not follow an approximate normal distribution. If the value is too large, no data will be available to compute the results.

get_pair_distribution_property(dist_property=<bound method rv_generic.mean of <scipy.stats._discrete_distns.nbinom_gen object>>, arg=None, pair=None, shiftStart=None, shiftEnd=None)[source]

Computes a property of the distribution of the agent flow between origin-destination pairs.

Parameters
  • dist_property (callable) – The distribution property of interest. Must be a property of scipy.stats.nbinom. Can also be a list of properties.

  • arg (float) – Additional argument for dist_property, for example the desired quantile if dist_property==nbinom.ppf. Can also be of type float[].

  • pair (tuple) – The origin-destination pair(s) of interest as (fromIndex, toIndex), that is, the indices of the source and the sink, NOT their IDs. If None, the property will be computed for all origin-destination pairs.

  • shiftStart, shiftEnd ([0, 24)) – The start and the end of the time interval(s) for which the parameters shall be computed. Must be given in a 24h format. That is, 14.5 represents 2:30PM. If not given, travel timing will be neglected and the complete daily traffic flow will be considered.

  • stationIndex – Index of the survey location to be considered. If not given, route choice will be neglected and the complete traffic flow will be computed.

get_pair_observation_prediction(predictions=None)[source]

Returns predicted and observed agent counts by origin-destination pair.

Parameters

predictions (struct[]) – If the predictions have been computed earlier, they can be provided as this argument. Otherwise the predictions will be computed anew. Must be the results of get_station_mean_variance().

get_station_mean_variance(stationIndices=None, shiftStart=0, shiftEnd=24, getStationResults=True, getPairResults=False, fullCompliance=False, correctData=False)[source]

Returns the mean agent traffic that could be observed at survey locations and the respective variances.

The values are returned both for all agents and the agents coming from infested origins only, respectively.

The traffic can be returned either per survey location or per origin-destination pair (assuming that surveys were conducted at the given locations and time intervals).

Parameters
  • stationIndices (int[]) – Indices of the locations for which the traffic estimate is desired. If None, the traffic for all potential survey locations will be returned. The same location can be mentioned multiple times to model multiple inspection shifts on different days.

  • shiftStart ([0, 24)) – The start of the time interval for which the agent counts shall be estimated. Must be given in a 24h format. That is, 14.5 represents 2:30PM. Can also be an array, which then must have the same length as stationIndices.

  • shiftEnd ((0, 24]) – The end of the time interval for which the agent counts shall be estimated. Must be given in a 24h format. That is, 14.5 represents 2:30PM. Can also be an array, which then must have the same length as stationIndices.

  • getStationResults (bool) – Whether the estimates shall be returned by survey location.

  • getPairResults (bool) – Whether estimates shall be returned by origin-destination pair.

Todo

This method can be made much more efficient if the road choice probabilities and the k values are computed once for each origin-destination pair or inspection location only. That is, we would not need to reconsider the same location multiple times. This would speed up this method by orders of magnitude.
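The decimal 24h format used for the shift times above can be converted to clock time as in this sketch. The helper is hypothetical and exists only to make the convention explicit (14.5 corresponds to 14:30).

```python
def decimal_hour_to_clock(hour):
    """Convert a time in [0, 24) given as decimal hours to 'HH:MM'."""
    if not 0 <= hour < 24:
        raise ValueError("hour must lie in [0, 24)")
    # Round to whole minutes, but never spill over into the next day.
    totalMinutes = min(round(hour * 60), 24 * 60 - 1)
    return f"{totalMinutes // 60:02d}:{totalMinutes % 60:02d}"

shiftStart = decimal_hour_to_clock(14.5)
shiftEnd = decimal_hour_to_clock(17.25)
```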

get_station_observation_prediction(predictions=None)[source]

Returns observed and predicted agent counts for the inspection locations for which data are available.

Parameters

predictions (struct[]) – If the predictions have been computed earlier, they can be provided as this argument. Otherwise the predictions will be computed anew. Must be the results of get_station_mean_variance().

investigate_profile_likelihood(x0, processedSurveyData, lengthsOfPotentialRoutes, trafficFactorModel, routeChoiceParameters, complianceRate, properDataRate, parametersConsidered, approximationNumber=3, **profile_LL_args)[source]

Searches the profile likelihood confidence interval for a given parameter.

Parameters
  • x0 (float[]) – Maximum likelihood estimate (MLE) of the parameters.

  • processedSurveyData (dict) – Processed survey data, which are computed with preprocess_survey_data()

  • lengthsOfPotentialRoutes (csr_matrix_nd) – For each origin-destination pair the lengths of all potential (i.e. admissible) agent routes.

  • trafficFactorModel (BaseTrafficFactorModel) – Traffic factor model used to determine the strengths of the agent flows between the individual origin-destination pairs.

  • routeChoiceParameters (float[]) – Route choice parameters. The first entry is the probability to select an inadmissible path. The second entry is the exponent controlling the preference for shorter paths. The third entry is the probability that a given survey location is on a randomly selected inadmissible path.

  • complianceRate (float) – Proportion of agents stopping at survey/inspection stations.

  • properDataRate (float) – Fraction of agents providing inconsistent, incomplete, or wrong data.

  • parametersConsidered (bool[]) – Which parameters are free? Is True at the entries corresponding to the parameters that are free. parametersConsidered must have exactly as many True entries as the length of dynamicParameters. The first two entries must refer to the proportionality constant and the parameter q and will be assumed to be True.

  • approximationNumber (int) – Degree of the Taylor approximation that is to be used. The higher this number, the more precise the likelihood will be, but the longer the computation will take. Must be >= 1.
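
To illustrate what a profile likelihood confidence interval is, the following minimal sketch searches the interval bounds for the mean of a toy normal model, with the variance maximized out for each candidate mean. It does not use this package. The function names, the toy data, and the threshold 1.92 (half the 95% quantile of a chi-square distribution with one degree of freedom) are assumptions made for this example only.

```python
import math


def profile_log_likelihood(mu, data):
    """Normal log-likelihood with the variance maximized out for a fixed mean mu."""
    n = len(data)
    sigma2_hat = sum((x - mu) ** 2 for x in data) / n  # best variance given mu
    return -0.5 * n * (math.log(2 * math.pi * sigma2_hat) + 1)


def profile_ci_bound(data, mle, direction, drop=1.92, tol=1e-9):
    """Find where the profile log-likelihood falls `drop` below its maximum.

    direction is +1 for the upper bound and -1 for the lower bound.
    """
    target = profile_log_likelihood(mle, data) - drop
    # Expand the search bracket until the target level is crossed.
    step = 1.0
    while profile_log_likelihood(mle + direction * step, data) > target:
        step *= 2
    inside, outside = mle, mle + direction * step
    # Bisection between a point inside and a point outside the interval.
    while abs(outside - inside) > tol:
        mid = 0.5 * (inside + outside)
        if profile_log_likelihood(mid, data) > target:
            inside = mid
        else:
            outside = mid
    return 0.5 * (inside + outside)


data = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]  # hypothetical observations
mle = sum(data) / len(data)            # MLE of the mean
lower = profile_ci_bound(data, mle, -1)
upper = profile_ci_bound(data, mle, +1)
```

In the hybrid model the inner maximization runs over all remaining free parameters and has no closed form, so each evaluation of the profile likelihood would require a numerical optimization.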

maximize_log_likelihood(parametersConsidered=None, approximationNumber=3, flowParameters=None, x0=None)[source]

Maximizes the likelihood of the hybrid model.

Parameters
  • parametersConsidered (bool[]) – Which parameters are free? Is True at the entries corresponding to the parameters that are free. parametersConsidered must have exactly as many True entries as the length of dynamicParameters. The first two entries must refer to the proportionality constant and the parameter q and will be assumed to be True.

  • approximationNumber (int) – Degree of the Taylor approximation that is to be used. The higher this number, the more precise the likelihood will be, but the longer the computation will take. Must be >= 1.

  • flowParameters (float[]) – If given, a model with these parameters will be used and assumed as being the best-fitting model.

  • x0 (float[]) – Will be used as initial guess if given and if flowParameters is not given.
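
The role of a boolean mask such as parametersConsidered can be sketched as follows: the optimizer works on the free parameters only, and the mask says where they belong in the full parameter vector. The helper name `expand_parameters` and the choice to fill the remaining entries with a default value are assumptions for illustration; how the package treats parameters that are not free is not shown here.

```python
def expand_parameters(free_values, parameters_considered, default=0.0):
    """Place the free parameter values at the True positions of the mask."""
    if sum(parameters_considered) != len(free_values):
        raise ValueError("need exactly one value per True entry of the mask")
    values = iter(free_values)
    # Entries that are not free receive the (assumed) default value.
    return [next(values) if free else default for free in parameters_considered]


mask = [True, True, False, True, False]  # hypothetical mask with three free parameters
full = expand_parameters([0.8, 0.3, 1.5], mask)
```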

static maximize_log_likelihood_static(processedSurveyData, lengthsOfPotentialRoutes, trafficFactorModel, routeChoiceParameters, complianceRate, properDataRate, parametersConsidered, approximationNumber=3, flowParameters=None, x0=None)[source]

Maximizes the likelihood of the hybrid model.

Parameters
  • processedSurveyData (dict) – Processed survey data, which are computed with preprocess_survey_data()

  • lengthsOfPotentialRoutes (csr_matrix_nd) – For each origin-destination pair the lengths of all potential (i.e. admissible) agent routes.

  • trafficFactorModel (BaseTrafficFactorModel) – Traffic factor model used to determine the strengths of the agent flows between the individual origin-destination pairs.

  • routeChoiceParameters (float[]) – Route choice parameters. The first entry is the probability to select an inadmissible path. The second entry is the exponent controlling the preference for shorter paths. The third entry is the probability that a given survey location is on a randomly selected inadmissible path.

  • complianceRate (float) – Proportion of agents stopping at survey/inspection stations.

  • properDataRate (float) – Fraction of agents providing inconsistent, incomplete, or wrong data.

  • parametersConsidered (bool[]) – Which parameters are free? Is True at the entries corresponding to the parameters that are free. parametersConsidered must have exactly as many True entries as the length of dynamicParameters. The first two entries must refer to the proportionality constant and the parameter q and will be assumed to be True.

  • approximationNumber (int) – Degree of the Taylor approximation that is to be used. The higher this number, the more precise the likelihood will be, but the longer the computation will take. Must be >= 1.

  • flowParameters (float[]) – If given, a model with these parameters will be used and assumed as being the best-fitting model.

  • x0 (float[]) – Will be used as initial guess if given and if flowParameters is not given.

static new(fileNameBackup, trafficFactorModel_class=None, fileNameEdges=None, fileNameVertices=None, fileNameOrigins=None, fileNameDestinations=None, fileNamePostalCodeAreas=None, fileNameObservations=None, complianceRate=None, preprocessingArgs=None, edgeLengthRandomization=0.001, routeParameters=None, considerInfested=None, destinationToDestination=False, restart=False, **restartArgs)[source]

Constructs a new HybridVectorModel, thereby reusing saved previous results if possible.

Parameters
  • fileNameBackup (str) – Name of the file to load and save the model; without file extension

  • trafficFactorModel_class (class) – Class representing the gravity model; must be inherited from BaseTrafficFactorModel

  • fileNameEdges (str) –

    Name of a csv file containing the road network. The file must be a csv with header and the following columns, separated by ,:

    Field | Type | Description
    Road ID | IDTYPE | ID of the road section
    Vertex from-ID | IDTYPE | Starting vertex of the road section
    Vertex to-ID | IDTYPE | End vertex of the road section
    Length | float | Length (or travel time) of the road section
    Survey location for forward traffic | IDTYPE, optional | ID of the location where forward traffic can be surveyed
    Survey location for backward traffic | IDTYPE, optional | ID of the station where backward traffic can be surveyed
    Survey location for forward and backward traffic | IDTYPE, optional | ID of the station where forward and backward traffic can be surveyed
    Destination ID | IDTYPE, optional | ID of the destination that can be accessed via this road section

  • fileNameVertices (str) –

    Name of a csv file stating which vertices are origins and destinations. The file must be a csv with header and the following columns, separated by ,:

    Field | Type | Description
    Vertex ID | IDTYPE | ID of the vertex
    Potential via vertex | bool | Whether the vertex could be a potential intermediate destination for boaters (should be True by default, but can be set to False for many vertices to reduce computational complexity)
    Vertex type | int | Type identifier for the vertex: 1 = origin; 2 = destination; 3 = postal code area center (used to determine whether destinations are located in populated areas); other = no specific role for the vertex

  • fileNameOrigins (str) –

    Name of a csv file with (ignored) header and columns separated by ,. The following columns must be present in the specified order

    Field | Type | Description
    Origin ID | IDTYPE | ID of the origin. Must coincide with the respective ID used in the road network
    ORIGIN_COVARIATES | | Columns with the information and types specified in the TrafficFactorModel class. See BaseTrafficFactorModel.ORIGIN_COVARIATES

  • fileNameDestinations (str) –

    Name of a csv file with (ignored) header and columns separated by ,. The following columns must be present in the specified order

    Field | Type | Description
    Destination ID | IDTYPE | ID of the destination. Must coincide with the respective ID used in the road network
    DESTINATION_COVARIATES | | Columns with the information and types specified in the TrafficFactorModel class. See BaseTrafficFactorModel.DESTINATION_COVARIATES

  • fileNamePostalCodeAreas (str) –

    Name of a csv file with (ignored) header and columns separated by ,. The following columns must be present in the specified order:

    Field | Type | Description
    Postal code | IDTYPE | ID of the postal code area
    Vertex ID | IDTYPE | ID of a vertex representing the postal code area (e.g. a vertex at the centre or population centre)
    Population | int | Population living in the postal code area. Can be the actual population count or the number in hundreds, thousands, etc.; the results just have to be interpreted accordingly

  • fileNameObservations (str) –

    Name of a csv file containing the survey observation data. The file must have a header (which will be ignored) and the following columns, separated by ,:

    Field | Type | Description
    Station ID | IDTYPE | ID of the survey location
    Day ID | IDTYPE | ID for the day of the survey (e.g. the date)
    Shift start | [0, 24), optional | Start time of the survey shift
    Shift end | [0, 24), optional | End time of the survey shift
    Time | [0, 24), optional | Time when the agent was observed
    From ID | IDTYPE, optional | ID of the origin of the agent
    To ID | IDTYPE, optional | ID of the destination of the agent
    Relevant | bool | Whether or not this agent is a potential vector

    The times must be given in the 24h format. For example, 2:30PM translates to 14.5.

    Missing or inconsistent data will either be ignored (if relevant==False) or incorporated as ‘unknown’ (if relevant==True). All applicable data will be used to fit the temporal traffic distribution. If a survey shift has been conducted without any agent being observed, include at least one observation with origin and destination left blank and relevant set to False.

  • complianceRate (float) – Proportion of agents stopping at survey/inspection stations.

  • preprocessingArgs (tuple) – Arguments for preprocessing of the road network. Refer to lopaths.graph.FlowPointGraph for further documentation

  • edgeLengthRandomization (float) – Maximum random perturbation to road lengths. Needed to make it likely that distinct paths have distinct length.

  • routeParameters (tuple) – Parameters defining which routes are deemed admissible. See find_potential_routes().

  • considerInfested (bool) – If given, only origins with the provided infestation state will be considered to fit the model; see HybridVectorModel.set_origins_considered()

  • destinationToDestination (bool) – True iff destination to destination traffic is modelled, i.e. if the sets of origins and destinations are equal. Note: a gravity model for destination to destination traffic is not yet implemented, but could be added easily.

  • restart (bool) – If True, earlier results will be ignored and the model will be constructed from scratch.

  • **restartArgs (keyword arguments) – The arguments below specify which parts of the model construction process shall be repeated even if earlier results are available. If the arguments are set to True, the respective model construction process will take place (provided the necessary arguments, such as file names, are provided).

  • readOriginData (bool) – Read csv with data on boater origins

  • readDestinationData (bool) – Read csv with data on boater destinations

  • readPostalCodeAreaData (bool) – Read csv with population data for postal code regions

  • readRoadNetwork (bool) – Read csv with road network data

  • findShortestDistances (bool) – Find the shortest distances between boater origins and destinations and between destinations and postal code area centres

  • readSurveyData (bool) – Read csv file with boater survey data

  • properDataRate (float) – Rate of complete data (inferred from the data if not given)

  • fitTravelTimeModel (bool) – Fit the model for boaters’ travel timing

  • travelTimeParameters (float[]) – If the traffic time model shall not be fitted but rather created with known parameters, this argument contains these parameters

  • preprocessSurveyData (bool) – Prepare the boater observation data for the fit of the full traffic flow model

  • findPotentialRoutes (bool) – Determine potential routes that boaters might take

  • fitRouteChoiceModel (bool) – Fit the model assigning probabilities to the potential boater routes

  • routeChoiceParameters (float[]) – If the route choice model shall not be fitted but rather created with known parameters, this argument contains these parameters

  • continueRouteChoiceOptimization (bool) – If True, the routeChoiceParameters will be interpreted as initial guess rather than the best-fit parameters for the route choice model

  • preapareTrafficFactorModel (bool) – Prepare the traffic factor model

  • fitFlowModel (bool) – Fit the gravity model for the vector flow between origins and destinations

  • permutations (bool[][]) – Permutations of parameters to be considered. The number of columns must match the maximal number of parameters of the model (see BaseTrafficFactorModel.SIZE and BaseTrafficFactorModel.PERMUTATIONS)

  • flowParameters (float[]) – If the flow model shall not be fitted but rather created with known parameters, this argument contains these parameters

  • continueTrafficFactorOptimization (bool) – If True, the flowParameters will be interpreted as initial guess rather than the best fitting parameters for the flow model
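
The edge file described for fileNameEdges could be produced with the standard csv module, as in the minimal sketch below. All IDs and lengths are made up. The header strings are illustrative, and leaving optional fields blank is an assumption about how missing optional values are expressed.

```python
import csv
import io

# Column order as described for fileNameEdges; header texts are illustrative.
EDGE_COLUMNS = [
    "Road ID", "Vertex from-ID", "Vertex to-ID", "Length",
    "Survey location for forward traffic",
    "Survey location for backward traffic",
    "Survey location for forward and backward traffic",
    "Destination ID",
]

# Hypothetical road sections; optional fields are left blank (assumption).
rows = [
    ["r1", "v1", "v2", 12.5, "", "", "s1", ""],
    ["r2", "v2", "v3", 3.0, "s2", "", "", "d1"],
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.writer(buffer)
writer.writerow(EDGE_COLUMNS)
writer.writerows(rows)

# Read the data back to check the layout.
buffer.seek(0)
parsed = list(csv.DictReader(buffer))
```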

optimize_inspection_station_operation(costShift, costSite, costBound, shiftLength, nightPremium=None, allowedShifts=None, costRoundCoeff=1, baseTimeInv=24, ignoreRandomFlow=False, integer=True, timeout=1000, perturbation=1e-06, fancyRounding=True, full_result=False, extended_info=False, init_greedy=True, saveFile=True, loadFile=True, fileNameAddition='')[source]

Computes the optimal locations for agent inspections.

Maximizes the number of agents who are inspected at least once given a certain budget and other constraints for operation.

The inspections are assumed to be conducted in shifts of given lengths. The best results will be obtained if the number of possible shifts per day is an integer. However, other shift lengths are possible as well.

The day will need to be discretized. The method will try to make the input match a discretization scheme so that no more time intervals than necessary need to be considered.

Note

This method assumes that MOSEK is installed to solve linear programming problems. (See also the cvxpy documentation.) A different solver could be used as well, but this has to be changed in the source code.

Note

At the time this document was created, the MOSEK interface of cvxpy did not implement the option to pass an initial condition to the solver. If this feature shall be used (which is recommended), the cvxpy installation needs to be patched. Please copy the files in the subdirectory cvxpy_changes to the locations designated in their headers and replace the original files.

Parameters
  • costShift (float) – Costs per (daytime) inspection shift.

  • costSite (float) – Costs per used inspection site.

  • costBound (float) – Budget for the overall costs.

  • shiftLength (int/float) – If given as int, length of an inspection shift measured in time steps; if given as float, approximate length of an inspection shift.

  • nightPremium ((>=0, [0,24), [0,24))) – Describes additional costs for overnight inspections. Must be given as a tuple (nightcost, start, end), whereby nightcost is the cost for an overnight inspection shift, and start and end denote the time interval in which the additional costs are due (24h format). Note that nightcost refers to a complete inspection shift conducted in the time interval of additional costs. In practice, however, the costs will only be increased for the fraction of a shift that overlaps with the given time interval of additional costs.

  • allowedShifts (int[]/float[]) – List of permitted shift starting times measured in time units (if given as int[]) or as time points in the 24h format (if given as float[]).

  • costRoundCoeff (float) – Specifies up to which fraction of the smallest cost the costs are rounded. Rounding can increase the efficiency of the approach significantly.

  • baseTimeInv (int/float) – Number of time intervals per day (if given as int) or approximate length of one time interval in the 24h format (if given as float).

  • ignoreRandomFlow (bool) – Indicates whether traffic via inadmissible routes shall be ignored. This traffic noise adds uncertainty to the results but may lead to overall more precise estimates.

  • integer (bool) – If True, the solver applies an integer programming algorithm to solve the optimization problem. Otherwise a greedy rounding scheme based on linear programming will be used (potentially faster but with lower performance guarantee).

  • timeout (float) – Timeout for internal optimization routines (in seconds).

  • perturbation (float) – Perturbation added to make one inspection shift slightly more effective. This is needed for tie breaking only.

  • fancyRounding (bool) – If True, a more sophisticated rounding scheme will be applied.

  • full_result (bool) – If True, the optimized variables will be returned in addition to the expected number of inspected agents under the optimal solution.

  • extended_info (bool) – If True, the covered fraction of the total agent flow and the used inspection locations (according to the optimal solution) will be returned in addition to other results.

  • init_greedy (bool) – If True, the greedy rounding algorithm will be used as initial condition for the integer optimization algorithm. Use in conjunction with integer=True.

  • saveFile (bool) – Whether to save the optimization results to a file.

  • loadFile (bool) – Whether the results may be loaded from a file if available.

  • fileNameAddition (str) – Addition to the generated file name.
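
The pro-rated night premium described for nightPremium can be sketched as follows. This is an illustration only, not the package's implementation: it assumes that nightcost is the total cost of a shift lying entirely within the night interval, that the cost is interpolated linearly with the overlapping fraction, and that a shift is at most 24 hours long. The function names are hypothetical.

```python
def overlap_hours(start, length, night_start, night_end):
    """Hours of the shift [start, start + length) inside the night interval."""
    if night_end <= night_start:
        night_end += 24  # the night interval wraps around midnight
    total = 0.0
    for offset in (-24, 0, 24):  # night interval of the previous, same, and next day
        lo = max(start, night_start + offset)
        hi = min(start + length, night_end + offset)
        total += max(0.0, hi - lo)
    return total


def shift_cost(start, length, cost_shift, night_premium):
    """Cost of one shift, raised in proportion to its overlap with the night."""
    night_cost, night_start, night_end = night_premium
    fraction = overlap_hours(start, length, night_start, night_end) / length
    return cost_shift + (night_cost - cost_shift) * fraction


night_premium = (150.0, 22, 6)  # hypothetical: full night shift costs 150, night is 22:00-06:00
day_cost = shift_cost(8, 8, 100.0, night_premium)     # shift 08:00-16:00
mixed_cost = shift_cost(18, 8, 100.0, night_premium)  # shift 18:00-02:00
night_cost = shift_cost(22, 8, 100.0, night_premium)  # shift 22:00-06:00
```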

prepare_traffic_factor_model()[source]

Prepares the traffic factor model.

This may be necessary if derived covariates shall be used and these derived covariates do not depend on parameters that shall be fitted.

preprocess_survey_data(redo=False)[source]

Takes the raw survey data and preprocesses them for the model fit.

Parameters

redo (bool) – Whether the task shall be repeated if it had been done before. If set to True, the earlier result will be ignored.

read_destination_data(fileNameDestinations)[source]

Reads and saves data that can be used to determine the attractiveness of destinations in the vector traffic model.

Parameters

fileNameDestinations (str) –

Name of a csv file with (ignored) header and columns separated by ,. The following columns must be present in the specified order

Field | Type | Description
Destination ID | IDTYPE | ID of the destination. Must coincide with the respective ID used in the road network
DESTINATION_COVARIATES | | Columns with the information and types specified in the TrafficFactorModel class. See BaseTrafficFactorModel.DESTINATION_COVARIATES

read_origin_data(fileNameOrigins)[source]

Reads and saves data that can be used to determine the repulsiveness of origins in the vector traffic model.

Parameters

fileNameOrigins (str) –

Name of a csv file with (ignored) header and columns separated by ,. The following columns must be present in the specified order

Field | Type | Description
Origin ID | IDTYPE | ID of the origin. Must coincide with the respective ID used in the road network
ORIGIN_COVARIATES | | Columns with the information and types specified in the TrafficFactorModel class. See BaseTrafficFactorModel.ORIGIN_COVARIATES

read_postal_code_area_data(fileNamePostalCodeAreas)[source]

Reads and saves data on postal code regions.

Creates and saves an array with postal code area center vertex ID, postal code, and population

Parameters

fileNamePostalCodeAreas (str) –

Name of a csv file with (ignored) header and columns separated by ,. The following columns must be present in the specified order:

Field | Type | Description
Postal code | IDTYPE | ID of the postal code area
Vertex ID | IDTYPE | ID of a vertex representing the postal code area (e.g. a vertex at the centre or population centre)
Population | int | Population living in the postal code area. Can be the actual population count or the number in hundreds, thousands, etc.; the results just have to be interpreted accordingly

read_survey_data(fileNameObservations, pruneStartTime=11, pruneEndTime=16, properDataRate=None)[source]

Reads the survey observation data.

Parameters
  • fileNameObservations (str) –

    Name of a csv file containing the survey observation data. The file must have a header (which will be ignored) and the following columns, separated by ,:

    Field | Type | Description
    Station ID | IDTYPE | ID of the survey location
    Day ID | IDTYPE | ID for the day of the survey (e.g. the date)
    Shift start | [0, 24), optional | Start time of the survey shift
    Shift end | [0, 24), optional | End time of the survey shift
    Time | [0, 24), optional | Time when the agent was observed
    From ID | IDTYPE, optional | ID of the origin of the agent
    To ID | IDTYPE, optional | ID of the destination of the agent
    Relevant | bool | Whether or not this agent is a potential vector

    The times must be given in the 24h format. For example, 2:30PM translates to 14.5.

    Missing or inconsistent data will either be ignored (if relevant==False) or incorporated as ‘unknown’ (if relevant==True). All applicable data will be used to fit the temporal traffic distribution. If a survey shift has been conducted without any agent being observed, include at least one observation with origin and destination left blank and relevant set to False.

  • pruneStartTime (float) – Some parts of the extended model analysis require that only data collected within the same time frame are considered. pruneStartTime gives the start time of this time frame. It should be chosen so that many survey shifts include the entire time interval [pruneStartTime, pruneEndTime].

  • pruneEndTime (float) – End of the unified time frame (see pruneStartTime).

  • properDataRate (float) – Fraction of agents providing inconsistent, incomplete, or wrong data. If not given, the rate will be estimated from the data.
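
The decimal 24h time format and the role of the pruning window can be sketched as follows. The helper names are hypothetical, and the check assumes shifts that do not cross midnight; how the package actually prunes the data is not shown here.

```python
from datetime import datetime


def clock_to_decimal_hours(text):
    """Convert a clock time such as '2:30PM' to decimal hours (14.5)."""
    parsed = datetime.strptime(text, "%I:%M%p")
    return parsed.hour + parsed.minute / 60


def covers_window(shift_start, shift_end, prune_start=11, prune_end=16):
    """Whether a shift includes the whole interval [prune_start, prune_end]."""
    return shift_start <= prune_start and shift_end >= prune_end


afternoon = clock_to_decimal_hours("2:30PM")
full_coverage = covers_window(9.0, 17.5)   # shift 09:00-17:30
late_start = covers_window(12.0, 17.5)     # shift 12:00-17:30
```

Choosing pruneStartTime and pruneEndTime so that most shifts pass such a check keeps as much data as possible in the parts of the analysis that need a unified time frame.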

save(fileName=None)[source]

Saves the model to the file fileName.vmm

Parameters

fileName (str) – File name (without extension). If None, the model’s default file name will be used.

save_model_predictions(fileName=None)[source]

Computes and saves model predictions by origin, destination, origin-destination pair, and inspection location.

Saves the estimated mean traffic and

Parameters

fileName (str) – Base of the file name to which the predictions shall be saved.

save_simulated_observations(parameters=None, parametersConsidered=None, shiftNumber=None, dayNumber=None, stationSets=None, fileName=None)[source]

Simulate observation data that would be obtained if the model were true.

Parameters
  • parameters (float[]) – Parameters for the traffic flow (gravity) model.

  • parametersConsidered (bool[]) – Which parameters are free? Is True at the entries corresponding to the parameters that are free. parametersConsidered must have exactly as many True entries as the length of dynamicParameters

  • shiftNumber (int) – Number of observation shifts to be considered.

  • dayNumber (int) – Number of days on which the shifts were conducted.

  • stationSets (int[][]) – Sets/lists of survey location IDs at which inspections could be conducted simultaneously.

  • fileName (str) – Name of the file to which the generated observations shall be saved.

set_compliance_rate(complianceRate)[source]

Sets the boaters’ compliance rate (for stopping at inspection/survey locations)

The rate is used for both fitting the model and optimizing inspection station operation

Parameters

complianceRate (float) – Proportion of agents stopping at survey/inspection stations.

set_infested(originID, infested=True)[source]

Changes the infestation state of an origin with the given ID.

Parameters
  • originID (IDTYPE) – ID of the origin whose state shall be changed

  • infested (bool) – Infestation state. True means infested.

set_origins_considered(considered=None, infested=None)[source]

Determines which origins are considered in the model fit.

It can happen that the model is to be fitted to model vectors coming from specific origins, e.g. a model for long-distance traffic and a model for intermediate-distance traffic. In this case, considered can be used to specify the origins considered to fit the model.

If infested is given and considered is NOT specified, then the model will be fitted to the origins with the given infestation status. E.g. infested=True will result in the model being fitted to estimate traffic from infested jurisdictions only. All other data will be ignored.

Parameters
  • considered (bool[]) – Array determining which of the sources are considered

  • infested (bool) – Select considered sources based on the infestation status

set_traffic_factor_model_class(trafficFactorModel_class=None)[source]

Sets the class representing the traffic factor (gravity) model.

Parameters

trafficFactorModel_class (class) – Class of the traffic factor model. Must inherit from BaseTrafficFactorModel.

simulate_count_data(stationTimes, day, parameters, parametersConsidered, limitToOneObservation=False)[source]

Simulate observation data that would be obtained on one day if the model were true.

For each boater, start, end, and path are returned.

Parameters
  • stationTimes (dict) – Keys: indices of the survey locations; values: 2-tuples for start and end time of the survey shift at that location (in 24h format).

  • day (int) – ID for that day.

  • parameters (float[]) – Parameters for the traffic flow (gravity) model.

  • parametersConsidered (bool[]) – Which parameters are free? Is True at the entries corresponding to the parameters that are free. parametersConsidered must have exactly as many True entries as the length of dynamicParameters

  • limitToOneObservation (bool) – Whether an agent travelling on an inadmissible route can be observed at one location only (as assumed when we fit the route choice model) or at multiple locations.

test_1_1_regression(minSampleSize=20, saveFileName=None, comparisonFileName=None)[source]

Tests whether the model results are biased.

Compares predicted and observed values and tests the null hypothesis that the model yields unbiased estimates. If we obtain a high p-value and are unable to reject this hypothesis, we may assume that the model does a good job.

Parameters
  • saveFileName (str) – File name for a plot depicting the test and to save the results.

  • comparisonFileName (str) – File name to load results from a different model or different data set for comparison.
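
One common way to test for bias is to regress the observed values on the predicted values and compare the fit with the line observed = predicted (slope 1, intercept 0). The sketch below computes the least-squares fit and the corresponding F statistic. Whether test_1_1_regression uses exactly this test is an assumption; the function name and the data are hypothetical.

```python
def one_to_one_f_statistic(predicted, observed):
    """Least-squares fit of observed on predicted and the F statistic
    for the joint null hypothesis slope = 1 and intercept = 0."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sums of squares under the fitted line and under the 1:1 line.
    rss_fit = sum((y - intercept - slope * x) ** 2 for x, y in zip(predicted, observed))
    rss_null = sum((y - x) ** 2 for x, y in zip(predicted, observed))
    # Two restrictions are tested; n - 2 residual degrees of freedom remain.
    f_stat = ((rss_null - rss_fit) / 2) / (rss_fit / (n - 2))
    return slope, intercept, f_stat


predicted = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical model predictions
observed = [1.2, 1.9, 3.1, 3.8, 5.1]   # hypothetical observations
slope, intercept, f_stat = one_to_one_f_statistic(predicted, observed)
```

The p-value follows from the F distribution with 2 and n - 2 degrees of freedom, for example via scipy.stats.f.sf(f_stat, 2, n - 2) if SciPy is available.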

IDTYPE = '|S11'

Type of IDs of origins and destinations. Alpha-numerical code with at most 9 digits.

2 digits remain reserved for internal use.

class TransportNetwork(fileNameEdges, fileNameVertices, destinationToDestination=False, preprocessingArgs=None, edgeLengthRandomization=0.001, **printerArgs)[source]

Bases: lopaths.graph.FlowPointGraph

A graph representation of the road network.

In contrast to the general graph class lopaths.graph.FlowPointGraph, TransportNetwork implements invasion-specific functionalities needed for the vector movement model.

Parameters
  • fileNameEdges (str) –

    Name of a csv file containing the road network. The file must be a csv with header and the following columns, separated by ,:

    Field | Type | Description
    Road ID | IDTYPE | ID of the road section
    Vertex from-ID | IDTYPE | Starting vertex of the road section
    Vertex to-ID | IDTYPE | End vertex of the road section
    Length | float | Length (or travel time) of the road section
    Survey location for forward traffic | IDTYPE, optional | ID of the location where forward traffic can be surveyed
    Survey location for backward traffic | IDTYPE, optional | ID of the station where backward traffic can be surveyed
    Survey location for forward and backward traffic | IDTYPE, optional | ID of the station where forward and backward traffic can be surveyed
    Destination ID | IDTYPE, optional | ID of the destination that can be accessed via this road section

  • fileNameVertices (str) –

    Name of a csv file stating which vertices are origins and destinations. The file must be a csv with header and the following columns, separated by ,:

    Field | Type | Description
    Vertex ID | IDTYPE | ID of the vertex
    Potential via vertex | bool | Whether the vertex could be a potential intermediate destination for boaters (should be True by default, but can be set to False for many vertices to reduce computational complexity)
    Vertex type | int | Type identifier for the vertex: 1 = origin; 2 = destination; 3 = postal code area center (used to determine whether destinations are located in populated areas); other = no specific role for the vertex

  • destinationToDestination (bool) – True iff destination to destination traffic is modelled, i.e. if the sets of origins and destinations are equal. Note: a gravity model for destination to destination traffic is not yet implemented, but could be added easily.

  • preprocessingArgs (tuple) – Arguments for preprocessing of the road network. Refer to lopaths.graph.FlowPointGraph for further documentation

  • edgeLengthRandomization (float) – Maximum random perturbation to road lengths. Needed to make it likely that distinct paths have distinct length.

  • printerArgs – Arguments passed to vemomoto_core.tools.hrprint.HierarchichalPrinter

Methods:

find_potential_routes([stretchConstant, …])

Searches potential routes of boaters.

find_shortest_distances()

Determines the shortest distances between all origins and destinations and from all postal code centres to the destinations

preprocessing([preprocessingArgs])

Preprocesses the graph to make path search queries more efficient.

update_sources_considered([rawConsidered, …])

Changes which origins are considered to fit the model.

find_potential_routes(stretchConstant=1.5, localOptimalityConstant=0.2, acceptionFactor=0.667, rejectionFactor=1.333)[source]

Searches potential routes of boaters.

For detailed documentation on the arguments, refer to lopaths.graph.FlowPointGraph.find_locally_optimal_paths

Parameters
  • stretchConstant (>=1) – Maximal length of the admissible paths in relation to the shortest path. Controls the length of the admissible paths. 1 refers to shortest paths only, infinity to all paths.

  • localOptimalityConstant ([0, 1]) – Fraction of the path that must be optimal. Controls how optimal the admissible paths shall be. 0 refers to all paths, 1 refers to shortest paths only.

  • acceptionFactor ((0,1]) – Relaxation factor for local optimality constraint. Approximation factor to speed up the local optimality check. 0 accepts all paths, 1 performs an exact check. Choose the largest feasible value. 1 is often possible.

  • rejectionFactor ([1,2]) – False rejection factor for local optimality constraint. Approximation factor to speed up the local optimality check. 1 performs exact checks, 2 may reject paths that are admissible but not locally optimal twice as much as required. Choose the smallest feasible value. 1 is often possible.
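
The length criterion controlled by stretchConstant can be sketched as a simple filter: a path passes if it is at most stretchConstant times as long as the shortest path. The sketch covers this criterion only; the local optimality check and its approximation factors are handled by lopaths and are not reproduced here. The function name and the candidate paths are hypothetical.

```python
def within_stretch(path_length, shortest_length, stretch_constant=1.5):
    """Whether a path satisfies the length bound relative to the shortest path."""
    if stretch_constant < 1:
        raise ValueError("stretch_constant must be >= 1")
    return path_length <= stretch_constant * shortest_length


# Hypothetical candidate paths between one origin-destination pair.
candidate_lengths = {"a": 10.0, "b": 14.0, "c": 16.0}
shortest = min(candidate_lengths.values())
passing = sorted(
    name for name, length in candidate_lengths.items()
    if within_stretch(length, shortest)
)
```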

find_shortest_distances()[source]

Determines the shortest distances between all origins and destinations and from all postal code centres to the destinations

preprocessing(preprocessingArgs=None)[source]

Preprocesses the graph to make path search queries more efficient.

This step is a necessary prerequisite to all path queries as implemented here.

The ‘reach’ of each vertex is computed. The ‘reach’ is high if a vertex is at the center of a long shortest path (e.g. a highway).

Parameters

preprocessingArgs (dict or iterable) – Contains the arguments for lopaths.graph.FlowPointGraph.preprocessing. If None, reasonable default arguments will be chosen. Refer to lopaths.graph.FlowPointGraph.preprocessing for a list of the possible arguments and their meaning

update_sources_considered(rawConsidered=None, considered=None)[source]

Changes which origins are considered to fit the model.

If long-distance traffic is assumed to follow different mechanisms than intermediate-distance traffic, it can be beneficial to fit two distinct models for traffic from different origins. update_sources_considered determines which of the origins are considered to fit the model.

Parameters
  • rawConsidered (bool[]) – Boolean array determining which of the sources are considered. Must have the same size as the number of sources.

  • considered (bool[]) – Boolean array determining which of the currently considered sources remain considered. Must have the same size as the number of currently considered sources
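
The difference between the two arguments can be sketched with plain lists: rawConsidered has one entry per source, whereas considered has one entry per currently considered source and can only narrow the selection. The helper `refine_considered` is hypothetical and only illustrates how such a nested mask maps back to the full list of sources.

```python
def refine_considered(current_mask, considered):
    """Apply a mask over the currently considered sources to the full mask."""
    if sum(current_mask) != len(considered):
        raise ValueError("considered needs one entry per currently considered source")
    remaining = iter(considered)
    # Sources that are not considered now stay excluded.
    return [next(remaining) if active else False for active in current_mask]


current = [True, False, True, True]  # three of four sources are considered
new_mask = refine_considered(current, [True, False, True])
```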

create_distribution_plot(X, observed, predicted=None, best=None, yLabel='PMF', title='', fileName=None)[source]

Creates a plot of a given discrete distribution.

Parameters
  • X (float[]) – Values that could be observed.

  • observed (float[]) – Observed cumulative density (non-parametric estimate of the cumulative mass function).

  • predicted (float[]) – Predicted cumulative density.

  • best (float[]) – Second predicted cumulative density (different prediction method).

  • yLabel (str) – Label of the y axis.

  • title (str) – Title of the plot.

  • fileName (str) – Name of the file that the plot shall be saved to.
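As a sketch of how the observed argument might be prepared, the snippet below computes a non-parametric (empirical) cumulative distribution over the values in X from raw samples. It assumes that observed holds cumulative frequencies, as the parameter description states; the helper `empirical_cdf` is hypothetical and not part of the package.

```python
def empirical_cdf(samples, X):
    """Fraction of samples that are <= x, for each x in X.

    Hypothetical helper for illustration only.
    """
    n = len(samples)
    return [sum(1 for s in samples if s <= x) / n for x in X]


X_values = [0, 1, 2, 3]
observed_cdf = empirical_cdf([0, 1, 1, 3], X_values)
```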

create_observed_predicted_mean_error_plot(predicted, observed, error=None, constError=None, errorFunctions=None, regressionResult=None, labels=None, title='', saveFileName=None, comparisonFileName=None, comparison=None, comparisonPredicted=None, comparisonObserved=None, logScale=False)[source]

Create an observed vs. predicted plot.

Parameters
  • predicted (float[]) – Array of predicted values.

  • observed (float[]) – Array of observed values.

  • error (float[]) – Array with a measure for the expected error of the predictions. This can be used to show how large the expected deviations between predicted and observed values are.

  • constError (float) – If given, the area constError units above and below the predicted=observed line will be shaded. This is useful if the figure shall show the confidence interval of predictions.

  • errorFunctions (callable[]) – Two methods for the lower and the upper predicted error (e.g. 95% confidence interval). Both of these methods must take the predicted value and return the respective expected bound for the observations. If given, the area between these functions will be plotted as a shaded area.

  • regressionResult (float[]) – Slope and intercept of an observed vs. prediction regression. If given, the regression line will be plotted.

  • labels (str[]) – Labels for the data points.

  • title (str) – Title of the plot.

  • saveFileName (str) –

    Name of the file where the plot and the data used to generate it will be saved.

    Note

    Note that only predicted and observed values will be saved.

  • comparisonFileName (str) – Name of the file where alternative results are saved. These results will be loaded and plotted for comparison.

  • comparisonPredicted (float[]) – Predictions plotted for comparison.

  • comparisonObserved (float[]) – Observations plotted for comparison.

  • logScale (bool) – Whether to plot on a log-log scale.
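The errorFunctions argument expects two callables that map a predicted value to a lower and an upper bound. A minimal sketch of such a pair follows, assuming Poisson-like counts (variance equal to the mean) and a normal approximation. Both assumptions and the helper name `make_normal_bounds` are illustrative and do not come from the package. The functions are written for scalar inputs; if arrays are passed, a vectorized variant would be needed.

```python
import math


def make_normal_bounds(z=1.96):
    """Return (lower, upper) callables for an approximate interval.

    Illustration only: assumes variance == mean and a normal
    approximation; z=1.96 corresponds to roughly 95% coverage.
    """
    def lower(predicted):
        # Counts cannot be negative, so clip the bound at zero.
        return max(0.0, predicted - z * math.sqrt(predicted))

    def upper(predicted):
        return predicted + z * math.sqrt(predicted)

    return lower, upper


lower_bound, upper_bound = make_normal_bounds()

# Possible call, using only the documented parameter names:
# create_observed_predicted_mean_error_plot(
#     predicted, observed, errorFunctions=[lower_bound, upper_bound])
```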

create_observed_predicted_mean_error_plot_from_files(fileName1, fileName2, extension='', **kwargs)[source]

Creates an observed vs. predicted plot from saved data.

Parameters
  • fileName1 (str) – Name of the file from which the primary data shall be loaded.

  • fileName2 (str) – Name of the file from which comparison data shall be loaded.

  • extension (str) – Extension to add to both file names.

  • **kwargs (kwargs) – Keyword arguments passed to create_observed_predicted_mean_error_plot().

mean_relative_absolute_error(prediction, observation, normalization=None, cutoff=None)[source]

Compute the mean relative absolute error between predictions and observations.

Parameters
  • prediction (float[]) – Predicted values.

  • observation (float[]) – Observed values.

  • normalization (float[]) – Normalization for the error. If not given, the predicted values will be used.

  • cutoff (float | float[]) – Threshold value. Observed-predicted pairs will be ignored if the predicted value is below cutoff.
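The parameter descriptions suggest the computation sketched below: the mean of |prediction - observation| / normalization, taken over all pairs whose predicted value is not below the cutoff. This is a plain-Python illustration under that assumption, with the hypothetical name `mean_relative_absolute_error_sketch`; the package's implementation may differ in details such as the handling of empty inputs.

```python
from collections.abc import Sequence


def mean_relative_absolute_error_sketch(prediction, observation,
                                        normalization=None, cutoff=None):
    """Illustrative re-implementation, not the package's code."""
    n = len(prediction)
    if normalization is None:
        # Documented default: normalize by the predicted values.
        normalization = prediction
    if cutoff is None:
        cutoffs = [None] * n
    elif isinstance(cutoff, Sequence):
        cutoffs = list(cutoff)      # one threshold per pair
    else:
        cutoffs = [cutoff] * n      # a single threshold for all pairs

    errors = []
    for p, o, norm, c in zip(prediction, observation,
                             normalization, cutoffs):
        if c is not None and p < c:
            continue                # ignore pairs below the cutoff
        errors.append(abs(p - o) / norm)
    # Return NaN if every pair was ignored.
    return sum(errors) / len(errors) if errors else float("nan")


mrae = mean_relative_absolute_error_sketch([10, 20, 1], [12, 15, 5],
                                           cutoff=5)
```

Here the third pair is ignored because its prediction (1) is below the cutoff (5), so the result is the mean of 2/10 and 5/20.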

nbinom_fit(data)[source]

Fits a negative binomial distribution to given data.
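The documentation does not state which estimation method nbinom_fit uses or what it returns. For orientation only, the sketch below fits a negative binomial distribution by the method of moments, a simple technique substituted here for illustration. It uses the (n, p) parametrization known from scipy.stats.nbinom, where the mean is n(1 - p)/p and the variance is n(1 - p)/p². The function name `nbinom_fit_moments` is hypothetical.

```python
def nbinom_fit_moments(data):
    """Method-of-moments fit of a negative binomial distribution.

    Illustration only; not the package's nbinom_fit. Returns (n, p)
    such that mean = n*(1-p)/p and variance = n*(1-p)/p**2.
    """
    count = len(data)
    mean = sum(data) / count
    var = sum((x - mean) ** 2 for x in data) / count
    if var <= mean:
        # The negative binomial requires overdispersion (var > mean).
        raise ValueError("data are not overdispersed")
    p = mean / var
    n = mean * mean / (var - mean)
    return n, p


sample = [0, 1, 2, 3, 10, 2, 0, 8]
n_hat, p_hat = nbinom_fit_moments(sample)
```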

redraw_predicted_observed(fileName1, fileName2)[source]

Redraws predicted versus observed plots generated earlier.

Parameters
  • fileName1 (str) – Name of the file from which the primary data shall be loaded.

  • fileName2 (str) – Name of the file from which comparison data shall be loaded.

safe_delattr(obj, attrname)[source]

Deletes an attribute if it exists.
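The described behaviour can be sketched in a few lines. The version below is an assumption about how such a helper might look, not the package's source; the names `safe_delattr_sketch` and `Dummy` are hypothetical.

```python
def safe_delattr_sketch(obj, attrname):
    """Delete `attrname` from `obj` if present; otherwise do nothing."""
    if hasattr(obj, attrname):
        delattr(obj, attrname)


class Dummy:
    pass


dummy = Dummy()
dummy.cache = [1, 2, 3]
safe_delattr_sketch(dummy, "cache")   # removes the attribute
safe_delattr_sketch(dummy, "cache")   # second call is a no-op
```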