sbws.lib package

Submodules

sbws.lib.circuitbuilder module

class sbws.lib.circuitbuilder.CircuitBuilder(args, conf, controller, relay_list=None, close_circuits_on_exit=True)[source]

Bases: object

The CircuitBuilder interface.

Subclasses must implement their own build_circuit() function. Subclasses may keep additional state if they’d find it helpful.

The primary way to use a CircuitBuilder of any type is to simply create it and then call cb.build_circuit(…) with any options that your CircuitBuilder type needs.

It might be good practice to close circuits as you find you no longer need them, but CircuitBuilder will keep track of existing circuits and close them when it is deleted.
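
The usage pattern above can be sketched as follows. This is a minimal illustration, not the sbws implementation: FakeController, SketchCircuitBuilder, TwoHopBuilder and close_all() are hypothetical names, and the controller is a stand-in that only hands out circuit ids instead of talking to Tor.

```python
class FakeController:
    """Hypothetical stand-in for a Tor controller; it only hands out ids."""

    def __init__(self):
        self._next_id = 1
        self.closed = []

    def new_circuit(self, path):
        circ_id = str(self._next_id)
        self._next_id += 1
        return circ_id

    def close_circuit(self, circ_id):
        self.closed.append(circ_id)


class SketchCircuitBuilder:
    """Base class: tracks the circuits it built so they can be closed."""

    def __init__(self, controller):
        self.controller = controller
        self.built_circuits = set()

    def build_circuit(self, path):
        raise NotImplementedError("subclasses must implement build_circuit()")

    def close_circuit(self, circ_id):
        if circ_id in self.built_circuits:
            self.controller.close_circuit(circ_id)
            self.built_circuits.discard(circ_id)

    def close_all(self):
        # Close every circuit that is still tracked.
        for circ_id in list(self.built_circuits):
            self.close_circuit(circ_id)


class TwoHopBuilder(SketchCircuitBuilder):
    """Subclass that only accepts two-hop paths (an assumption of this sketch)."""

    def build_circuit(self, path):
        if len(path) != 2:
            return None
        circ_id = self.controller.new_circuit(path)
        self.built_circuits.add(circ_id)
        return circ_id
```

A caller would create the builder once, call build_circuit(path) per measurement, and close circuits either one by one or all together when done.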

close_circuit(circ_id)[source]
class sbws.lib.circuitbuilder.GapsCircuitBuilder(*a, **kw)[source]

Bases: sbws.lib.circuitbuilder.CircuitBuilder

Same as CircuitBuilder but implements build_circuit.

build_circuit(path)[source]

Return parent class build circuit method.

Since sbws is only building 2 hop paths, there is no need to add random relays to the path, or convert back and forth between fingerprint and Relay objects.

sbws.lib.circuitbuilder.valid_circuit_length(path)[source]

sbws.lib.relaylist module

class sbws.lib.relaylist.Relay(fp, cont, ns=None, desc=None, timestamp=None)[source]

Bases: object

address
average_bandwidth
burst_bandwidth
can_exit_to_port(port, strict=False)[source]

Returns True if the relay has an exit policy and the policy accepts exiting to the given port or False otherwise.

If strict is true, it returns True only for exits that can exit to all IPs on that port.

Exits that are IPv6-only, or IPv4 exits that reject some public networks, return False. In July 2020, 67 out of 1095 exits were like this.

If strict is false, it returns True for any exit that can exit to some public IPs on that port.

Note that the EXIT flag is set when the relay can exit to ports 443 and 80. Currently all Web servers are using 443, so when using this function it is not necessary to check the EXIT flag as well.

consensus_bandwidth

Return the consensus bandwidth in Bytes.

Consensus bandwidth is the only bandwidth value that is in kilobytes.
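
The unit conversion implied above can be sketched as follows. It assumes that the value read from the consensus is in decimal kilobytes (1 KB = 1000 bytes); the function and constant names are hypothetical.

```python
KILOBYTE = 1000  # assumption: decimal kilobytes, not 1024


def consensus_bandwidth_bytes(consensus_kb):
    """Convert a consensus bandwidth given in kilobytes to bytes."""
    if consensus_kb is None:
        return None  # no consensus bandwidth known for this relay
    return consensus_kb * KILOBYTE
```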

consensus_bandwidth_is_unmeasured
consensus_valid_after

Obtain the consensus Valid-After from the document of this relay network status.

exit_policy
fingerprint
flags
increment_relay_recent_measurement_attempt()[source]

Increment the number of times that a relay has been queued to be measured.

It is called from main_loop().

increment_relay_recent_priority_list()[source]

Increment the number of times that a relay has been “prioritized” to be measured.

It is called from best_priority().

is_exit_not_bad_allowing_port(port, strict=False)[source]
last_consensus_timestamp
master_key_ed25519

Obtain ed25519 master key of the relay in server descriptors.

Returns:str, the ed25519 master key base 64 encoded without trailing ‘=’s.
nickname
observed_bandwidth
relay_in_recent_consensus_count

Number of times the relay was in a consensus.

relay_recent_measurement_attempt_count
relay_recent_priority_list_count
update_relay_in_recent_consensus(timestamp=None)[source]
update_router_status(router_status)[source]

Update this relay router status (from the consensus).

update_server_descriptor(server_descriptor)[source]

Update this relay server descriptor (from the consensus).

class sbws.lib.relaylist.RelayList(args, conf, controller, measurements_period=432000, state=None)[source]

Bases: object

Keeps a list of all relays in the current Tor network and updates it transparently in the background. Provides useful interfaces for getting only relays of a certain type.

authorities
bad_exits
exit_min_bw()[source]
exits
exits_not_bad_allowing_port(port, strict=False)[source]
fast
guards
increment_recent_measurement_attempt()[source]

Increment the number of times that any relay has been queued to be measured.

It is called from main_loop().

It is read and stored in a state file.

last_consensus_timestamp

Returns the datetime when the last consensus was obtained.

non_exit_min_bw()[source]
non_exits
random_relay()[source]
recent_consensus_count

Number of times a new consensus was obtained.

recent_measurement_attempt_count
relays
relays_fingerprints
sbws.lib.relaylist.valid_after_from_network_statuses(network_statuses)[source]

Obtain the consensus Valid-After datetime from the document attribute of a stem.descriptor.RouterStatusEntryV3.

Parameters:network_statuses (list) –

returns datetime:

sbws.lib.relayprioritizer module

class sbws.lib.relayprioritizer.RelayPrioritizer(args, conf, relay_list, result_dump)[source]

Bases: object

best_priority(prioritize_result_error=False, return_fraction=True)[source]

Yields a new ordered list of relays to be measured next.

The relays that were measured longest ago get prioritized (lowest priority number, first in the list). The relays that were measured more recently get lower priority (higher priority number, last in the list).

Optionally, the relays whose measurements failed can be prioritized (first in the list). However, unstable relays that often fail to be measured might fail again, and stable relays would then get measured only when their measurements become old enough. The opposite might be more suitable: give lower priority to unstable relays, to avoid spending time measuring relays that might fail to be measured.

Optionally, return only a fraction of all the relays in the network. Since there could be new relays in the network while measuring the list of relays returned by this method, this method is run again before all the relays in the network are measured.

Note

In a future refactor, instead of having a static fraction of relays to be measured, this method could be called when it is known that there are X new relays in the network.

Since measurements made more than X days ago (too old) are not considered, and the initial list of past measurements is only filtered when the scanner starts, the measurements need to be filtered here again to discard the ones that are too old.

Parameters:
  • prioritize_result_error (bool) – whether or not to prioritize measurements that did not succeed.
  • return_fraction (bool) – whether to return only a fraction of the relays seen in the network or return all.

return: a generator of the new ordered list of relays to measure next.
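
The ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the sbws code: SketchResult, sketch_best_priority and the fraction parameter are hypothetical, and the way an error result's freshness is scaled by (1 - freshness_reduction_factor) is an assumption based on the description of ResultError.freshness_reduction_factor.

```python
from dataclasses import dataclass


@dataclass
class SketchResult:
    fingerprint: str
    time: float  # Unix timestamp of the measurement
    is_error: bool = False
    freshness_reduction_factor: float = 0.5


def sketch_best_priority(results_by_relay, now, fresh_seconds,
                         prioritize_result_error=False, fraction=None):
    """Yield fingerprints, lowest priority number (measured first) first."""
    oldest_allowed = now - fresh_seconds
    priorities = []
    for fingerprint, results in results_by_relay.items():
        priority = 0.0
        for result in results:
            if result.time < oldest_allowed:
                continue  # too old: not considered
            # Freshness is the time left until the result becomes too old.
            freshness = result.time - oldest_allowed
            if result.is_error and prioritize_result_error:
                # Make failed measurements look older than they are.
                freshness *= max(1.0 - result.freshness_reduction_factor, 0.0)
            priority += freshness
        priorities.append((priority, fingerprint))
    priorities.sort()
    ordered = [fingerprint for _, fingerprint in priorities]
    if fraction is not None:
        # Only return a fraction of the relays, but at least one.
        cutoff = max(int(len(ordered) * fraction), 1)
        ordered = ordered[:cutoff]
    for fingerprint in ordered:
        yield fingerprint
```

A relay with no valid results gets priority 0 and is therefore yielded first, which matches the idea that the relays measured longest ago (or never) go to the front of the list.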

increment_recent_priority_list()[source]

Increment the number of times that best_priority() has been run.

increment_recent_priority_relay(relays_count)[source]

Increment the number of relays that have been “prioritized” to be measured in a best_priority().

recent_priority_list_count
recent_priority_relay_count

sbws.lib.resultdump module

class sbws.lib.resultdump.Result(relay, circ, dest_url, scanner_nick, t=None)[source]

Bases: object

A bandwidth measurement for a relay.

It re-implements Relay as an inner class.

class Relay(fingerprint, nickname, address, master_key_ed25519, average_bandwidth=None, burst_bandwidth=None, observed_bandwidth=None, consensus_bandwidth=None, consensus_bandwidth_is_unmeasured=None, relay_in_recent_consensus=None, relay_recent_measurement_attempt=None, relay_recent_priority_list=None)[source]

Bases: object

A Tor relay.

It re-implements Relay with the attributes needed.

Note

in a future refactor it would be simpler if a Relay has measurements and a measurement has a relay, instead of every measurement re-implementing Relay.

address
circ
consensus_bandwidth
consensus_bandwidth_is_unmeasured
dest_url
fingerprint
static from_dict(d)[source]

Returns a Result subclass from a dictionary.

Returns None if the version attribute is not RESULT_VERSION

It raises NotImplementedError when the dictionary type can not be parsed.

Note

in a future refactor, the conversions to/from object-dictionary will be simpler using setattr and __dict__

version is not being used and should be removed.

master_key_ed25519
nickname
relay_average_bandwidth
relay_burst_bandwidth
relay_in_recent_consensus

Number of times the relay was in a consensus.

relay_observed_bandwidth
relay_recent_measurement_attempt

Returns the relay’s recent measurement attempts.

It is initialized in Relay and incremented in main_loop().

relay_recent_priority_list

Returns the number of times the relay was recently “prioritized” to be measured.

It is initialized in Relay and incremented in main_loop().

scanner
time
to_dict()[source]
type
version
class sbws.lib.resultdump.ResultDump(args, conf)[source]

Bases: object

Runs the enter() method in a new thread and collects new Results on its queue. Writes them to daily result files in the data directory.

enter()[source]

Main loop for the ResultDump thread.

When there are results in the queue, queue.get will get them until there are none left or the timeout happens.

For every result it gets, it processes it and stores it in the filesystem, which takes ~1 millisecond and will not trigger the timeout. It can therefore store ~1000 results per second in the filesystem.

It does not accept any data type other than Results or lists of Results, so it is not possible to put big data types in the queue.

If there are not any results in the queue, it waits 1 second and checks again.
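
The loop described above can be sketched with the standard library queue module. This is a simplified illustration, not the sbws code: SketchResult, sketch_enter, the stop_event argument and the handle_result callback are hypothetical names.

```python
import queue
import threading


class SketchResult:
    """Hypothetical stand-in for a Result."""

    def __init__(self, fingerprint):
        self.fingerprint = fingerprint


def sketch_enter(result_queue, stop_event, handle_result, timeout=1.0):
    """Process queued results until stop_event is set."""
    while not stop_event.is_set():
        try:
            item = result_queue.get(timeout=timeout)
        except queue.Empty:
            continue  # nothing arrived within the timeout: check again
        # Accept a single result or a list of results, nothing else.
        items = item if isinstance(item, list) else [item]
        for result in items:
            if isinstance(result, SketchResult):
                handle_result(result)
```

In a real scanner this function would be the target of a threading.Thread, with the measuring threads putting results on the queue and the main thread setting stop_event on shutdown.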

handle_result(result)[source]

Called from the ResultDump thread. If we are shutting down, ignores ResultError* types.

results_for_relay(relay)[source]
store_result(result)[source]

Called from the ResultDump thread.

class sbws.lib.resultdump.ResultError(*a, msg=None, **kw)[source]

Bases: sbws.lib.resultdump.Result

freshness_reduction_factor

When the RelayPrioritizer encounters this Result, how much should it adjust its freshness? (See RelayPrioritizer.best_priority() for more information about “freshness”)

A higher factor makes the freshness lower (making the Result seem older). A lower freshness leads to the relay having better priority, and better priority means it will be measured again sooner.

The value 0.5 was chosen somewhat arbitrarily, but a few weeks of live network testing verifies that sbws is still able to perform useful measurements in a reasonable amount of time.

static from_dict(d)[source]

Returns a Result subclass from a dictionary.

Returns None if the version attribute is not RESULT_VERSION

It raises NotImplementedError when the dictionary type can not be parsed.

Note

in a future refactor, the conversions to/from object-dictionary will be simpler using setattr and __dict__

version is not being used and should be removed.

msg
to_dict()[source]
type
class sbws.lib.resultdump.ResultErrorAuth(*a, **kw)[source]

Bases: sbws.lib.resultdump.ResultError

freshness_reduction_factor

Override the default ResultError.freshness_reduction_factor because a ResultErrorAuth is most likely not the measured relay’s fault, so we shouldn’t hurt its priority as much. A higher reduction factor means a Result’s effective freshness is reduced more, which makes the relay’s priority better.

The value 0.9 was chosen somewhat arbitrarily.

static from_dict(d)[source]

Returns a Result subclass from a dictionary.

Returns None if the version attribute is not RESULT_VERSION

It raises NotImplementedError when the dictionary type can not be parsed.

Note

in a future refactor, the conversions to/from object-dictionary will be simpler using setattr and __dict__

version is not being used and should be removed.

to_dict()[source]
type
class sbws.lib.resultdump.ResultErrorCircuit(*a, **kw)[source]

Bases: sbws.lib.resultdump.ResultError

freshness_reduction_factor

There are a few instances when it isn’t the relay’s fault that the circuit failed to get built. Maybe someday we’ll try detecting whose fault it most likely was and subclassing ResultErrorCircuit. But for now we don’t. So reduce the freshness slightly more than ResultError does by default so priority isn’t hurt quite as much.

A (hopefully very very rare) example of when a circuit would fail to get built is when the sbws client machine suddenly loses Internet access.

static from_dict(d)[source]

Returns a Result subclass from a dictionary.

Returns None if the version attribute is not RESULT_VERSION

It raises NotImplementedError when the dictionary type can not be parsed.

Note

in a future refactor, the conversions to/from object-dictionary will be simpler using setattr and __dict__

version is not being used and should be removed.

to_dict()[source]
type
class sbws.lib.resultdump.ResultErrorDestination(*a, **kw)[source]

Bases: sbws.lib.resultdump.ResultError

Error when there is not a working destination Web Server.

It is instantiated in measure_relay().

Note

this duplicates code and adds more tech-debt, since it’s the same as the other ResultError classes except for the type. In a future refactor, there should be only one ResultError class and assign the type in the scanner module.

static from_dict(d)[source]

Returns a Result subclass from a dictionary.

Returns None if the version attribute is not RESULT_VERSION

It raises NotImplementedError when the dictionary type can not be parsed.

Note

in a future refactor, the conversions to/from object-dictionary will be simpler using setattr and __dict__

version is not being used and should be removed.

to_dict()[source]
type
class sbws.lib.resultdump.ResultErrorSecondRelay(*a, **kw)[source]

Bases: sbws.lib.resultdump.ResultError

Error when a second relay suitable to measure a relay could not be found.

A suitable second relay is a relay that:
  • Has at least as much bandwidth as the relay to measure.
  • If the relay to measure is not an exit, is an exit without the bad exit flag that can exit to port 443.
  • If the relay to measure is an exit, is not an exit.
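
The conditions above can be written as a predicate. This is a sketch only: SketchRelay and is_suitable_second_relay are hypothetical names, and the check that the candidate is not the measured relay itself is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class SketchRelay:
    fingerprint: str
    bandwidth: int
    is_exit: bool = False
    is_bad_exit: bool = False
    exits_to_443: bool = False


def is_suitable_second_relay(target, candidate):
    """Return True if candidate can act as the helper relay for target."""
    if candidate.fingerprint == target.fingerprint:
        return False  # assumption: a relay cannot help measure itself
    if candidate.bandwidth < target.bandwidth:
        return False  # the helper must not be slower than the target
    if target.is_exit:
        # The target is the exit, so the helper must not be an exit.
        return not candidate.is_exit
    # The target is not an exit, so the helper must be a usable exit.
    return (candidate.is_exit
            and not candidate.is_bad_exit
            and candidate.exits_to_443)
```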

It is instantiated in measure_relay().

Note

this duplicates code and adds more tech-debt, since it’s the same as the other ResultError classes except for the type. In a future refactor, there should be only one ResultError class and assign the type in the scanner module.

static from_dict(d)[source]

Returns a Result subclass from a dictionary.

Returns None if the version attribute is not RESULT_VERSION

It raises NotImplementedError when the dictionary type can not be parsed.

Note

in a future refactor, the conversions to/from object-dictionary will be simpler using setattr and __dict__

version is not being used and should be removed.

to_dict()[source]
type
class sbws.lib.resultdump.ResultErrorStream(*a, **kw)[source]

Bases: sbws.lib.resultdump.ResultError

static from_dict(d)[source]

Returns a Result subclass from a dictionary.

Returns None if the version attribute is not RESULT_VERSION

It raises NotImplementedError when the dictionary type can not be parsed.

Note

in a future refactor, the conversions to/from object-dictionary will be simpler using setattr and __dict__

version is not being used and should be removed.

to_dict()[source]
type
class sbws.lib.resultdump.ResultSuccess(rtts, downloads, *a, **kw)[source]

Bases: sbws.lib.resultdump.Result

downloads
static from_dict(d)[source]

Returns a Result subclass from a dictionary.

Returns None if the version attribute is not RESULT_VERSION

It raises NotImplementedError when the dictionary type can not be parsed.

Note

in a future refactor, the conversions to/from object-dictionary will be simpler using setattr and __dict__

version is not being used and should be removed.

rtts
to_dict()[source]
type
sbws.lib.resultdump.load_recent_results_in_datadir(fresh_days, datadir, success_only=False, on_changed_ipv4=False, on_changed_ipv6=False)[source]

Given a data directory, read all results files in it that could have results in them that are still valid. Trim them, and return the valid Results as a list

sbws.lib.resultdump.load_result_file(fname, success_only=False)[source]

Reads in all lines from the given file, and parses them into Result structures (or subclasses of Result). Optionally only keeps ResultSuccess. Returns all kept Results as a result dictionary. This function does not care about the age of the results

sbws.lib.resultdump.merge_result_dicts(d1, d2)[source]

Given two dictionaries that contain Result data, merge them. Result dictionaries have keys of relay fingerprints and values of lists of results for those relays.
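
A merge of two such dictionaries can be sketched as below. The function name is hypothetical, and returning a new dictionary instead of modifying an argument is a choice made for this sketch.

```python
def sketch_merge_result_dicts(d1, d2):
    """Merge two {fingerprint: [results]} dictionaries into a new one."""
    # Copy the lists so that the inputs are left untouched.
    merged = {fp: list(results) for fp, results in d1.items()}
    for fp, results in d2.items():
        merged.setdefault(fp, []).extend(results)
    return merged
```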

sbws.lib.resultdump.trim_results(fresh_days, result_dict)[source]

Given a result dictionary, remove all Results that are no longer valid and return the new dictionary
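
Trimming by age can be sketched as follows. The sketch models each result as a plain dict with a "time" key (a Unix timestamp), takes the current time as an explicit argument, and drops relays that have no valid results left; all three are assumptions made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sketch_trim_results(fresh_days, result_dict, now):
    """Return a new dictionary with only the results that are still valid."""
    oldest_allowed = now - fresh_days * SECONDS_PER_DAY
    trimmed = {}
    for fp, results in result_dict.items():
        kept = [r for r in results if r["time"] >= oldest_allowed]
        if kept:
            trimmed[fp] = kept
    return trimmed
```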

sbws.lib.resultdump.trim_results_ip_changed(result_dict, on_changed_ipv4=False, on_changed_ipv6=False)[source]

When there are results for the same relay with different IPs, create a new results’ dictionary without that relay’s results using an older IP.

Parameters:
  • result_dict (dict) – a dictionary of results
  • on_changed_ipv4 (bool) – whether to trim the results when a relay’s IPv4 changes
  • on_changed_ipv6 (bool) – whether to trim the results when a relay’s IPv6 changes
Returns:

a new results dictionary
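
A sketch of the IPv4 case follows. Results are modelled as plain dicts with "time" and "address" keys and the function name is hypothetical; the IPv6 case is left out of this sketch.

```python
def sketch_trim_results_ip_changed(result_dict, on_changed_ipv4=False):
    """Keep only the results measured with each relay's latest address."""
    if not on_changed_ipv4:
        return result_dict
    trimmed = {}
    for fp, results in result_dict.items():
        if len({r["address"] for r in results}) > 1:
            # The address changed: keep the results with the newest address.
            latest = max(results, key=lambda r: r["time"])
            trimmed[fp] = [r for r in results
                           if r["address"] == latest["address"]]
        else:
            trimmed[fp] = results
    return trimmed
```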

sbws.lib.resultdump.write_result_to_datadir(result, datadir)[source]

Can be called from any thread

sbws.lib.v3bwfile module

Classes and functions that create the bandwidth measurements document (v3bw) used by bandwidth authorities.

class sbws.lib.v3bwfile.V3BWFile(v3bwheader, v3bwlines)[source]

Bases: object

Create a Bandwidth List file following spec version 1.X.X

Parameters:
  • v3bwheader (V3BWHeader) – header
  • v3bwlines (list) – V3BWLines
static bw_kb(bw_lines, reverse=False)[source]
bw_line_for_node_id(node_id)[source]

Returns the bandwidth line for a given node fingerprint.

Used to combine data when plotting.

static bw_sbws_scale(bw_lines, scale_constant=7500, reverse=False)[source]

Return a new V3BwLine list scaled using sbws method.

Parameters:
  • bw_lines (list) – bw lines to scale, not self.bw_lines, since this method will be before self.bw_lines have been initialized.
  • scale_constant (int) – the constant to multiply by the ratio and the bandwidth to obtain the new bandwidth
Returns list:

V3BwLine list

static bw_torflow_scale(bw_lines, desc_bw_obs_type=1, cap=0.05, num_round_dig=2, reverse=False, router_statuses_d=None)[source]

Obtain final bandwidth measurements applying Torflow’s scaling method.

See details in Torflow aggregation and scaling.

classmethod from_results(results, scanner_country=None, destinations_countries=None, state_fpath='', scale_constant=7500, scaling_method=1, torflow_obs=0, torflow_cap=0.05, round_digs=2, secs_recent=None, secs_away=None, min_num=0, consensus_path=None, max_bw_diff_perc=50, reverse=False)[source]

Create V3BWFile class from sbws Results.

Parameters:
  • results (dict) – see below
  • state_fpath (str) – path to the state file
  • scaling_method (int) – Scaling method to obtain the bandwidth Possible values: {None, SBWS_SCALING, TORFLOW_SCALING} = {0, 1, 2}
  • scale_constant (int) – sbws scaling constant
  • torflow_obs (int) – method to choose descriptor observed bandwidth
  • reverse (bool) – whether to sort the bw lines descending or not

Results are in the form:

{'relay_fp1': [Result1, Result2, ...],
 'relay_fp2': [Result1, Result2, ...]}
classmethod from_v100_fpath(fpath)[source]
classmethod from_v1_fpath(fpath)[source]
info_stats
static is_max_bw_diff_perc_reached(bw_lines, max_bw_diff_perc=50, router_statuses_d=None)[source]
is_min_perc
max_bw
mean_bw
static measured_progress_stats(num_bw_lines, number_consensus_relays, min_perc_reached_before)[source]

Statistics about measurements progress, to be included in the header.

Parameters:
  • bw_lines (list) – the bw_lines after scaling and applying filters.
  • consensus_path (str) – the path to the cached consensus file.
  • state_fpath (str) – the path to the state file
Returns dict, bool:
 

Statistics about the progress made with measurements and whether the percentage of measured relays has been reached.

median_bw
min_bw
num
static read_number_consensus_relays(consensus_path)[source]

Read the number of relays in the Network from the cached consensus file.

static read_router_statuses(consensus_path)[source]

Read the router statuses from the cached consensus file.

static set_under_min_report(bw_lines)[source]

Modify the Bandwidth Lines adding the KeyValue under_min_report, vote.

sum_bw
to_plt(attrs=['bw'], sorted_by=None)[source]

Return bandwidth data in a format useful for matplotlib.

Used from external tool to plot.

update_progress(num_bw_lines, header, number_consensus_relays, state)[source]

Returns True if the minimum percent of Bandwidth Lines was reached and False otherwise. Update the header with the progress.

static warn_if_not_accurate_enough(bw_lines, scale_constant=7500)[source]
write(output)[source]
class sbws.lib.v3bwfile.V3BWHeader(timestamp, **kwargs)[source]

Bases: object

Create a bandwidth measurements (V3bw) header following bandwidth measurements document spec version 1.X.X.

Parameters:
  • timestamp (str) – timestamp in Unix Epoch seconds of the most recent generator result.
  • version (str) – the spec version
  • software (str) – the name of the software that generates this
  • software_version (str) – the version of the software
  • kwargs (dict) –

    extra headers. Currently supported:

    • earliest_bandwidth: str, ISO 8601 timestamp in UTC time zone when the first bandwidth was obtained
    • generator_started: str, ISO 8601 timestamp in UTC time zone when the generator started
add_relays_excluded_counters(exclusion_dict)[source]

Add the monitoring KeyValues to the header about the number of relays not included because they were not eligible.

add_stats(**kwargs)[source]
add_time_report_half_network()[source]

Add to the header the time it took to measure half of the network.

It is not the time the scanner actually takes to measure the whole network, but an estimate based on number_eligible_relays, the relays that are reported in the bandwidth file and that directory authorities will vote on.

This is calculated for half of the network, so that failed or not reported relays do not affect the estimate too much.

For instance, if there are 6500 relays in the network, half of the network would be 3250. And if there were 4000 eligible relays measured in an interval of 3 days, the time to measure half of the network would be 3 days * 3250 / 4000.
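
The arithmetic of this example can be checked with a few lines. The function name and its parameters are hypothetical; only the formula is taken from the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def time_to_measure_half_network(elapsed_seconds, eligible_relays,
                                 network_relays):
    """Estimate the seconds needed to measure half of the network."""
    half_network = network_relays / 2
    return elapsed_seconds * half_network / eligible_relays


estimate = time_to_measure_half_network(3 * SECONDS_PER_DAY, 4000, 6500)
print(estimate / SECONDS_PER_DAY)  # 2.4375 days
```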

Since the elapsed time is calculated from the earliest and the latest measurement and a relay might have more than 2 measurements, this would give an estimate on how long it would take to measure the network including all the valid measurements.

Also log an estimate of how long it would take with the current number of relays included in the bandwidth file.

static consensus_count_from_file(state_fpath)[source]
static earliest_bandwidth_from_results(results)[source]
classmethod from_lines_v1(lines)[source]
Parameters:lines (list) – list of lines to parse
Returns:tuple of V3BWHeader object and non-header lines
classmethod from_lines_v100(lines)[source]
Parameters:lines (list) – list of lines to parse
Returns:tuple of V3BWHeader object and non-header lines
classmethod from_results(results, scanner_country=None, destinations_countries=None, state_fpath='')[source]
classmethod from_text_v1(text)[source]
Parameters:text (str) – text to parse
Returns:tuple of V3BWHeader object and non-header lines
static generator_started_from_file(state_fpath)[source]

ISO formatted timestamp for the time when the scanner process most recently started.

keyvalue_tuple_ls

Return list of all KeyValue tuples

keyvalue_unordered_tuple_ls

Return list of KeyValue tuples that do not have specific order.

keyvalue_v1str_ls

Return KeyValue list of strings following spec v1.X.X.

keyvalue_v2_ls

Return KeyValue list of strings following spec v2.X.X.

static latest_bandwidth_from_results(results)[source]
num_lines
static recent_measurement_attempt_count_from_file(state_fpath)[source]

Returns the number of times any relay was queued to be measured in the recent (by default 5) days from the state file.

static recent_priority_list_count_from_file(state_fpath)[source]

Returns the number of times best_priority() was run in the recent (by default 5) days from the state file.

static recent_priority_relay_count_from_file(state_fpath)[source]

Returns the number of times any relay was “prioritized” to be measured in the recent (by default 5) days from the state file.

strv1

Return header string following spec v1.X.X.

strv2

Return header string following spec v2.X.X.

class sbws.lib.v3bwfile.V3BWLine(node_id, bw, **kwargs)[source]

Bases: object

Create a Bandwidth List line following the spec version 1.X.X.

Parameters:
  • node_id (str) – the relay fingerprint
  • bw (int) – the bandwidth value that directory authorities will include in their votes.
  • kwargs (dict) – extra headers.

Note

tech-debt: move node_id and bw to kwargs and just ensure that the required values are in **kwargs

bw_keyvalue_tuple_ls

Return list of KeyValue Bandwidth Line tuples.

bw_keyvalue_v1str_ls

Return list of KeyValue Bandwidth Line strings following spec v1.X.X.

static bw_mean_from_results(results)[source]
static bw_median_from_results(results)[source]
bw_strv1

Return Bandwidth Line string following spec v1.X.X.

static consensus_bandwidth_from_results(results)[source]

Obtain the last consensus bandwidth from the results.

static consensus_bandwidth_is_unmeasured_from_results(results)[source]

Obtain the last consensus unmeasured flag from the results.

del_relay_type()[source]
static desc_bw_avg_from_results(results)[source]

Obtain the last descriptor bandwidth average from the results.

static desc_bw_bur_from_results(results)[source]

Obtain the last descriptor bandwidth burst from the results.

static desc_bw_obs_last_from_results(results)[source]
static desc_bw_obs_mean_from_results(results)[source]
classmethod from_bw_line_v1(line)[source]
classmethod from_data(data, fingerprint)[source]
classmethod from_results(results, secs_recent=None, secs_away=None, min_num=0, router_statuses_d=None)[source]

Convert sbws results to relays’ Bandwidth Lines

bs stands for bytes per second. bw_mean means the bandwidth is obtained from the mean of all the downloads’ bandwidths. Each download’s bandwidth is calculated as the amount of data received divided by the time it took to receive it: bw = data (Bytes) / time (seconds)
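
The calculation can be sketched as follows, assuming each download is a dict with an "amount" in bytes and a "duration" in seconds. The key names and function names are assumptions of this sketch.

```python
from statistics import mean


def download_bandwidth(amount_bytes, duration_seconds):
    """Bandwidth of one download in bytes per second."""
    return amount_bytes / duration_seconds


def bw_mean_from_downloads(downloads):
    """Mean of the bandwidths of all the downloads, in bytes per second."""
    return mean([download_bandwidth(d["amount"], d["duration"])
                 for d in downloads])
```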

static last_time_from_results(results)[source]
static result_types_from_results(results)[source]
static results_away_each_other(results, secs_away=None)[source]
static results_recent_than(results, secs_recent=None)[source]
static rtt_from_results(results)[source]
set_relay_type(relay_type)[source]
sbws.lib.v3bwfile.kb_round_x_sig_dig(bw_bs, digits=2)[source]

Convert bw_bs from bytes to kilobytes, and round the result to ‘digits’ significant digits. Results less than or equal to 1 are rounded up to 1. Returns an integer.

digits must be greater than 0. bw_bs must be less than or equal to 2**82, to avoid floating point errors.

sbws.lib.v3bwfile.num_results_of_type(results, type_str)[source]
sbws.lib.v3bwfile.result_type_to_key(type_str)[source]
sbws.lib.v3bwfile.round_sig_dig(n, digits=2)[source]

Round n to ‘digits’ significant digits in front of the decimal point. Results less than or equal to 1 are rounded to 1. Returns an integer.

digits must be greater than 0. n must be less than or equal to 2**73, to avoid floating point errors.
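
One way to obtain the behaviour described for round_sig_dig() and kb_round_x_sig_dig() is sketched below. The sketch_ names are hypothetical, the code is not guaranteed to match sbws for every input, and it assumes decimal kilobytes (1 KB = 1000 bytes).

```python
import math


def sketch_round_sig_dig(n, digits=2):
    """Round n to `digits` significant digits left of the decimal point."""
    digits = int(digits)
    if digits < 1:
        raise ValueError("digits must be greater than 0")
    if n <= 1:
        return 1
    digits_in_n = int(math.log10(n)) + 1
    # A negative ndigits rounds to tens, hundreds, and so on; never keep
    # digits after the decimal point.
    round_digs = min(digits - digits_in_n, 0)
    return int(round(n, round_digs))


def sketch_kb_round_x_sig_dig(bw_bs, digits=2):
    """Convert bytes to kilobytes, then round to significant digits."""
    bw_kb = bw_bs / 1000.0
    return sketch_round_sig_dig(bw_kb, digits=digits)
```

For example, 1234567 bytes is 1234.567 kilobytes, which rounds to 1200 with two significant digits, while any value of one kilobyte or less is reported as 1.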

Module contents