This is a raw dump of brainstormery from a hack session with Threebean.
$ sudo yum install python-fedmsg-meta-fedora-infrastructure
$ hub clone ralphbean/fedora-stats-tools
Though this was only about 90 minutes of cycling, it is the part that is burned most into my brain. This metric is all about helping identify how "flat" the message distribution is, to avoid uneven burnout. That is: take the agent generating the most messages within a time frame (the "Head") and the agent generating the fewest messages in that time frame (the "Tail"), and draw a line between them. The flatter that line is, the more evenly message generation is spread among all contributors. Still unclear? Me too ;) Here's some Python instead:
import collections
import json
import pprint
import time
import requests
import fedmsg.config
import fedmsg.meta
config = fedmsg.config.load_config()
fedmsg.meta.make_processors(**config)
start = time.time()
one_day = 1 * 24 * 60 * 60  # in seconds
whole_range = one_day       # used both as the span to step across and as the per-query delta
N = 50                      # number of window end-points to sample
# Fetch one page of datagrepper results for the window ending at `end`,
# looking back `delta` seconds.
def get_page(page, end, delta):
    url = 'https://apps.fedoraproject.org/datagrepper/raw'
    response = requests.get(url, params=dict(
        delta=delta,
        page=page,
        end=end,
        rows_per_page=100,
    ))
    data = response.json()
    return data
results = {}
now = time.time()
# Step a series of N window end-points across the last day; each query
# still looks back a full `whole_range` from its end-point.
for iteration, end in enumerate(range(*map(int, (now - whole_range, now, whole_range / N)))):
    results[end] = collections.defaultdict(int)
    data = get_page(1, end, whole_range)
    pages = data['pages']
    for page in range(1, pages + 1):
        print "* (", iteration, ") getting page", page, "of", data['pages'], "with end", end, "and delta", whole_range
        data = get_page(page, end, whole_range)
        messages = data['raw_messages']
        for message in messages:
            # Ask fedmsg.meta which usernames were involved in this message
            # and credit each of them once.
            users = fedmsg.meta.msg2usernames(message, **config)
            for user in users:
                results[end][user] += 1

#pprint.pprint(dict(results))

with open('foo.json', 'w') as f:
    f.write(json.dumps(results))
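That first script dumps everything to foo.json: one entry per window end-timestamp, each mapping usernames to message counts. As a quick sanity check, here is a minimal sketch of loading it back and peeking at one window (the timestamp, usernames, and counts in the comment are made up for illustration):

import json

with open('foo.json', 'r') as f:
    results = json.loads(f.read())

# JSON object keys are strings, so the data comes back shaped roughly like
#   {"1389000000": {"ralph": 12, "kevin": 7, "bodhi": 130}, ...}
first = sorted(results.keys())[0]
print first, results[first]

The second script then reads that file back and computes the flatness metric for each window: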
import json
comparator = lambda item: item[1]
with open('foo.json', 'r') as f:
    all_data = json.loads(f.read())

# Coerce the counts to floats so the slope arithmetic below doesn't
# fall into integer division.
for timestamp, data in all_data.items():
    for username, value in data.items():
        all_data[timestamp][username] = float(value)
timestamp_getter = lambda item: item[0]
sorted_data = sorted(all_data.items(), key=timestamp_getter)
results = {}
for timestamp, data in sorted_data:
    # The "head" is the user with the most messages in this window and the
    # "tail" is the user with the fewest; draw a straight line between them.
    head = max(data.items(), key=comparator)
    tail = min(data.items(), key=comparator)
    x1, y1 = 0, head[1]
    x2, y2 = len(data), tail[1]
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1
    # Rank everyone from most to least active and sum how far each one
    # falls below that head-to-tail line.
    metric = 0
    data_tuples = sorted(data.items(), key=comparator, reverse=True)
    for index, item in enumerate(data_tuples):
        username, actual = item
        # line formula is y = slope * x + intercept
        ideal = slope * index + intercept
        diff = ideal - actual
        metric = metric + diff
    print "%s, %f" % (timestamp, metric / len(data))
    results[timestamp] = metric / len(data)
import pygal
chart = pygal.Line()
chart.title = 'lol'
chart.x_labels = [stamp for stamp, blob in sorted_data]
chart.add('Metric', [results[stamp] for stamp, blob in sorted_data])
chart.render_in_browser()
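To tie this back to the "flatness" idea at the top, here is the same head-to-tail-line computation from the loop above, boiled down into a small helper and run on two made-up distributions (the function name, usernames, and counts are all invented for illustration): a perfectly even spread scores 0.0, while a koji-dominated spread scores much higher.

def flatness(data):
    # Toy restatement of the metric above: draw a line from the biggest
    # producer down to the smallest, then average how far each contributor
    # sits below that line.
    values = sorted(data.values(), reverse=True)
    slope = float(values[-1] - values[0]) / len(values)
    metric = sum((slope * i + values[0]) - v for i, v in enumerate(values))
    return metric / len(values)

print flatness({'a': 10.0, 'b': 10.0, 'c': 10.0, 'd': 10.0})    # even -> 0.0
print flatness({'koji': 100.0, 'a': 5.0, 'b': 3.0, 'c': 2.0})   # skewed -> 35.75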
We must be concerned with normalizing the data, because koji will always produce by far the highest volume of messages. This is done by: