Grokking fedmsg
This is a raw dump of brainstormery from a hack session with Threebean.
Deps
$ sudo yum install python-fedmsg-meta-fedora-infrastructure
$ hub clone ralphbean/fedora-stats-tools
The Longtail Metric
Though this was only about 90 minutes of cycling, it is the part that is burned most into my brain. This metric is all about helping identify how "flat" the message distributions are, to avoid uneven burnout... i.e., take the agent generating the most messages within a time frame (the "Head") and the agent generating the fewest messages in that time frame (the "Tail"), and draw a line between them. The flatter that line, the more evenly the generated messages are spread amongst all contributors. Still unclear? Me too ;) Here's some Python instead:
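First, a toy sketch of the idea with made-up counts, separate from the real scripts below:

counts = {'alice': 100.0, 'bob': 10.0, 'carol': 1.0}

head = max(counts.values())  # the busiest agent's count
tail = min(counts.values())  # the quietest agent's count

# The line from (0, head) down to (len(counts), tail).
slope = (tail - head) / len(counts)

# Sum how far each actual count falls from that line; a perfectly
# "flat" (even) distribution would score near zero.
metric = 0.0
for index, actual in enumerate(sorted(counts.values(), reverse=True)):
    ideal = slope * index + head
    metric += ideal - actual

print metric / len(counts)  # large and positive here: very uneven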
Longtail.gather at longtail-gather.py
import collections
import json
import pprint
import time

import requests

import fedmsg.config
import fedmsg.meta

# Load the fedmsg config and build the per-topic message processors so
# we can map raw messages back to the usernames involved in them.
config = fedmsg.config.load_config()
fedmsg.meta.make_processors(**config)

start = time.time()

one_day = 1 * 24 * 60 * 60
whole_range = one_day
N = 50


def get_page(page, end, delta):
    """Fetch one page of raw messages from datagrepper."""
    url = 'https://apps.fedoraproject.org/datagrepper/raw'
    response = requests.get(url, params=dict(
        delta=delta,
        page=page,
        end=end,
        rows_per_page=100,
    ))
    data = response.json()
    return data


# Walk N evenly-spaced endpoints across the last day; each query looks
# back a full `whole_range` of seconds from its endpoint.
results = {}
now = time.time()
window_ends = range(*map(int, (now - whole_range, now, whole_range / N)))
for iteration, end in enumerate(window_ends):
    results[end] = collections.defaultdict(int)

    # The first request tells us how many pages there are in total.
    data = get_page(1, end, whole_range)
    pages = data['pages']

    for page in range(1, pages + 1):
        print "* (", iteration, ") getting page", page, "of", pages, "with end", end, "and delta", whole_range
        data = get_page(page, end, whole_range)
        messages = data['raw_messages']
        for message in messages:
            # One message can involve several users; credit each of them.
            users = fedmsg.meta.msg2usernames(message, **config)
            for user in users:
                results[end][user] += 1

#pprint.pprint(dict(results))
with open('foo.json', 'w') as f:
    f.write(json.dumps(results))
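For reference, foo.json ends up mapping each window endpoint (stringified by json.dumps, which is why the analyze script treats timestamps as strings) to a dict of per-user counts. These names and numbers are made up:

{
    "1389000000": {"ralph": 42, "pingou": 17},
    "1389001728": {"ralph": 40, "kevin": 3}
}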
Longtail.analyze at longtail-analyze.py
import json

import pygal

# Sort (username, count) tuples by their count.
comparator = lambda item: item[1]

with open('foo.json', 'r') as f:
    all_data = json.loads(f.read())

# Coerce the counts to floats so the slope math below stays float.
for timestamp, data in all_data.items():
    for username, value in data.items():
        all_data[timestamp][username] = float(value)

# Timestamps come back from JSON as strings, but fixed-width unix
# timestamps sort the same way lexically as they do numerically.
timestamp_getter = lambda item: item[0]
sorted_data = sorted(all_data.items(), key=timestamp_getter)

results = {}
for timestamp, data in sorted_data:
    head = max(data.items(), key=comparator)
    tail = min(data.items(), key=comparator)

    # Draw a line from the busiest user's count down to the quietest's.
    x1, y1 = 0, head[1]
    x2, y2 = len(data), tail[1]
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1

    # Sum each user's distance from that line; a flatter (more even)
    # distribution scores closer to zero.
    metric = 0
    data_tuples = sorted(data.items(), key=comparator, reverse=True)
    for index, item in enumerate(data_tuples):
        username, actual = item
        # line formula is y = slope * x + intercept
        ideal = slope * index + intercept
        diff = ideal - actual
        metric = metric + diff

    print "%s, %f" % (timestamp, metric / len(data))
    results[timestamp] = metric / len(data)

# Chart the metric over time.
chart = pygal.Line()
chart.title = 'lol'
chart.x_labels = [stamp for stamp, blob in sorted_data]
chart.add('Metric', [results[stamp] for stamp, blob in sorted_data])
chart.render_in_browser()
Stuff to build/consider next?
Radar Charts
We must be concerned with normalizing the data, because koji will always have the highest volume of messages. That could be done by (see the sketch after this list):
- querying all messages of a type, to get the total
- querying just the messages for that user, in that type
- dividing user messages by total messages
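Here's a rough sketch of that normalization against the same datagrepper endpoint as the gather script; the topic and username are just examples, and it leans on the total field that datagrepper reports for each query:

import requests

url = 'https://apps.fedoraproject.org/datagrepper/raw'

def total(**params):
    # We only need datagrepper's count of matching messages, so ask
    # for a single row and read the 'total' field.
    params.update(delta=86400, rows_per_page=1)
    return requests.get(url, params=params).json()['total']

topic = 'org.fedoraproject.prod.buildsys.build.state.change'  # koji, for example
all_messages = total(topic=topic)
user_messages = total(topic=topic, user='ralph')
share = float(user_messages) / all_messages if all_messages else 0.0
print "%s: %0.4f" % (topic, share)

Shares like that, computed per topic, could then feed pygal.Radar() with one axis per topic.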
- Daily +/-
  - just the diff of topic counts
- Weekly +/-
  - just the diff of topic counts
Real-time?
- bar chart with a bar for each message topic?
- array of "lights" that blink each time a message comes across the bus (see the sketch below)
- revisit the live-gource of fedmsg :)
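Here's a minimal sketch of the "lights" idea, leaning on fedmsg's documented consumer API: tail_messages() yields a (name, endpoint, topic, msg) tuple for every message that crosses the bus.

import collections

import fedmsg

# One "light" per topic: print a running count each time a topic blinks.
counts = collections.defaultdict(int)
for name, endpoint, topic, msg in fedmsg.tail_messages():
    counts[topic] += 1
    print "%-70s %d" % (topic, counts[topic])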