Sunday, June 10, 2018

Einstein Analytics: Getting started with cogroup

You can combine data from two or more data streams into a single data stream using cogroup. The data streams must share at least one field whose values can be matched, although the field names do not have to be the same. Only groups whose value exists in all of the data streams appear in the results.

qs = cogroup qsd by 'State', qsp by 'State';
In this example, data streams qsd and qsp both contain the field State, so we can use it for grouping.
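To make the matching rule concrete, here is a minimal Python model of what cogroup does with two data streams. It assumes each data stream is a list of dict rows and that the sample rows are made up. It is only a sketch of the grouping behavior described above and is not SAQL or the way Einstein Analytics implements it.

```python
from collections import defaultdict


def cogroup(left, left_key, right, right_key):
    """Conceptual model of cogroup: group each stream by its key field and
    keep only the keys that appear in both streams (hypothetical helper)."""
    left_groups = defaultdict(list)
    for row in left:
        left_groups[row[left_key]].append(row)

    right_groups = defaultdict(list)
    for row in right:
        right_groups[row[right_key]].append(row)

    # Only key values present in both streams produce an output group.
    shared = left_groups.keys() & right_groups.keys()
    return {key: (left_groups[key], right_groups[key]) for key in sorted(shared)}


# Hypothetical rows standing in for the qsd and qsp data streams.
qsd = [{"State": "CA", "Count": 3}, {"State": "CA", "Count": 2}, {"State": "NY", "Count": 1}]
qsp = [{"State": "CA", "Count": 100}, {"State": "TX", "Count": 50}]

qs = cogroup(qsd, "State", qsp, "State")
print(list(qs))  # ['CA']
```

NY exists only in qsd and TX exists only in qsp, so the only group left is CA, which carries both of its qsd rows and its single qsp row.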

q = cogroup ops by 'Account', meetings by 'Company';
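The field names don't need to match. Here, Account in the ops data stream is matched against Company in the meetings data stream, so rows are grouped together wherever the Account value equals the Company value.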
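The same idea with differently named key fields can be sketched in Python as follows. The rows, and the Amount and Topic fields, are hypothetical. The final list comprehension only roughly mirrors a foreach ... generate step that sums ops amounts and counts meetings per group.

```python
from collections import defaultdict


def cogroup(left, left_key, right, right_key):
    """Conceptual model: each key maps to (left rows, right rows)."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[left_key]][0].append(row)
    for row in right:
        groups[row[right_key]][1].append(row)
    # Keep only keys that have rows on both sides.
    return {key: sides for key, sides in groups.items() if sides[0] and sides[1]}


# Hypothetical data: 'Account' in ops is matched against 'Company' in meetings.
ops = [{"Account": "Acme", "Amount": 500}, {"Account": "Globex", "Amount": 200}]
meetings = [{"Company": "Acme", "Topic": "Renewal"}, {"Company": "Initech", "Topic": "Intro"}]

q = cogroup(ops, "Account", meetings, "Company")

# Roughly what a per-group aggregation step would produce.
result = [
    {"Account": key, "sum_Amount": sum(r["Amount"] for r in o), "Meetings": len(m)}
    for key, (o, m) in q.items()
]
print(result)  # [{'Account': 'Acme', 'sum_Amount': 500, 'Meetings': 1}]
```

Globex has no meetings and Initech has no opportunities, so only Acme survives the cogroup.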

Use case: show the death percentage per state from two datasets.

Let's use cogroup to combine the datasets:
dsd = load "StateDeath2";
dsp = load "StatePopulation2";
ds = cogroup dsd by 'State', dsp by 'State';
ds = foreach ds generate dsp.'State' as 'State', sum(dsp.'Count') as 'Population', sum(dsd.'Count') as 'Death', (sum(dsd.'Count')/sum(dsp.'Count')*100) as 'Death (%)';
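The arithmetic in the foreach step is (total deaths / total population) * 100 per state. The Python sketch below reproduces that calculation on made-up numbers. The state codes and counts are hypothetical, and only states present in both inputs are kept, matching cogroup's behavior.

```python
from collections import defaultdict


def death_percentage(dsd, dsp):
    """Sum Count per State in each input, then compute deaths as a
    percentage of population for states found in both (hypothetical helper)."""
    death_totals = defaultdict(int)
    for row in dsd:
        death_totals[row["State"]] += row["Count"]

    pop_totals = defaultdict(int)
    for row in dsp:
        pop_totals[row["State"]] += row["Count"]

    result = []
    for state in sorted(death_totals.keys() & pop_totals.keys()):
        death = death_totals[state]
        population = pop_totals[state]
        result.append({
            "State": state,
            "Population": population,
            "Death": death,
            "Death (%)": death / population * 100,
        })
    return result


# Made-up rows standing in for the StateDeath2 and StatePopulation2 datasets.
deaths = [
    {"State": "A", "Count": 30},
    {"State": "A", "Count": 20},
    {"State": "B", "Count": 5},
    {"State": "C", "Count": 9},  # no population row, so state C is dropped
]
population = [{"State": "A", "Count": 1000}, {"State": "B", "Count": 500}]

for row in death_percentage(deaths, population):
    print(row)
```

State A works out to 50 / 1000 * 100, about 5%, and state B to 5 / 500 * 100, about 1%.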

The Result

Now let's try the Opportunity and User datasets from our previous blog post.
dsu = load "user1";
dso = load "opportunity";
ds = cogroup dsu by 'Id', dso by 'OwnerId';
ds = foreach ds generate first(dsu.'Name') as 'Name', sum(dso.'Amount') as 'Sum_Amount';
ds = order ds by 'Name';

Notice that Angela does not appear in the chart, because she has no Opportunity records. Remember that only groups whose key value exists in both data streams appear in the results.
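The Python sketch below models why Angela disappears. It assumes the user and opportunity rows are dicts, and every Id, amount, and the names other than Angela are hypothetical. Users are matched to opportunities on Id = OwnerId, the first name in each group is taken, amounts are summed, and the output is ordered by name, roughly following the SAQL steps above.

```python
from collections import defaultdict


def amount_by_owner(dsu, dso):
    """Model of cogroup on Id/OwnerId followed by first(Name), sum(Amount)
    and ordering by Name (hypothetical helper, not SAQL)."""
    users_by_id = defaultdict(list)
    for user in dsu:
        users_by_id[user["Id"]].append(user)

    opps_by_owner = defaultdict(list)
    for opp in dso:
        opps_by_owner[opp["OwnerId"]].append(opp)

    result = []
    # A user with no opportunities has no OwnerId match, so no group is formed.
    for key in users_by_id.keys() & opps_by_owner.keys():
        result.append({
            "Name": users_by_id[key][0]["Name"],
            "Sum_Amount": sum(o["Amount"] for o in opps_by_owner[key]),
        })
    return sorted(result, key=lambda r: r["Name"])


users = [
    {"Id": "U1", "Name": "Brian"},
    {"Id": "U2", "Name": "Angela"},  # owns no opportunities
    {"Id": "U3", "Name": "Carla"},
]
opportunities = [
    {"OwnerId": "U1", "Amount": 1000},
    {"OwnerId": "U3", "Amount": 250},
    {"OwnerId": "U1", "Amount": 500},
]

for row in amount_by_owner(users, opportunities):
    print(row)
```

Only Brian and Carla come back. Because no opportunity has Angela's Id as its OwnerId, she is dropped before any aggregation happens.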