Pages

Sunday, June 10, 2018

Einstein Analytics: Getting started with cogroup

You can combine data from two or more data streams into a single data stream using cogroup. The data streams must have at least one common field. Only data that exists in both groups appear in the results.

example:
qs = cogroup qsd by 'State', qsp by 'State';
in this sample, data stream qsd contain field State and data stream qsp also contain field State, we can use it for grouping.

q = cogroup ops by 'Account', meetings by 'Company';
Account in ops data stream will have the same value with Company in meetings data stream.


Use case: show death per state percentage from 2 datasets.



Let's use cogroup to combine the dataset:
dsd = load "StateDeath2";
dsp = load "StatePopulation2";
ds = cogroup dsd by 'State', dsp by 'State';
ds = foreach ds generate dsp.'State' as 'State', sum(dsp.'Count') as 'Population', sum(dsd.'Count') as 'Death', (sum(dsd.'Count')/sum(dsp.'Count')*100) as 'Death (%)';

The Result


Let's try to use Opportunity and User datasets from our previous blog.
dsu = load "user1";
dso = load "opportunity";
ds = cogroup dsu by 'Id', dso by 'OwnerId';
ds = foreach ds generate first(dsu.Name) as 'Name', sum(dso.Amount) as 'Sum_Amount';
ds = order ds by Name;


If you notice, Angela is not shown on that chart, because she do not have any Opportunity records. Remember that only data that exists in both groups appear in the results.


Reference: