Pages

Sunday, June 10, 2018

Einstein Analytics: Getting started with cogroup

You can combine data from two or more data streams into a single data stream using cogroup. The data streams must have at least one common field. By default, only data that exists in both groups appear in the results.

example:
qs = cogroup qsd by 'State', qsp by 'State';
in this sample, data stream qsd contain field State and data stream qsp also contain field State, we can use it for grouping.

q = cogroup ops by 'Account', meetings by 'Company';
Account in ops data stream will have the same value as Company in meetings data stream.


Use case: show death per state percentage from 2 datasets.



Let's use cogroup to combine the dataset:
dsd = load "StateDeath2";
dsp = load "StatePopulation2";
ds = cogroup dsd by 'State', dsp by 'State';
ds = foreach ds generate dsp.'State' as 'State', sum(dsp.'Count') as 'Population', sum(dsd.'Count') as 'Death', (sum(dsd.'Count')/sum(dsp.'Count')*100) as 'Death (%)';

The Result


Let's try to use Opportunity and User datasets from our previous blog.
dsu = load "user1";
dso = load "opportunity";
ds = cogroup dsu by 'Id', dso by 'OwnerId';
ds = foreach ds generate first(dsu.Name) as 'Name', sum(dso.Amount) as 'Sum_Amount';
ds = order ds by Name;



If you notice, Angela is not shown on that chart, because she does not have any Opportunity records. Remember that only data that exists in both groups appear in the results.

Wait... there is more, Outer cogrouping combines groups as an outer join. For the half-matches, null rows are added. The grouping keys are taken from the input that provides the value.

result = cogroup rows by expression [left | right | full], rows by expression;

Specify left, right, or full to indicate whether to perform a left outer join, a right outer join, or a full join.
Example: z = cogroup x by (day,origin) left, y by (day,airport);

You can apply an outer cogrouping across more than 2 sets of data. This example does a left outer join from a to b, with a right join to c:
Example: result = cogroup a by keya left, b by keyb right, c by keyc;

Let's use the same SAQL from above and add left outer join.
dsu = load "user1";
dso = load "opportunity";
ds = cogroup dsu by 'Id' left, dso by 'OwnerId';
ds = foreach ds generate first(dsu.Name) as 'Name', sum(dso.Amount) as 'Sum_Amount';
ds = order ds by Name;



Reference:


Page-level ad