Hi, I have a problem with Spark. Does anyone know how I can write the PostgreSQL query below using RDD operations, so that the final result is the same as when I run it in Postgres? I know I mainly need to use transformations, but I don't know how to do a join: is it the same as in Postgres, or not? I was planning to write the solution in PySpark.
SELECT Tournaments.TYear, Countries.Name, Max(Matches.MatchDate) - Min(Matches.MatchDate) AS LENGTH
FROM Tournaments, Countries, Hosts, Teams, Matches
WHERE Tournaments.TYear = Hosts.TYear AND Countries.Cid = Hosts.Cid AND (Teams.Tid = Matches.HomeTid OR Teams.Tid = Matches.VisitTid) AND date_part('year', Matches.MatchDate)::text LIKE (Tournaments.TYear || '%')
GROUP BY Tournaments.TYear, Countries.Name
ORDER BY LENGTH, Tournaments.TYear ASC
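
For illustration, here is a minimal sketch of how the joins in this query could be expressed on pair RDDs. It is not a verified translation and rests on several assumptions that the post does not state: the tuple layouts of each table, that TYear is an integer year (so the LIKE condition reduces to "same year"), that MatchDate is a `datetime.date`, that Hosts has at most one row per (TYear, Cid), and that every match's teams exist in Teams (in which case the Teams join only duplicates rows and does not change Max or Min, so it is left out). The function and helper names are hypothetical.

```python
from datetime import date


def year_and_date(match):
    """Key a match row by its year. Assumed layout: (match_date, home_tid, visit_tid)."""
    match_date = match[0]
    # Carry the date twice so it can serve as both running min and running max.
    return (match_date.year, (match_date, match_date))


def merge_min_max(a, b):
    """Combine two (min_date, max_date) pairs into one."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def tournament_lengths(tournaments, countries, hosts, matches):
    """Sketch of the query on RDDs of tuples (layouts are assumptions):

    tournaments: (tyear, ...)      countries: (cid, name, ...)
    hosts:       (tyear, cid)      matches:   (match_date, home_tid, visit_tid)

    Returns an RDD of (tyear, country_name, length_in_days), sorted by
    length and then by year.
    """
    # GROUP BY year with Min/Max: reduceByKey over (year, (min, max)) pairs.
    span_by_year = matches.map(year_and_date).reduceByKey(merge_min_max)

    # Countries.Cid = Hosts.Cid: key both sides by cid, join, re-key by year.
    country_by_cid = countries.map(lambda c: (c[0], c[1]))      # (cid, name)
    host_by_cid = hosts.map(lambda h: (h[1], h[0]))             # (cid, tyear)
    name_by_year = host_by_cid.join(country_by_cid).map(
        lambda kv: (kv[1][0], kv[1][1])                         # (tyear, name)
    )

    # Tournaments.TYear = Hosts.TYear, then the year match against Matches.
    tyears = tournaments.map(lambda t: (t[0], None))
    joined = tyears.join(name_by_year).join(span_by_year)
    # Each element now looks like: (tyear, ((None, name), (min_date, max_date)))

    rows = joined.map(
        lambda kv: (kv[0], kv[1][0][1], (kv[1][1][1] - kv[1][1][0]).days)
    )

    # ORDER BY LENGTH, TYear ASC
    return rows.sortBy(lambda r: (r[2], r[0]))
```

With an existing `SparkContext` named `sc`, the inputs could be built with `sc.parallelize([...])` (or `sc.textFile(...)` followed by a parsing `map`), and calling `.collect()` on the returned RDD brings the rows back to the driver. The main difference from Postgres is that an RDD `join` only matches on the key of (key, value) pairs, so each table has to be re-keyed with `map` before every join.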