== Physical Plan ==
CollectLimit (20)
+- InMemoryTableScan (1)
      +- InMemoryRelation (2)
            +- * Sort (19)
               +- Exchange (18)
                  +- * Project (17)
                     +- * BroadcastHashJoin Inner BuildLeft (16)
                        :- BroadcastExchange (10)
                        :  +- * Project (9)
                        :     +- * Filter (8)
                        :        +- * ColumnarToRow (7)
                        :           +- InMemoryTableScan (3)
                        :                 +- InMemoryRelation (4)
                        :                       +- * Project (6)
                        :                          +- Scan csv (5)
                        +- * Filter (15)
                           +- InMemoryTableScan (11)
                                 +- InMemoryRelation (12)
                                       +- * Project (14)
                                          +- Scan csv (13)


(1) InMemoryTableScan
Output [7]: [sector_id#94339267, numcos#94339270, numdates#94339271, sort#94160419, description#94160423, universe#94339595, coverage#94339521]
Arguments: [sector_id#94339267, numcos#94339270, numdates#94339271, sort#94160419, description#94160423, universe#94339595, coverage#94339521]

(2) InMemoryRelation
Arguments: [sector_id#94339267, numcos#94339270, numdates#94339271, sort#94160419, description#94160423, universe#94339595, coverage#94339521], CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer@208e3fd9,StorageLevel(disk, memory, deserialized, 1 replicas),*(3) Sort [sort#94160419 ASC NULLS FIRST, description#94160423 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(sort#94160419 ASC NULLS FIRST, description#94160423 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#7532240]
   +- *(2) Project [sector_id#94339267, numcos#94339270, numdates#94339271, sort#94160419, description#94160423, universe#94339595, coverage#94339521]
      +- *(2) BroadcastHashJoin [sector_id#94339267], [sector_id#94160418], Inner, BuildLeft, false
         :- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#7532233]
         :  +- *(1) Project [sector_id#94339267, numcos#94339270, numdates#94339271, coverage#94339521, round((cast(numcos#94339270 as double) / cast(coverage#94339521 as double)), 0) AS universe#94339595]
         :     +- *(1) Filter isnotnull(sector_id#94339267)
         :        +- *(1) ColumnarToRow
         :           +- InMemoryTableScan [coverage#94339521, numcos#94339270, 
numdates#94339271, sector_id#94339267], [isnotnull(sector_id#94339267)] : +- InMemoryRelation [sector_id#94339267, retIC#94339268, resretIC#94339269, numcos#94339270, numdates#94339271, annual_bmret#94339272, annual_ret#94339444, std_ret#94339458, Sharpe_ret#94339472, PctPos_ret#94339486, TR_ret#94339490, IR_ret#94339504, annual_resret#94339506, std_resret#94339509, Sharpe_resret#94339510, PctPos_resret#94339511, TR_resret#94339512, IR_resret#94339513, annual_retnet#94339514, std_retnet#94339515, Sharpe_retnet#94339516, PctPos_retnet#94339517, TR_retnet#94339518, IR_retnet#94339519, ... 2 more fields], StorageLevel(disk, memory, deserialized, 1 replicas) : +- *(1) Project [CASE WHEN ((sector_id#94339030 = NA) OR (sector_id#94339030 = null)) THEN null ELSE cast(sector_id#94339030 as int) END AS sector_id#94339267, CASE WHEN ((retIC#94339031 = NA) OR (retIC#94339031 = null)) THEN null ELSE cast(retIC#94339031 as float) END AS retIC#94339268, CASE WHEN ((resretIC#94339032 = NA) OR (resretIC#94339032 = null)) THEN null ELSE cast(resretIC#94339032 as float) END AS resretIC#94339269, CASE WHEN ((numcos#94339033 = NA) OR (numcos#94339033 = null)) THEN null ELSE cast(numcos#94339033 as float) END AS numcos#94339270, CASE WHEN ((numdates#94339034 = NA) OR (numdates#94339034 = null)) THEN null ELSE cast(numdates#94339034 as float) END AS numdates#94339271, CASE WHEN ((annual_bmret#94339035 = NA) OR (annual_bmret#94339035 = null)) THEN null ELSE cast(annual_bmret#94339035 as float) END AS annual_bmret#94339272, CASE WHEN ((annual_ret#94339036 = NA) OR (annual_ret#94339036 = null)) THEN null ELSE cast(annual_ret#94339036 as float) END AS annual_ret#94339444, CASE WHEN ((std_ret#94339037 = NA) OR (std_ret#94339037 = null)) THEN null ELSE cast(std_ret#94339037 as float) END AS std_ret#94339458, CASE WHEN ((Sharpe_ret#94339038 = NA) OR (Sharpe_ret#94339038 = null)) THEN null ELSE cast(Sharpe_ret#94339038 as float) END AS Sharpe_ret#94339472, CASE WHEN ((PctPos_ret#94339039 = NA) 
OR (PctPos_ret#94339039 = null)) THEN null ELSE cast(PctPos_ret#94339039 as float) END AS PctPos_ret#94339486, CASE WHEN ((TR_ret#94339040 = NA) OR (TR_ret#94339040 = null)) THEN null ELSE cast(TR_ret#94339040 as float) END AS TR_ret#94339490, CASE WHEN ((IR_ret#94339041 = NA) OR (IR_ret#94339041 = null)) THEN null ELSE cast(IR_ret#94339041 as float) END AS IR_ret#94339504, CASE WHEN ((annual_resret#94339042 = NA) OR (annual_resret#94339042 = null)) THEN null ELSE cast(annual_resret#94339042 as float) END AS annual_resret#94339506, CASE WHEN ((std_resret#94339043 = NA) OR (std_resret#94339043 = null)) THEN null ELSE cast(std_resret#94339043 as float) END AS std_resret#94339509, CASE WHEN ((Sharpe_resret#94339044 = NA) OR (Sharpe_resret#94339044 = null)) THEN null ELSE cast(Sharpe_resret#94339044 as float) END AS Sharpe_resret#94339510, CASE WHEN ((PctPos_resret#94339045 = NA) OR (PctPos_resret#94339045 = null)) THEN null ELSE cast(PctPos_resret#94339045 as float) END AS PctPos_resret#94339511, CASE WHEN ((TR_resret#94339046 = NA) OR (TR_resret#94339046 = null)) THEN null ELSE cast(TR_resret#94339046 as float) END AS TR_resret#94339512, CASE WHEN ((IR_resret#94339047 = NA) OR (IR_resret#94339047 = null)) THEN null ELSE cast(IR_resret#94339047 as float) END AS IR_resret#94339513, CASE WHEN ((annual_retnet#94339048 = NA) OR (annual_retnet#94339048 = null)) THEN null ELSE cast(annual_retnet#94339048 as float) END AS annual_retnet#94339514, CASE WHEN ((std_retnet#94339049 = NA) OR (std_retnet#94339049 = null)) THEN null ELSE cast(std_retnet#94339049 as float) END AS std_retnet#94339515, CASE WHEN ((Sharpe_retnet#94339050 = NA) OR (Sharpe_retnet#94339050 = null)) THEN null ELSE cast(Sharpe_retnet#94339050 as float) END AS Sharpe_retnet#94339516, CASE WHEN ((PctPos_retnet#94339051 = NA) OR (PctPos_retnet#94339051 = null)) THEN null ELSE cast(PctPos_retnet#94339051 as float) END AS PctPos_retnet#94339517, CASE WHEN ((TR_retnet#94339052 = NA) OR (TR_retnet#94339052 = null)) 
THEN null ELSE cast(TR_retnet#94339052 as float) END AS TR_retnet#94339518, CASE WHEN ((IR_retnet#94339053 = NA) OR (IR_retnet#94339053 = null)) THEN null ELSE cast(IR_retnet#94339053 as float) END AS IR_retnet#94339519, ... 2 more fields] : +- FileScan csv [sector_id#94339030,retIC#94339031,resretIC#94339032,numcos#94339033,numdates#94339034,annual_bmret#94339035,annual_ret#94339036,std_ret#94339037,Sharpe_ret#94339038,PctPos_ret#94339039,TR_ret#94339040,IR_ret#94339041,annual_resret#94339042,std_resret#94339043,Sharpe_resret#94339044,PctPos_resret#94339045,TR_resret#94339046,IR_resret#94339047,annual_retnet#94339048,std_retnet#94339049,Sharpe_retnet#94339050,PctPos_retnet#94339051,TR_retnet#94339052,IR_retnet#94339053,... 2 more fields] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/srv/plusamp/data/default/ea-market/output/estimize_signal_histor..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<sector_id:string,retIC:string,resretIC:string,numcos:string,numdates:string,annual_bmret:s... 
         +- *(2) Filter isnotnull(sector_id#94160418)
            +- InMemoryTableScan [sector_id#94160418, sort#94160419, description#94160423], [isnotnull(sector_id#94160418)]
                  +- InMemoryRelation [sector_id#94160418, sort#94160419, description#94160423, universe#94160424], StorageLevel(disk, memory, deserialized, 1 replicas)
                        +- *(1) Project [CASE WHEN ((sector_id#94160398 = NA) OR (sector_id#94160398 = null)) THEN null ELSE cast(sector_id#94160398 as int) END AS sector_id#94160418, CASE WHEN (sort#94160399 = null) THEN null ELSE sort#94160399 END AS sort#94160419, CASE WHEN (description#94160400 = null) THEN null ELSE description#94160400 END AS description#94160423, CASE WHEN ((universe#94160401 = NA) OR (universe#94160401 = null)) THEN null ELSE cast(universe#94160401 as int) END AS universe#94160424]
                           +- FileScan csv [sector_id#94160398,sort#94160399,description#94160400,universe#94160401] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/srv/plusamp/data/default/ea-market/curate/curate_sector.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<sector_id:string,sort:string,description:string,universe:string>
,None), [sort#94160419 ASC NULLS FIRST, description#94160423 ASC NULLS FIRST]

(3) InMemoryTableScan
Output [4]: [coverage#94339521, numcos#94339270, numdates#94339271, sector_id#94339267]
Arguments: [coverage#94339521, numcos#94339270, numdates#94339271, sector_id#94339267], [isnotnull(sector_id#94339267)]

(4) InMemoryRelation
Arguments: [sector_id#94339267, retIC#94339268, resretIC#94339269, numcos#94339270, numdates#94339271, annual_bmret#94339272, annual_ret#94339444, std_ret#94339458, Sharpe_ret#94339472, PctPos_ret#94339486, TR_ret#94339490, IR_ret#94339504, annual_resret#94339506, std_resret#94339509, Sharpe_resret#94339510, PctPos_resret#94339511, TR_resret#94339512, IR_resret#94339513, annual_retnet#94339514, std_retnet#94339515, Sharpe_retnet#94339516, PctPos_retnet#94339517, TR_retnet#94339518, IR_retnet#94339519, 
... 2 more fields], CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer@208e3fd9,StorageLevel(disk, memory, deserialized, 1 replicas),*(1) Project [CASE WHEN ((sector_id#94339030 = NA) OR (sector_id#94339030 = null)) THEN null ELSE cast(sector_id#94339030 as int) END AS sector_id#94339267, CASE WHEN ((retIC#94339031 = NA) OR (retIC#94339031 = null)) THEN null ELSE cast(retIC#94339031 as float) END AS retIC#94339268, CASE WHEN ((resretIC#94339032 = NA) OR (resretIC#94339032 = null)) THEN null ELSE cast(resretIC#94339032 as float) END AS resretIC#94339269, CASE WHEN ((numcos#94339033 = NA) OR (numcos#94339033 = null)) THEN null ELSE cast(numcos#94339033 as float) END AS numcos#94339270, CASE WHEN ((numdates#94339034 = NA) OR (numdates#94339034 = null)) THEN null ELSE cast(numdates#94339034 as float) END AS numdates#94339271, CASE WHEN ((annual_bmret#94339035 = NA) OR (annual_bmret#94339035 = null)) THEN null ELSE cast(annual_bmret#94339035 as float) END AS annual_bmret#94339272, CASE WHEN ((annual_ret#94339036 = NA) OR (annual_ret#94339036 = null)) THEN null ELSE cast(annual_ret#94339036 as float) END AS annual_ret#94339444, CASE WHEN ((std_ret#94339037 = NA) OR (std_ret#94339037 = null)) THEN null ELSE cast(std_ret#94339037 as float) END AS std_ret#94339458, CASE WHEN ((Sharpe_ret#94339038 = NA) OR (Sharpe_ret#94339038 = null)) THEN null ELSE cast(Sharpe_ret#94339038 as float) END AS Sharpe_ret#94339472, CASE WHEN ((PctPos_ret#94339039 = NA) OR (PctPos_ret#94339039 = null)) THEN null ELSE cast(PctPos_ret#94339039 as float) END AS PctPos_ret#94339486, CASE WHEN ((TR_ret#94339040 = NA) OR (TR_ret#94339040 = null)) THEN null ELSE cast(TR_ret#94339040 as float) END AS TR_ret#94339490, CASE WHEN ((IR_ret#94339041 = NA) OR (IR_ret#94339041 = null)) THEN null ELSE cast(IR_ret#94339041 as float) END AS IR_ret#94339504, CASE WHEN ((annual_resret#94339042 = NA) OR (annual_resret#94339042 = null)) THEN null ELSE cast(annual_resret#94339042 as 
float) END AS annual_resret#94339506, CASE WHEN ((std_resret#94339043 = NA) OR (std_resret#94339043 = null)) THEN null ELSE cast(std_resret#94339043 as float) END AS std_resret#94339509, CASE WHEN ((Sharpe_resret#94339044 = NA) OR (Sharpe_resret#94339044 = null)) THEN null ELSE cast(Sharpe_resret#94339044 as float) END AS Sharpe_resret#94339510, CASE WHEN ((PctPos_resret#94339045 = NA) OR (PctPos_resret#94339045 = null)) THEN null ELSE cast(PctPos_resret#94339045 as float) END AS PctPos_resret#94339511, CASE WHEN ((TR_resret#94339046 = NA) OR (TR_resret#94339046 = null)) THEN null ELSE cast(TR_resret#94339046 as float) END AS TR_resret#94339512, CASE WHEN ((IR_resret#94339047 = NA) OR (IR_resret#94339047 = null)) THEN null ELSE cast(IR_resret#94339047 as float) END AS IR_resret#94339513, CASE WHEN ((annual_retnet#94339048 = NA) OR (annual_retnet#94339048 = null)) THEN null ELSE cast(annual_retnet#94339048 as float) END AS annual_retnet#94339514, CASE WHEN ((std_retnet#94339049 = NA) OR (std_retnet#94339049 = null)) THEN null ELSE cast(std_retnet#94339049 as float) END AS std_retnet#94339515, CASE WHEN ((Sharpe_retnet#94339050 = NA) OR (Sharpe_retnet#94339050 = null)) THEN null ELSE cast(Sharpe_retnet#94339050 as float) END AS Sharpe_retnet#94339516, CASE WHEN ((PctPos_retnet#94339051 = NA) OR (PctPos_retnet#94339051 = null)) THEN null ELSE cast(PctPos_retnet#94339051 as float) END AS PctPos_retnet#94339517, CASE WHEN ((TR_retnet#94339052 = NA) OR (TR_retnet#94339052 = null)) THEN null ELSE cast(TR_retnet#94339052 as float) END AS TR_retnet#94339518, CASE WHEN ((IR_retnet#94339053 = NA) OR (IR_retnet#94339053 = null)) THEN null ELSE cast(IR_retnet#94339053 as float) END AS IR_retnet#94339519, ... 
2 more fields]
+- FileScan csv [sector_id#94339030,retIC#94339031,resretIC#94339032,numcos#94339033,numdates#94339034,annual_bmret#94339035,annual_ret#94339036,std_ret#94339037,Sharpe_ret#94339038,PctPos_ret#94339039,TR_ret#94339040,IR_ret#94339041,annual_resret#94339042,std_resret#94339043,Sharpe_resret#94339044,PctPos_resret#94339045,TR_resret#94339046,IR_resret#94339047,annual_retnet#94339048,std_retnet#94339049,Sharpe_retnet#94339050,PctPos_retnet#94339051,TR_retnet#94339052,IR_retnet#94339053,... 2 more fields] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/srv/plusamp/data/default/ea-market/output/estimize_signal_histor..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<sector_id:string,retIC:string,resretIC:string,numcos:string,numdates:string,annual_bmret:s...
,None)

(5) Scan csv
Output [26]: [sector_id#94339030, retIC#94339031, resretIC#94339032, numcos#94339033, numdates#94339034, annual_bmret#94339035, annual_ret#94339036, std_ret#94339037, Sharpe_ret#94339038, PctPos_ret#94339039, TR_ret#94339040, IR_ret#94339041, annual_resret#94339042, std_resret#94339043, Sharpe_resret#94339044, PctPos_resret#94339045, TR_resret#94339046, IR_resret#94339047, annual_retnet#94339048, std_retnet#94339049, Sharpe_retnet#94339050, PctPos_retnet#94339051, TR_retnet#94339052, IR_retnet#94339053, turnover#94339054, coverage#94339055]
Batched: false
Location: InMemoryFileIndex [file:/srv/plusamp/data/default/ea-market/output/estimize_signal_history/estimizesignal_preearnings/stats_sector_id.csv]
ReadSchema: 
struct<sector_id:string,retIC:string,resretIC:string,numcos:string,numdates:string,annual_bmret:string,annual_ret:string,std_ret:string,Sharpe_ret:string,PctPos_ret:string,TR_ret:string,IR_ret:string,annual_resret:string,std_resret:string,Sharpe_resret:string,PctPos_resret:string,TR_resret:string,IR_resret:string,annual_retnet:string,std_retnet:string,Sharpe_retnet:string,PctPos_retnet:string,TR_retnet:string,IR_retnet:string,turnover:string,coverage:string> (6) Project [codegen id : 1] Output [26]: [CASE WHEN ((sector_id#94339030 = NA) OR (sector_id#94339030 = null)) THEN null ELSE cast(sector_id#94339030 as int) END AS sector_id#94339267, CASE WHEN ((retIC#94339031 = NA) OR (retIC#94339031 = null)) THEN null ELSE cast(retIC#94339031 as float) END AS retIC#94339268, CASE WHEN ((resretIC#94339032 = NA) OR (resretIC#94339032 = null)) THEN null ELSE cast(resretIC#94339032 as float) END AS resretIC#94339269, CASE WHEN ((numcos#94339033 = NA) OR (numcos#94339033 = null)) THEN null ELSE cast(numcos#94339033 as float) END AS numcos#94339270, CASE WHEN ((numdates#94339034 = NA) OR (numdates#94339034 = null)) THEN null ELSE cast(numdates#94339034 as float) END AS numdates#94339271, CASE WHEN ((annual_bmret#94339035 = NA) OR (annual_bmret#94339035 = null)) THEN null ELSE cast(annual_bmret#94339035 as float) END AS annual_bmret#94339272, CASE WHEN ((annual_ret#94339036 = NA) OR (annual_ret#94339036 = null)) THEN null ELSE cast(annual_ret#94339036 as float) END AS annual_ret#94339444, CASE WHEN ((std_ret#94339037 = NA) OR (std_ret#94339037 = null)) THEN null ELSE cast(std_ret#94339037 as float) END AS std_ret#94339458, CASE WHEN ((Sharpe_ret#94339038 = NA) OR (Sharpe_ret#94339038 = null)) THEN null ELSE cast(Sharpe_ret#94339038 as float) END AS Sharpe_ret#94339472, CASE WHEN ((PctPos_ret#94339039 = NA) OR (PctPos_ret#94339039 = null)) THEN null ELSE cast(PctPos_ret#94339039 as float) END AS PctPos_ret#94339486, CASE WHEN ((TR_ret#94339040 = NA) OR (TR_ret#94339040 = null)) 
THEN null ELSE cast(TR_ret#94339040 as float) END AS TR_ret#94339490, CASE WHEN ((IR_ret#94339041 = NA) OR (IR_ret#94339041 = null)) THEN null ELSE cast(IR_ret#94339041 as float) END AS IR_ret#94339504, CASE WHEN ((annual_resret#94339042 = NA) OR (annual_resret#94339042 = null)) THEN null ELSE cast(annual_resret#94339042 as float) END AS annual_resret#94339506, CASE WHEN ((std_resret#94339043 = NA) OR (std_resret#94339043 = null)) THEN null ELSE cast(std_resret#94339043 as float) END AS std_resret#94339509, CASE WHEN ((Sharpe_resret#94339044 = NA) OR (Sharpe_resret#94339044 = null)) THEN null ELSE cast(Sharpe_resret#94339044 as float) END AS Sharpe_resret#94339510, CASE WHEN ((PctPos_resret#94339045 = NA) OR (PctPos_resret#94339045 = null)) THEN null ELSE cast(PctPos_resret#94339045 as float) END AS PctPos_resret#94339511, CASE WHEN ((TR_resret#94339046 = NA) OR (TR_resret#94339046 = null)) THEN null ELSE cast(TR_resret#94339046 as float) END AS TR_resret#94339512, CASE WHEN ((IR_resret#94339047 = NA) OR (IR_resret#94339047 = null)) THEN null ELSE cast(IR_resret#94339047 as float) END AS IR_resret#94339513, CASE WHEN ((annual_retnet#94339048 = NA) OR (annual_retnet#94339048 = null)) THEN null ELSE cast(annual_retnet#94339048 as float) END AS annual_retnet#94339514, CASE WHEN ((std_retnet#94339049 = NA) OR (std_retnet#94339049 = null)) THEN null ELSE cast(std_retnet#94339049 as float) END AS std_retnet#94339515, CASE WHEN ((Sharpe_retnet#94339050 = NA) OR (Sharpe_retnet#94339050 = null)) THEN null ELSE cast(Sharpe_retnet#94339050 as float) END AS Sharpe_retnet#94339516, CASE WHEN ((PctPos_retnet#94339051 = NA) OR (PctPos_retnet#94339051 = null)) THEN null ELSE cast(PctPos_retnet#94339051 as float) END AS PctPos_retnet#94339517, CASE WHEN ((TR_retnet#94339052 = NA) OR (TR_retnet#94339052 = null)) THEN null ELSE cast(TR_retnet#94339052 as float) END AS TR_retnet#94339518, CASE WHEN ((IR_retnet#94339053 = NA) OR (IR_retnet#94339053 = null)) THEN null ELSE 
cast(IR_retnet#94339053 as float) END AS IR_retnet#94339519, CASE WHEN ((turnover#94339054 = NA) OR (turnover#94339054 = null)) THEN null ELSE cast(turnover#94339054 as float) END AS turnover#94339520, CASE WHEN ((coverage#94339055 = NA) OR (coverage#94339055 = null)) THEN null ELSE cast(coverage#94339055 as float) END AS coverage#94339521]
Input [26]: [sector_id#94339030, retIC#94339031, resretIC#94339032, numcos#94339033, numdates#94339034, annual_bmret#94339035, annual_ret#94339036, std_ret#94339037, Sharpe_ret#94339038, PctPos_ret#94339039, TR_ret#94339040, IR_ret#94339041, annual_resret#94339042, std_resret#94339043, Sharpe_resret#94339044, PctPos_resret#94339045, TR_resret#94339046, IR_resret#94339047, annual_retnet#94339048, std_retnet#94339049, Sharpe_retnet#94339050, PctPos_retnet#94339051, TR_retnet#94339052, IR_retnet#94339053, turnover#94339054, coverage#94339055]

(7) ColumnarToRow [codegen id : 1]
Input [4]: [coverage#94339521, numcos#94339270, numdates#94339271, sector_id#94339267]

(8) Filter [codegen id : 1]
Input [4]: [coverage#94339521, numcos#94339270, numdates#94339271, sector_id#94339267]
Condition : isnotnull(sector_id#94339267)

(9) Project [codegen id : 1]
Output [5]: [sector_id#94339267, numcos#94339270, numdates#94339271, coverage#94339521, round((cast(numcos#94339270 as double) / cast(coverage#94339521 as double)), 0) AS universe#94339595]
Input [4]: [coverage#94339521, numcos#94339270, numdates#94339271, sector_id#94339267]

(10) BroadcastExchange
Input [5]: [sector_id#94339267, numcos#94339270, numdates#94339271, coverage#94339521, universe#94339595]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#7532233]

(11) InMemoryTableScan
Output [3]: [sector_id#94160418, sort#94160419, description#94160423]
Arguments: [sector_id#94160418, sort#94160419, description#94160423], [isnotnull(sector_id#94160418)]

(12) InMemoryRelation
Arguments: [sector_id#94160418, sort#94160419, description#94160423, 
universe#94160424], CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer@208e3fd9,StorageLevel(disk, memory, deserialized, 1 replicas),*(1) Project [CASE WHEN ((sector_id#94160398 = NA) OR (sector_id#94160398 = null)) THEN null ELSE cast(sector_id#94160398 as int) END AS sector_id#94160418, CASE WHEN (sort#94160399 = null) THEN null ELSE sort#94160399 END AS sort#94160419, CASE WHEN (description#94160400 = null) THEN null ELSE description#94160400 END AS description#94160423, CASE WHEN ((universe#94160401 = NA) OR (universe#94160401 = null)) THEN null ELSE cast(universe#94160401 as int) END AS universe#94160424]
+- FileScan csv [sector_id#94160398,sort#94160399,description#94160400,universe#94160401] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/srv/plusamp/data/default/ea-market/curate/curate_sector.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<sector_id:string,sort:string,description:string,universe:string>
,None)

(13) Scan csv
Output [4]: [sector_id#94160398, sort#94160399, description#94160400, universe#94160401]
Batched: false
Location: InMemoryFileIndex [file:/srv/plusamp/data/default/ea-market/curate/curate_sector.csv]
ReadSchema: struct<sector_id:string,sort:string,description:string,universe:string>

(14) Project [codegen id : 1]
Output [4]: [CASE WHEN ((sector_id#94160398 = NA) OR (sector_id#94160398 = null)) THEN null ELSE cast(sector_id#94160398 as int) END AS sector_id#94160418, CASE WHEN (sort#94160399 = null) THEN null ELSE sort#94160399 END AS sort#94160419, CASE WHEN (description#94160400 = null) THEN null ELSE description#94160400 END AS description#94160423, CASE WHEN ((universe#94160401 = NA) OR (universe#94160401 = null)) THEN null ELSE cast(universe#94160401 as int) END AS universe#94160424]
Input [4]: [sector_id#94160398, sort#94160399, description#94160400, universe#94160401]

(15) Filter
Input [3]: [sector_id#94160418, sort#94160419, 
description#94160423]
Condition : isnotnull(sector_id#94160418)

(16) BroadcastHashJoin [codegen id : 2]
Left keys [1]: [sector_id#94339267]
Right keys [1]: [sector_id#94160418]
Join condition: None

(17) Project [codegen id : 2]
Output [7]: [sector_id#94339267, numcos#94339270, numdates#94339271, sort#94160419, description#94160423, universe#94339595, coverage#94339521]
Input [8]: [sector_id#94339267, numcos#94339270, numdates#94339271, coverage#94339521, universe#94339595, sector_id#94160418, sort#94160419, description#94160423]

(18) Exchange
Input [7]: [sector_id#94339267, numcos#94339270, numdates#94339271, sort#94160419, description#94160423, universe#94339595, coverage#94339521]
Arguments: rangepartitioning(sort#94160419 ASC NULLS FIRST, description#94160423 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#7532240]

(19) Sort [codegen id : 3]
Input [7]: [sector_id#94339267, numcos#94339270, numdates#94339271, sort#94160419, description#94160423, universe#94339595, coverage#94339521]
Arguments: [sort#94160419 ASC NULLS FIRST, description#94160423 ASC NULLS FIRST], true, 0

(20) CollectLimit
Input [7]: [sector_id#94339267, numcos#94339270, numdates#94339271, sort#94160419, description#94160423, universe#94339595, coverage#94339521]
Arguments: 1000000