digraph G {
0 [labelType="html" label="<b>Execute InsertIntoHadoopFsRelationCommand</b><br><br>number of written files: 1<br>written output: 200.4 KiB<br>number of output rows: 3,157<br>number of dynamic part: 0"];
1 [labelType="html" label="<b>Exchange</b><br><br>shuffle records written: 3,157<br>shuffle write time total (min, med, max (stageId: taskId))<br>33 ms (0 ms, 0 ms, 0 ms (stage 57.1: task 459))<br>records read: 3,157<br>local bytes read: 107.9 KiB<br>fetch wait time: 0 ms<br>remote bytes read: 113.6 KiB<br>local blocks read: 97<br>remote blocks read: 103<br>data size total (min, med, max (stageId: taskId))<br>246.6 KiB (320.0 B, 1200.0 B, 1840.0 B (stage 57.1: task 425))<br>shuffle bytes written total (min, med, max (stageId: taskId))<br>221.5 KiB (329.0 B, 1083.0 B, 1630.0 B (stage 57.1: task 527))"];
subgraph cluster2 {
isCluster="true";
label="WholeStageCodegen (2)\n \nduration: total (min, med, max (stageId: taskId))\n4.6 s (4 ms, 6 ms, 685 ms (stage 57.1: task 488))";
3 [labelType="html" label="<br><b>Project</b><br><br>"];
4 [labelType="html" label="<b>Filter</b><br><br>number of output rows: 3,157"];
5 [labelType="html" label="<b>HashAggregate</b><br><br>time in aggregation build total (min, med, max (stageId: taskId))<br>4.2 s (3 ms, 4 ms, 682 ms (stage 57.1: task 488))<br>peak memory total (min, med, max (stageId: taskId))<br>3.2 GiB (16.5 MiB, 16.5 MiB, 16.5 MiB (stage 57.1: task 421))<br>number of output rows: 2,257,379<br>avg hash probe bucket list iters (min, med, max (stageId: taskId)):<br>(1.4, 1.4, 1.4 (stage 57.1: task 421))"];
}
6 [labelType="html" label="<b>Exchange</b><br><br>shuffle records written: 588,149<br>shuffle write time total (min, med, max (stageId: taskId))<br>50 ms (0 ms, 0 ms, 27 ms (stage 56.1: task 418))<br>records read: 2,260,197<br>local bytes read total (min, med, max (stageId: taskId))<br>69.3 MiB (328.6 KiB, 349.0 KiB, 382.9 KiB (stage 57.1: task 460))<br>fetch wait time total (min, med, max (stageId: taskId))<br>3.2 s (0 ms, 0 ms, 674 ms (stage 57.1: task 488))<br>remote bytes read total (min, med, max (stageId: taskId))<br>69.5 MiB (325.7 KiB, 360.1 KiB, 388.2 KiB (stage 57.1: task 495))<br>local blocks read: 800<br>remote blocks read: 800<br>data size total (min, med, max (stageId: taskId))<br>49.4 MiB (0.0 B, 0.0 B, 24.8 MiB (stage 56.1: task 418))<br>shuffle bytes written total (min, med, max (stageId: taskId))<br>36.1 MiB (0.0 B, 0.0 B, 18.2 MiB (stage 56.1: task 418))"];
subgraph cluster7 {
isCluster="true";
label="WholeStageCodegen (1)\n \nduration: total (min, med, max (stageId: taskId))\n1.1 s (0 ms, 0 ms, 583 ms (stage 56.1: task 418))";
8 [labelType="html" label="<b>HashAggregate</b><br><br>time in aggregation build total (min, med, max (stageId: taskId))<br>874 ms (0 ms, 0 ms, 454 ms (stage 56.1: task 418))<br>peak memory total (min, med, max (stageId: taskId))<br>80.0 MiB (0.0 B, 0.0 B, 40.0 MiB (stage 56.1: task 419))<br>number of output rows: 588,149<br>avg hash probe bucket list iters (min, med, max (stageId: taskId)):<br>(1.6, 1.6, 1.6 (stage 56.1: task 419))"];
9 [labelType="html" label="<b>ColumnarToRow</b><br><br>number of output rows: 588,279<br>number of input batches: 156"];
}
10 [labelType="html" label="<b>Scan parquet itv024694_lending_club.customers</b><br><br>number of files read: 200<br>scan time total (min, med, max (stageId: taskId))<br>524 ms (0 ms, 0 ms, 271 ms (stage 56.1: task 418))<br>metadata time: 0 ms<br>size of files read: 184.9 MiB<br>number of output rows: 588,279"];
1->0;
3->1;
4->3;
5->4;
6->5;
8->6;
9->8;
10->9;
}
Execute InsertIntoHadoopFsRelationCommand hdfs://m01.itversity.com:9000/user/itv024694/bad_data/bad_data_customer, false, CSV, [header=true, path=/user/itv024694/bad_data/bad_data_customer], Overwrite, [member_id]
Exchange RoundRobinPartitioning(1), REPARTITION_WITH_NUM, [id=#482]
Project [member_id#577]
Filter (total_count#575L > 1)
HashAggregate(keys=[member_id#577], functions=[count(1)])
WholeStageCodegen (2)
Exchange hashpartitioning(member_id#577, 200), ENSURE_REQUIREMENTS, [id=#476]
HashAggregate(keys=[member_id#577], functions=[partial_count(1)])
ColumnarToRow
WholeStageCodegen (1)
FileScan parquet itv024694_lending_club.customers[member_id#577] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[hdfs://m01.itversity.com:9000/public/trendytech/lendingclubproject/cleaned/cust..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<member_id:string>