hadoop - Oozie coordinator for historic dates
I want to run an Oozie coordinator for historic dates and pass the date as a parameter to a script in the workflow. How do I do that?
Can you just set the start date to an old date so that it catches up? The frequency should then be added to it.
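Yes. When you submit a coordinator whose start date is in the past, it catches up: it starts executing the actions for every interval from the start date onward. Setting concurrency=1 keeps those catch-up actions from running in parallel, which saves the cluster from heavy load. You can set execution=LIFO if you want to process the newest files first. More info: http://oozie.apache.org/docs/3.3.2/coordinatorfunctionalspec.html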
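In a coordinator definition, both settings go inside the controls element. A minimal sketch, assuming the coordinator schema 0.2 and a ${timeout} property defined elsewhere:

```xml
<controls>
    <!-- minutes an action waits for its input data before timing out -->
    <timeout>${timeout}</timeout>
    <!-- run at most one catch-up action at a time -->
    <concurrency>1</concurrency>
    <!-- LIFO: newest materialized action first (the default is FIFO) -->
    <execution>LIFO</execution>
</controls>
```

With FIFO the backlog is worked through from the oldest date forward, which is usually what a historic backfill needs. LIFO only makes sense when the most recent data matters more.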
I'm posting a modified sample from an answer on how to schedule a Sqoop action using Oozie.
Create a coordinator.xml file:
<coordinator-app name="sample-coord" xmlns="uri:oozie:coordinator:0.2"
                 frequency="${coord:days(7)}"
                 start="${start}" end="${end}" timezone="America/New_York">
    <controls>
        <timeout>${timeout}</timeout>
        <concurrency>1</concurrency>
    </controls>
    <datasets>
        <dataset name="data" frequency="${coord:days(7)}"
                 initial-instance="${start}" timezone="America/New_York">
            <uri-template>${data_path}/${YEAR}/${MONTH}/${DAY}</uri-template>
            <done-flag/>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="data_in" dataset="data">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${wf_application_path}</app-path>
            <configuration>
                <property>
                    <name>input</name>
                    <value>${coord:dataIn('data_in')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
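To pass the run date itself to the workflow (rather than only the input path), one option is to add a property computed from the action's nominal time, which for catch-up actions is the historic scheduled time and not the current time. A minimal sketch of the workflow configuration element, assuming a hypothetical property name run_date and an Oozie version that supports coord:formatTime:

```xml
<configuration>
    <property>
        <name>input</name>
        <value>${coord:dataIn('data_in')}</value>
    </property>
    <!-- run_date is a hypothetical name chosen for illustration -->
    <property>
        <name>run_date</name>
        <!-- nominal (scheduled) time of this action, formatted as a date -->
        <value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}</value>
    </property>
</configuration>
```

The workflow can then refer to ${run_date} wherever the script needs it, for example as an argument of the action that runs the script.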
Specify the properties used in the above file in coordinator.properties:
host=namenode01
namenode=hdfs://${host}:8020
wf_application_path=${namenode}/oozie/deployments/example
oozie.coord.application.path=${wf_application_path}
data_path=${namenode}/data
start=2013-08-01T01:00Z
end=2013-08-19T23:59Z
timeout=10
Upload the coordinator.xml file to HDFS (into the directory named by oozie.coord.application.path), then submit the coordinator job like this:
oozie job -config coordinator.properties -run