Apache NiFi Processors in version 0.7.4
For other NiFi versions, please refer to our default processors post. Check the Apache NiFi site for downloads of any NiFi version and for the current version's documentation.
List of Processors
- AttributesToJSON
- Base64EncodeContent
- CompressContent
- ConsumeAMQP
- ConsumeJMS
- ConsumeKafka
- ConsumeMQTT
- ControlRate
- ConvertAvroSchema
- ConvertAvroToJSON
- ConvertCharacterSet
- ConvertCSVToAvro
- ConvertJSONToAvro
- ConvertJSONToSQL
- CreateHadoopSequenceFile
- DebugFlow
- DeleteDynamoDB
- DeleteS3Object
- DeleteSQS
- DetectDuplicate
- DistributeLoad
- DuplicateFlowFile
- EncryptContent
- EvaluateJsonPath
- EvaluateRegularExpression
- EvaluateXPath
- EvaluateXQuery
- ExecuteFlumeSink
- ExecuteFlumeSource
- ExecuteProcess
- ExecuteScript
- ExecuteSQL
- ExecuteStreamCommand
- ExtractAvroMetadata
- ExtractHL7Attributes
- ExtractImageMetadata
- ExtractMediaMetadata
- ExtractText
- FetchDistributedMapCache
- FetchElasticsearch
- FetchFile
- FetchHDFS
- FetchS3Object
- FetchSFTP
- GenerateFlowFile
- GeoEnrichIP
- GetAzureEventHub
- GetCouchbaseKey
- GetDynamoDB
- GetFile
- GetFTP
- GetHBase
- GetHDFS
- GetHDFSEvents
- GetHDFSSequenceFile
- GetHTMLElement
- GetHTTP
- GetJMSQueue
- GetJMSTopic
- GetKafka
- GetMongo
- GetSFTP
- GetSNMP
- GetSolr
- GetSplunk
- GetSQS
- GetTwitter
- HandleHttpRequest
- HandleHttpResponse
- HashAttribute
- HashContent
- IdentifyMimeType
- InferAvroSchema
- InvokeHTTP
- InvokeScriptedProcessor
- JoltTransformJSON
- ListenHTTP
- ListenLumberjack
- ListenRELP
- ListenSyslog
- ListenTCP
- ListenUDP
- ListFile
- ListHDFS
- ListS3
- ListSFTP
- LogAttribute
- MergeContent
- ModifyBytes
- ModifyHTMLElement
- MonitorActivity
- ParseSyslog
- PostHTTP
- PublishAMQP
- PublishJMS
- PublishKafka
- PublishMQTT
- PutAzureEventHub
- PutCassandraQL
- PutCouchbaseKey
- PutDistributedMapCache
- PutDynamoDB
- PutElasticsearch
- PutEmail
- PutFile
- PutFTP
- PutHBaseCell
- PutHBaseJSON
- PutHDFS
- PutHiveQL
- PutHTMLElement
- PutJMS
- PutKafka
- PutKinesisFirehose
- PutLambda
- PutMongo
- PutRiemann
- PutS3Object
- PutSFTP
- PutSlack
- PutSNS
- PutSolrContentStream
- PutSplunk
- PutSQL
- PutSQS
- PutSyslog
- PutTCP
- PutUDP
- QueryCassandra
- QueryDatabaseTable
- ReplaceText
- ReplaceTextWithMapping
- ResizeImage
- RouteHL7
- RouteOnAttribute
- RouteOnContent
- RouteText
- ScanAttribute
- ScanContent
- SegmentContent
- SelectHiveQL
- SetSNMP
- SplitAvro
- SplitContent
- SplitJson
- SplitText
- SplitXml
- SpringContextProcessor
- StoreInKiteDataset
- TailFile
- TransformXml
- UnpackContent
- UpdateAttribute
- ValidateXml
- YandexTranslate
AttributesToJSON
Generates a JSON representation of the input FlowFile Attributes. The resulting JSON can be written to either a new Attribute ‘JSONAttributes’ or written to the FlowFile as content.
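As a rough illustration of the two output modes described above, here is a minimal Python sketch. It is not NiFi code: the function name `attributes_to_json` and the `destination` switch are hypothetical stand-ins for the processor's configuration, and a FlowFile is modeled as a plain dict of attributes plus optional byte content.

```python
import json


def attributes_to_json(attributes, destination="flowfile-attribute"):
    """Serialize FlowFile attributes to JSON.

    Returns (attributes, content). `destination` is a hypothetical switch
    standing in for the processor's configuration.
    """
    payload = json.dumps(attributes, sort_keys=True)
    if destination == "flowfile-attribute":
        # Write the JSON into a new attribute and leave the content untouched.
        updated = dict(attributes)
        updated["JSONAttributes"] = payload
        return updated, None
    # Otherwise the JSON becomes the FlowFile content.
    return dict(attributes), payload.encode("utf-8")


attrs, content = attributes_to_json({"filename": "a.txt", "uuid": "1234"})
print(attrs["JSONAttributes"])
```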
Base64EncodeContent
Encodes or decodes content to and from base64
CompressContent
Compresses or decompresses the contents of FlowFiles using a user-specified compression algorithm and updates the mime.type attribute as appropriate
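The idea of compressing content and adjusting mime.type together can be sketched in Python as follows. This assumes gzip as the chosen algorithm; the value `application/gzip` and the choice to drop mime.type on decompression are assumptions made for illustration, not behavior confirmed by the description above.

```python
import gzip


def compress_content(content, attributes):
    """Gzip the content and record the new MIME type (assumed value)."""
    updated = dict(attributes)
    updated["mime.type"] = "application/gzip"  # assumption, for illustration
    return gzip.compress(content), updated


def decompress_content(content, attributes):
    """Reverse the compression; the original MIME type is unknown here."""
    updated = dict(attributes)
    updated.pop("mime.type", None)  # assumption: drop the stale type
    return gzip.decompress(content), updated


packed, attrs = compress_content(b"hello nifi", {"filename": "a.txt"})
restored, _ = decompress_content(packed, attrs)
print(restored)
```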
ConsumeAMQP
Consumes AMQP Message transforming its content to a FlowFile and transitioning it to ‘success’ relationship
ConsumeJMS
Consumes JMS Message of type BytesMessage or TextMessage transforming its content to a FlowFile and transitioning it to ‘success’ relationship. JMS attributes such as headers and properties will be copied as FlowFile attributes.
ConsumeKafka
Consumes messages from Apache Kafka, specifically built against the Kafka 0.9.x Consumer API. The complementary NiFi processor for sending messages is PublishKafka.
ConsumeMQTT
Subscribes to a topic and receives messages from an MQTT broker
ControlRate
Controls the rate at which data is transferred to follow-on processors. If you configure a very small Time Duration, then the accuracy of the throttle gets worse. You can improve this accuracy by decreasing the Yield Duration, at the expense of more Tasks given to the processor.
ConvertAvroSchema
Convert records from one Avro schema to another, including support for flattening and simple type conversions
ConvertAvroToJSON
Converts a Binary Avro record into a JSON object. This processor provides a direct mapping of an Avro field to a JSON field, such that the resulting JSON will have the same hierarchical structure as the Avro document. Note that the Avro schema information will be lost, as this is not a translation from binary Avro to JSON formatted Avro. The output JSON is encoded using UTF-8. If an incoming FlowFile contains a stream of multiple Avro records, the resultant FlowFile will contain a JSON Array containing all of the Avro records or a sequence of JSON Objects. If an incoming FlowFile does not contain any records, an empty JSON object is the output. Empty/Single Avro record FlowFile inputs are optionally wrapped in a container as dictated by ‘Wrap Single Record’
ConvertCharacterSet
Converts a FlowFile’s content from one character set to another
ConvertCSVToAvro
Converts CSV files to Avro according to an Avro Schema
ConvertJSONToAvro
Converts JSON files to Avro according to an Avro Schema
ConvertJSONToSQL
Converts a JSON-formatted FlowFile into an UPDATE or INSERT SQL statement. The incoming FlowFile is expected to be a “flat” JSON message, meaning that it consists of a single JSON element and each field maps to a simple type. If a field maps to a JSON object, that JSON object will be interpreted as Text. If the input is an array of JSON elements, each element in the array is output as a separate FlowFile to the ‘sql’ relationship. Upon successful conversion, the original FlowFile is routed to the ‘original’ relationship and the SQL is routed to the ‘sql’ relationship.
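To make the “flat JSON to SQL” mapping concrete, here is a small Python sketch of the INSERT case. It is an illustration only: the function name `json_to_insert`, the use of `?` placeholders, and the unescaped column names are assumptions, and the real processor's output format may differ.

```python
import json


def json_to_insert(content, table):
    """Build one (sql, values) pair per JSON element.

    A JSON array yields one statement per element, mirroring the
    one-FlowFile-per-element behavior described above.
    """
    parsed = json.loads(content)
    records = parsed if isinstance(parsed, list) else [parsed]
    statements = []
    for record in records:
        columns = list(record)
        # Nested objects are carried as text rather than expanded.
        values = [
            json.dumps(v) if isinstance(v, (dict, list)) else v
            for v in record.values()
        ]
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        statements.append((sql, values))
    return statements


for sql, values in json_to_insert('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]', "users"):
    print(sql, values)
```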
CreateHadoopSequenceFile
Creates Hadoop Sequence Files from incoming flow files
DebugFlow
The DebugFlow processor aids testing and debugging the FlowFile framework by allowing various responses to be explicitly triggered in response to the receipt of a FlowFile or a timer event without a FlowFile if using timer or cron based scheduling. It can force responses needed to exercise or test various failure modes that can occur when a processor runs.
DeleteDynamoDB
Deletes a document from DynamoDB based on hash and range key. The key can be string or number. The request requires all the primary keys for the operation (hash or hash and range key)
DeleteS3Object
Deletes FlowFiles on an Amazon S3 Bucket. If attempting to delete a file that does not exist, FlowFile is routed to success.
DeleteSQS
Deletes a message from an Amazon Simple Queuing Service Queue
DetectDuplicate
Caches a value, computed from FlowFile attributes, for each incoming FlowFile and determines if the cached value has already been seen. If so, routes the FlowFile to ‘duplicate’ with an attribute named ‘original.identifier’ that specifies the original FlowFile’s “description”, which is specified in the
DistributeLoad
Distributes FlowFiles to downstream processors based on a Distribution Strategy. If using the Round Robin strategy, the default is to assign each destination a weighting of 1 (evenly distributed). However, optional properties can be added to change this; adding a property with the name ‘5’ and value ‘10’ means that the relationship with name ‘5’ will receive 10 FlowFiles in each iteration instead of 1.
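The weighting rule can be sketched in Python as below. The helper `build_iteration` is hypothetical, and placing a relationship's repeats consecutively within an iteration is an assumption; the description only fixes how many FlowFiles each relationship receives per iteration.

```python
from collections import Counter


def build_iteration(relationship_count, weights=None):
    """Return the relationship names visited in one round-robin iteration.

    `weights` maps a relationship name to its weighting (given as a string,
    like a property value); relationships not listed default to 1.
    """
    weights = weights or {}
    order = []
    for number in range(1, relationship_count + 1):
        name = str(number)
        order.extend([name] * int(weights.get(name, 1)))
    return order


# Five relationships, with relationship '5' weighted 10 as in the text above.
iteration = build_iteration(5, {"5": "10"})
print(Counter(iteration))
```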
DuplicateFlowFile
Intended for load testing, this processor will create the configured number of copies of each incoming FlowFile
EncryptContent
Encrypts or Decrypts a FlowFile using either symmetric encryption with a password and randomly generated salt, or asymmetric encryption using a public and secret key.
EvaluateJsonPath
Evaluates one or more JsonPath expressions against the content of a FlowFile. The results of those expressions are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. JsonPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is flowfile-attribute; otherwise, the property name is ignored). The value of the property must be a valid JsonPath expression. A Return Type of ‘auto-detect’ will make a determination based on the configured destination. When ‘Destination’ is set to ‘flowfile-attribute,’ a return type of ‘scalar’ will be used. When ‘Destination’ is set to ‘flowfile-content,’ a return type of ‘JSON’ will be used. If the JsonPath evaluates to a JSON array or JSON object and the Return Type is set to ‘scalar’ the FlowFile will be unmodified and will be routed to failure. A Return Type of JSON can return scalar values if the provided JsonPath evaluates to the specified value and will be routed as a match. If Destination is ‘flowfile-content’ and the JsonPath does not evaluate to a defined path, the FlowFile will be routed to ‘unmatched’ without having its contents modified. If Destination is flowfile-attribute and the expression matches nothing, attributes will be created with empty strings as the value, and the FlowFile will always be routed to ‘matched.’
EvaluateRegularExpression
WARNING: This has been deprecated and will be removed in 0.2.0.
Use ExtractText instead.
EvaluateXPath
Evaluates one or more XPaths against the content of a FlowFile. The results of those XPaths are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. XPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is flowfile-attribute; otherwise, the property name is ignored). The value of the property must be a valid XPath expression. If the XPath evaluates to more than one node and the Return Type is set to ‘nodeset’ (either directly, or via ‘auto-detect’ with a Destination of ‘flowfile-content’), the FlowFile will be unmodified and will be routed to failure. If the XPath does not evaluate to a Node, the FlowFile will be routed to ‘unmatched’ without having its contents modified. If Destination is flowfile-attribute and the expression matches nothing, attributes will be created with empty strings as the value, and the FlowFile will always be routed to ‘matched’
EvaluateXQuery
Evaluates one or more XQueries against the content of a FlowFile. The results of those XQueries are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. XQueries are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is ‘flowfile-attribute’; otherwise, the property name is ignored). The value of the property must be a valid XQuery. If the XQuery returns more than one result, new attributes or FlowFiles (for Destinations of ‘flowfile-attribute’ or ‘flowfile-content’ respectively) will be created for each result (attributes will have a ‘.n’ one-up number appended to the specified attribute name). If any provided XQuery returns a result, the FlowFile(s) will be routed to ‘matched’. If no provided XQuery returns a result, the FlowFile will be routed to ‘unmatched’. If the Destination is ‘flowfile-attribute’ and the XQueries match nothing, no attributes will be applied to the FlowFile.
ExecuteFlumeSink
Execute a Flume sink. Each input FlowFile is converted into a Flume Event for processing by the sink.
ExecuteFlumeSource
Execute a Flume source. Each Flume Event is sent to the success relationship as a FlowFile
ExecuteProcess
Runs an operating system command specified by the user and writes the output of that command to a FlowFile. If the command is expected to be long-running, the Processor can output the partial data on a specified interval. When this option is used, the output is expected to be in textual format, as it typically does not make sense to split binary data on arbitrary time-based intervals.
ExecuteScript
Experimental - Executes a script given the flow file and a process session. The script is responsible for handling the incoming flow file (transfer to SUCCESS or remove, e.g.) as well as any flow files created by the script. If the handling is incomplete or incorrect, the session will be rolled back. Experimental: Impact of sustained usage not yet verified.
ExecuteSQL
Execute provided SQL select query. Query result will be converted to Avro format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute ‘executesql.row.count’ indicates how many rows were selected.
ExecuteStreamCommand
Executes an external command on the contents of a flow file, and creates a new flow file with the results of the command.
ExtractAvroMetadata
Extracts metadata from the header of an Avro datafile.
ExtractHL7Attributes
Extracts information from an HL7 (Health Level 7) formatted FlowFile and adds the information as FlowFile Attributes. The attributes are named as
ExtractImageMetadata
Extracts the image metadata from flowfiles containing images. This processor relies on the metadata extractor library at https://github.com/drewnoakes/metadata-extractor. It extracts a long list of metadata types including but not limited to EXIF, IPTC, XMP and Photoshop fields. For the full list, visit the library’s website. NOTE: The library being used loads the images into memory, so extremely large images may cause problems.
ExtractMediaMetadata
Extracts the content metadata from flowfiles containing audio, video, image, and other file types. This processor relies on the Apache Tika project for file format detection and parsing. It extracts a long list of metadata types for media files including audio, video, and print media formats. NOTE: the attribute names and content extracted may vary across upgrades because parsing is performed by the external Tika tools, which in turn depend on other projects for metadata extraction. For more details and the list of supported file types, visit the library’s website at http://tika.apache.org/.
ExtractText
Evaluates one or more Regular Expressions against the content of a FlowFile. The results of those Regular Expressions are assigned to FlowFile Attributes. Regular Expressions are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed. The first capture group, if any is found, will be placed into that attribute name. All capture groups, including the matching string sequence itself, will also be provided at that attribute name with an index value appended, with the exception of a capturing group that is optional and does not match. For example, given the attribute name “regex” and expression “abc(def)?(g)”, an attribute “regex.1” with a value of “def” is added if the “def” matched. If the “def” did not match, no attribute named “regex.1” is added, but an attribute named “regex.2” with a value of “g” is added regardless. The value of the property must be a valid Regular Expression with one or more capturing groups. If the Regular Expression matches more than once, only the first match will be used. If any provided Regular Expression matches, the FlowFile(s) will be routed to ‘matched’. If no provided Regular Expression matches, the FlowFile will be routed to ‘unmatched’ and no attributes will be applied to the FlowFile.
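The “regex” example above can be reproduced with a short Python sketch. The function `extract_text` is a hypothetical model, not the processor's implementation; in particular, setting the un-indexed attribute only when capture group 1 participates in the match is an assumption drawn from the wording “the first capture group, if any found”.

```python
import re


def extract_text(content, properties):
    """Return (relationship, attributes) for a dict of name -> regex."""
    attributes = {}
    matched_any = False
    for name, pattern in properties.items():
        match = re.search(pattern, content)  # only the first match is used
        if match is None:
            continue
        matched_any = True
        # Index 0 is the whole matching sequence; 1..n are capture groups.
        for index in range(match.re.groups + 1):
            value = match.group(index)
            if value is None:
                continue  # optional group that did not participate
            attributes[f"{name}.{index}"] = value
            if index == 1:
                attributes[name] = value  # assumption: base name mirrors group 1
    return ("matched" if matched_any else "unmatched"), attributes


print(extract_text("abcdefg", {"regex": "abc(def)?(g)"}))
print(extract_text("abcg", {"regex": "abc(def)?(g)"}))
```

In the second call the optional group does not match, so the sketch emits "regex.2" but no "regex.1", in line with the description.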
FetchDistributedMapCache
Computes a cache key from FlowFile attributes, for each incoming FlowFile, and fetches the value from the Distributed Map Cache associated with that key. The incoming FlowFile’s content is replaced with the binary data received by the Distributed Map Cache. If there is no value stored under that key then the flow file will be routed to ‘not-found’. Note that the processor will always attempt to read the entire cached value into memory before placing it in its destination. This could be potentially problematic if the cached value is very large.
FetchElasticsearch
Retrieves a document from Elasticsearch using the specified connection properties and the identifier of the document to retrieve. If the cluster has been configured for authorization and/or secure transport (SSL/TLS) and the Shield plugin is available, secure connections can be made. This processor supports Elasticsearch 2.x clusters.
FetchFile
Reads the contents of a file from disk and streams it into the contents of an incoming FlowFile. Once this is done, the file is optionally moved elsewhere or deleted to help keep the file system organized.
FetchHDFS
Retrieves a file from HDFS. The content of the incoming FlowFile is replaced by the content of the file in HDFS. The file in HDFS is left intact without any changes being made to it.
FetchS3Object
Retrieves the contents of an S3 Object and writes it to the content of a FlowFile
FetchSFTP
Fetches the content of a file from a remote SFTP server and overwrites the contents of an incoming FlowFile with the content of the remote file.
GenerateFlowFile
This processor creates FlowFiles of random data and is used for load testing
GeoEnrichIP
Looks up geolocation information for an IP address and adds the geo information to FlowFile attributes. The geo data is provided as a MaxMind database. The attribute that contains the IP address to look up is provided by the ‘IP Address Attribute’ property. If the name of the attribute provided is ‘X’, then the attributes added by enrichment will take the form X.geo.
GetAzureEventHub
Receives messages from a Microsoft Azure Event Hub, writing the contents of the Azure message to the content of the FlowFile
GetCouchbaseKey
Get a document from Couchbase Server via Key/Value access. The ID of the document to fetch may be supplied by setting the
GetDynamoDB
Retrieves a document from DynamoDB based on hash and range key. The key can be string or number. For any get request all the primary keys are required (hash or hash and range based on the table keys). A JSON Document (‘Map’) attribute of the DynamoDB item is read into the content of the FlowFile.
GetFile
Creates FlowFiles from files in a directory. NiFi will ignore files it doesn’t have at least read permissions for.
GetFTP
Fetches files from an FTP Server and creates FlowFiles from them
GetHBase
This Processor polls HBase for any records in the specified table. The processor keeps track of the timestamp of the cells that it receives, so that as new records are pushed to HBase, they will automatically be pulled. Each record is output in JSON format, as {“row”: “
GetHDFS
Fetch files from Hadoop Distributed File System (HDFS) into FlowFiles. This Processor will delete the file from HDFS after fetching it.
GetHDFSEvents
This processor polls the notification events provided by the HdfsAdmin API. Since this uses the HdfsAdmin APIs it is required to run as an HDFS super user. Currently there are six types of events (append, close, create, metadata, rename, and unlink). Please see org.apache.hadoop.hdfs.inotify.Event documentation for full explanations of each event. This processor will poll for new events based on a defined duration. For each event received a new flow file will be created with the expected attributes and the event itself serialized to JSON and written to the flow file’s content. For example, if event.type is APPEND then the content of the flow file will contain a JSON file containing the information about the append event. If successful the flow files are sent to the ‘success’ relationship. Be careful of where the generated flow files are stored. If the flow files are stored in one of processor’s watch directories there will be a never ending flow of events. It is also important to be aware that this processor must consume all events. The filtering must happen within the processor. This is because the HDFS admin’s event notifications API does not have filtering.
GetHDFSSequenceFile
Fetch sequence files from Hadoop Distributed File System (HDFS) into FlowFiles
GetHTMLElement
Extracts HTML element values from the incoming flowfile’s content using a CSS selector. The incoming HTML is first converted into an HTML Document Object Model so that HTML elements may be selected in a manner similar to how CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then “queried” using the user-defined CSS selector string. The result of “querying” the HTML DOM may produce 0-N results. If no results are found, the flowfile will be transferred to the “element not found” relationship to indicate so to the end user. If N results are found, a new flowfile will be created and emitted for each result. The query result will either be placed in the content of the new flowfile or as an attribute of the new flowfile. By default the result is written to an attribute. This can be controlled by the “Destination” property. Resulting query values may also have data prepended or appended to them by setting the value of property “Prepend Element Value” or “Append Element Value”. Prepended and appended values are treated as string values and concatenated to the result retrieved from the HTML DOM query operation. A more thorough reference for the CSS selector syntax can be found at “http://jsoup.org/apidocs/org/jsoup/select/Selector.html”
GetHTTP
Fetches data from an HTTP or HTTPS URL and writes the data to the content of a FlowFile. Once the content has been fetched, the ETag and Last Modified dates are remembered (if the web server supports these concepts). This allows the Processor to fetch new data only if the remote data has changed or until the state is cleared. That is, once the content has been fetched from the given URL, it will not be fetched again until the content on the remote server changes. Note that due to limitations on state management, stored “last modified” and ETag fields never expire. If the URL in GetHTTP uses Expression Language that is unbounded, there is the potential for Out of Memory Errors to occur.
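The “fetch only if changed” behavior corresponds to standard HTTP conditional requests. The Python sketch below shows the idea using the standard If-None-Match and If-Modified-Since request headers; the function names and the shape of the `state` dict are hypothetical, and no network call is made.

```python
def conditional_headers(state):
    """Build request headers from the remembered ETag / Last-Modified."""
    headers = {}
    if state.get("etag"):
        headers["If-None-Match"] = state["etag"]
    if state.get("last_modified"):
        headers["If-Modified-Since"] = state["last_modified"]
    return headers


def update_state(state, status, response_headers):
    """Remember validators from a fresh response; keep state on 304."""
    if status == 304:  # Not Modified: nothing new to fetch or remember
        return dict(state)
    new_state = dict(state)
    if "ETag" in response_headers:
        new_state["etag"] = response_headers["ETag"]
    if "Last-Modified" in response_headers:
        new_state["last_modified"] = response_headers["Last-Modified"]
    return new_state


state = update_state({}, 200, {"ETag": '"abc123"'})
print(conditional_headers(state))
```

Because the remembered values are only overwritten and never expired in this sketch, it also mirrors the caveat about stored fields never expiring.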
GetJMSQueue
Pulls messages from a JMS Queue, creating a FlowFile for each JMS Message or bundle of messages, as configured
GetJMSTopic
Pulls messages from a JMS Topic, creating a FlowFile for each JMS Message or bundle of messages, as configured
GetKafka
Fetches messages from Apache Kafka, specifically for 0.8.x versions. The complementary NiFi processor for sending messages is PutKafka.
GetMongo
Creates FlowFiles from documents in MongoDB
GetSFTP
Fetches files from an SFTP Server and creates FlowFiles from them
GetSNMP
Retrieves information from SNMP Agent and outputs a FlowFile with information in attributes and without any content
GetSolr
Queries Solr and outputs the results as a FlowFile
GetSplunk
Retrieves data from Splunk Enterprise.
GetSQS
Fetches messages from an Amazon Simple Queuing Service Queue
GetTwitter
Pulls status changes from Twitter’s streaming API
HandleHttpRequest
Starts an HTTP Server and listens for HTTP Requests. For each request, creates a FlowFile and transfers to ‘success’. This Processor is designed to be used in conjunction with the HandleHttpResponse Processor in order to create a Web Service
HandleHttpResponse
Sends an HTTP Response to the Requestor that generated a FlowFile. This Processor is designed to be used in conjunction with the HandleHttpRequest in order to create a web service.
HashAttribute
Hashes together the key/value pairs of several FlowFile Attributes and adds the hash as a new attribute. Optional properties are to be added such that the name of the property is the name of a FlowFile Attribute to consider and the value of the property is a regular expression that, if matched by the attribute value, will cause that attribute to be used as part of the hash. If the regular expression contains a capturing group, only the value of the capturing group will be used.
HashContent
Calculates a hash value for the Content of a FlowFile and puts that hash value on the FlowFile as an attribute whose name is determined by the
IdentifyMimeType
Attempts to identify the MIME Type used for a FlowFile. If the MIME Type can be identified, an attribute with the name ‘mime.type’ is added with the value being the MIME Type. If the MIME Type cannot be determined, the value will be set to ‘application/octet-stream’. In addition, the attribute mime.extension will be set if a common file extension for the MIME Type is known.
InferAvroSchema
Examines the contents of the incoming FlowFile to infer an Avro schema. The processor will use the Kite SDK to make an attempt to automatically generate an Avro schema from the incoming content. When inferring the schema from JSON data the key names will be used in the resulting Avro schema definition. When inferring from CSV data a “header definition” must be present either as the first line of the incoming data or the “header definition” must be explicitly set in the property “CSV Header Definition”. A “header definition” is simply a single comma separated line defining the names of each column. The “header definition” is required in order to determine the names that should be given to each field in the resulting Avro definition. When inferring data types the higher order data type is always used if there is ambiguity. For example when examining numerical values the type may be set to “long” instead of “integer” since a long can safely hold the value of any “integer”. Only CSV and JSON content is currently supported for automatically inferring an Avro schema. The type of content present in the incoming FlowFile is set by using the property “Input Content Type”. The property can either be explicitly set to CSV, JSON, or “use mime.type value” which will examine the value of the mime.type attribute on the incoming FlowFile to determine the type of content present.
InvokeHTTP
An HTTP client processor which can interact with a configurable HTTP Endpoint. The destination URL and HTTP Method are configurable. FlowFile attributes are converted to HTTP headers and the FlowFile contents are included as the body of the request (if the HTTP Method is PUT or POST).
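A rough Python analogue of this mapping, using only the standard library, is shown below. The helper `build_request`, the URL, and the attribute names are hypothetical, and the request is constructed but never sent. Passing every attribute through as a header is a simplification of the description above.

```python
import urllib.request


def build_request(url, method, attributes, content):
    """Map FlowFile attributes to headers and content to the request body."""
    # The body is only included for PUT and POST, as described above.
    body = content if method in ("PUT", "POST") else None
    return urllib.request.Request(
        url, data=body, headers=dict(attributes), method=method
    )


request = build_request(
    "http://localhost:8080/ingest",  # hypothetical endpoint
    "POST",
    {"filename": "a.txt"},
    b"hello",
)
# urllib stores header names capitalized, so 'filename' becomes 'Filename'.
print(request.get_method(), request.get_header("Filename"), request.data)
```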
InvokeScriptedProcessor
Experimental - Invokes a script engine for a Processor defined in the given script. The script must define a valid class that implements the Processor interface, and it must set a variable ‘processor’ to an instance of the class. Processor methods such as onTrigger() will be delegated to the scripted Processor instance. Also any Relationships or PropertyDescriptors defined by the scripted processor will be added to the configuration dialog. Experimental: Impact of sustained usage not yet verified.
JoltTransformJSON
Applies a list of Jolt specifications to the flowfile JSON payload. A new FlowFile is created with transformed content and is routed to the ‘success’ relationship. If the JSON transform fails, the original FlowFile is routed to the ‘failure’ relationship.
ListenHTTP
Starts an HTTP Server that is used to receive FlowFiles from remote sources. The default URI of the Service will be http://{hostname}:{port}/contentListener
ListenLumberjack
Listens for Lumberjack messages being sent to a given port over TCP. Each message will be acknowledged after successfully writing the message to a FlowFile. Each FlowFile will contain data portion of one or more Lumberjack frames. In the case where the Lumberjack frames contain syslog messages, the output of this processor can be sent to a ParseSyslog processor for further processing.
ListenRELP
Listens for RELP messages being sent to a given port over TCP. Each message will be acknowledged after successfully writing the message to a FlowFile. Each FlowFile will contain data portion of one or more RELP frames. In the case where the RELP frames contain syslog messages, the output of this processor can be sent to a ParseSyslog processor for further processing.
ListenSyslog
Listens for Syslog messages being sent to a given port over TCP or UDP. Incoming messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The format of each message is: (
ListenTCP
Listens for incoming TCP connections and reads data from each connection using a line separator as the message demarcator. By default, each message produces a single FlowFile; for higher throughput, the Batch Size can be increased to a larger value. The Receive Buffer Size must be at least as large as the largest message expected to be received; for example, if a line separator occurs every 100 KB, then the Receive Buffer Size must be greater than 100 KB.
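The relationship between line-separated messages and Batch Size can be sketched in Python as follows. The function `demarcate` is a hypothetical model working on bytes already received; it does no socket handling, and it treats a trailing fragment without a separator as a complete message, which a real listener would instead keep buffering.

```python
def demarcate(received, batch_size=1, separator=b"\n"):
    """Split received bytes into messages and group them into FlowFiles.

    Each returned item models the content of one FlowFile.
    """
    messages = [m for m in received.split(separator) if m]
    return [
        separator.join(messages[i:i + batch_size])
        for i in range(0, len(messages), batch_size)
    ]


print(demarcate(b"a\nb\nc\n"))                # one FlowFile per message
print(demarcate(b"a\nb\nc\n", batch_size=2))  # up to two messages per FlowFile
```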
ListenUDP
Listens for Datagram Packets on a given port. The default behavior produces a FlowFile per datagram, however for higher throughput the Max Batch Size property may be increased to specify the number of datagrams to batch together in a single FlowFile. This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties, otherwise it will listen for datagrams from all hosts and ports.
ListFile
Retrieves a listing of files from the local filesystem. For each file that is listed, creates a FlowFile that represents the file so that it can be fetched in conjunction with FetchFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Unlike GetFile, this Processor does not delete any data from the local filesystem.
ListHDFS
Retrieves a listing of files from HDFS. For each file that is listed in HDFS, creates a FlowFile that represents the HDFS file so that it can be fetched in conjunction with FetchHDFS. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Unlike GetHDFS, this Processor does not delete any data from HDFS.
ListS3
Retrieves a listing of objects from an S3 bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchS3Object. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data.
ListSFTP
Performs a listing of the files residing on an SFTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchSFTP in order to fetch those files.
LogAttribute
No description provided.
MergeContent
Merges a Group of FlowFiles together based on a user-defined strategy and packages them into a single FlowFile. It is recommended that the Processor be configured with only a single incoming connection, as Group of FlowFiles will not be created from FlowFiles in different connections. This processor updates the mime.type attribute as appropriate.
ModifyBytes
Discard byte range at the start and end or all content of a binary file.
ModifyHTMLElement
Modifies the value of an existing HTML element. The desired element to be modified is located by using CSS selector syntax. The incoming HTML is first converted into an HTML Document Object Model so that HTML elements may be selected in a manner similar to how CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then “queried” using the user-defined CSS selector string to find the element the user desires to modify. If the HTML element is found, the element’s value is updated in the DOM using the value specified in the “Modified Value” property. All DOM elements that match the CSS selector will be updated. Once all of the DOM elements have been updated, the DOM is rendered to HTML and the result replaces the flowfile content with the updated HTML. A more thorough reference for the CSS selector syntax can be found at “http://jsoup.org/apidocs/org/jsoup/select/Selector.html”
MonitorActivity
Monitors the flow for activity and sends out an indicator when the flow has not had any data for some specified amount of time and again when the flow’s activity is restored
ParseSyslog
Parses the contents of a Syslog message and adds attributes to the FlowFile for each of the parts of the Syslog message
PostHTTP
Performs an HTTP Post with the content of the FlowFile
PublishAMQP
Creates an AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange. In a typical AMQP exchange model, the message that is sent to the AMQP Exchange will be routed based on the ‘Routing Key’ to its final destination in the queue (the binding). If, due to some misconfiguration, the binding between the Exchange, Routing Key and Queue is not set up, the message will have no final destination and will return (i.e., the data will not make it to the queue). If that happens, you will see a message to that effect in both the app-log and a bulletin. Fixing the binding (normally done by the AMQP administrator) will resolve the issue.
PublishJMS
Creates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as JMS BytesMessage. FlowFile attributes will be added as JMS headers and/or properties to the outgoing JMS message.
PublishKafka
Sends the contents of a FlowFile as a message to Apache Kafka, using the Kafka 0.9.x Producer. The messages to send may be individual FlowFiles or may be delimited, using a user-specified delimiter, such as a new-line. The complementary NiFi processor for fetching messages is ConsumeKafka.
PublishMQTT
Publishes a message to an MQTT topic
PutAzureEventHub
Sends the contents of a FlowFile to a Windows Azure Event Hub. Note: the content of the FlowFile will be buffered into memory before being sent, so care should be taken to avoid sending FlowFiles to this Processor that exceed the amount of Java Heap Space available.
PutCassandraQL
Execute provided Cassandra Query Language (CQL) statement on a Cassandra 1.x, 2.x, or 3.0.x cluster. The content of an incoming FlowFile is expected to be the CQL command to execute. The CQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention cql.args.N.type and cql.args.N.value, where N is a positive integer. The cql.args.N.type is expected to be a lowercase string indicating the Cassandra type.
PutCouchbaseKey
Put a document to Couchbase Server via Key/Value access.
PutDistributedMapCache
Gets the content of a FlowFile and puts it to a distributed map cache, using a cache key computed from FlowFile attributes. If the cache already contains the entry and the cache update strategy is ‘keep original’, the entry is not replaced.
PutDynamoDB
Puts a document into DynamoDB based on hash and range key. The table can have either hash and range keys or a hash key alone. Currently the supported key types are string and number, and the value can be a JSON document. In the case of hash and range keys, both keys are required for the operation. The FlowFile content must be JSON. FlowFile content is mapped to the specified Json Document attribute in the DynamoDB item.
PutElasticsearch
Writes the contents of a FlowFile to Elasticsearch, using the specified parameters such as the index to insert into and the type of the document. If the cluster has been configured for authorization and/or secure transport (SSL/TLS) and the Shield plugin is available, secure connections can be made. This processor supports Elasticsearch 2.x clusters.
PutEmail
Sends an e-mail to configured recipients for each incoming FlowFile
PutFile
Writes the contents of a FlowFile to the local file system
PutFTP
Sends FlowFiles to an FTP Server
PutHBaseCell
Adds the Contents of a FlowFile to HBase as the value of a single cell
PutHBaseJSON
Adds rows to HBase based on the contents of incoming JSON documents. Each FlowFile must contain a single UTF-8 encoded JSON document, and any FlowFiles where the root element is not a single document will be routed to failure. Each JSON field name and value will become a column qualifier and value of the HBase row. Any fields with a null value will be skipped, and fields with a complex value will be handled according to the Complex Field Strategy. The row id can be specified either directly on the processor through the Row Identifier property, or can be extracted from the JSON document by specifying the Row Identifier Field Name property. This processor will hold the contents of all FlowFiles for the given batch in memory at one time.
PutHDFS
Write FlowFile data to Hadoop Distributed File System (HDFS)
PutHiveQL
Executes a HiveQL DDL/DML command (e.g., UPDATE, INSERT). The content of an incoming FlowFile is expected to be the HiveQL command to execute. The HiveQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention hiveql.args.N.type and hiveql.args.N.value, where N is a positive integer. The hiveql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.
PutHTMLElement
Places a new HTML element in the existing HTML DOM. The desired position for the new HTML element is specified by using CSS selector syntax. The incoming HTML is first converted into an HTML Document Object Model so that a position in the DOM may be located in a manner similar to how CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then “queried” using the user-defined CSS selector string to find the position where the user desires to add the new HTML element. Once the new HTML element is added to the DOM, it is rendered to HTML and the result replaces the flowfile content with the updated HTML. A more thorough reference for the CSS selector syntax can be found at “http://jsoup.org/apidocs/org/jsoup/select/Selector.html”
PutJMS
Creates a JMS Message from the contents of a FlowFile and sends the message to a JMS Server
PutKafka
Sends the contents of a FlowFile as a message to Apache Kafka, specifically for 0.8.x versions. The messages to send may be individual FlowFiles or may be delimited, using a user-specified delimiter, such as a new-line. The complementary NiFi processor for fetching messages is GetKafka.
PutKinesisFirehose
Sends the contents to a specified Amazon Kinesis Firehose. In order to send data to firehose, the firehose delivery stream name has to be specified.
PutLambda
Sends the contents to a specified Amazon Lambda Function. The AWS credentials used for authentication must have permission to execute the Lambda function (lambda:InvokeFunction). The FlowFile content must be JSON.
PutMongo
Writes the contents of a FlowFile to MongoDB
PutRiemann
Sends events to Riemann (http://riemann.io) when FlowFiles pass through this processor. You can use events to notify Riemann that a FlowFile passed through, or you can attach a more meaningful metric, such as the time a FlowFile took to get to this processor. All attributes attached to events support the NiFi Expression Language.
PutS3Object
Puts FlowFiles to an Amazon S3 Bucket. The upload uses either the PutS3Object method or the PutS3MultipartUpload methods. The PutS3Object method sends the file in a single synchronous call, but it has a 5GB size limit. Larger files are sent using the multipart upload methods that initiate, transfer the parts, and complete an upload. This multipart process saves state after each step so that a large upload can be resumed with minimal loss if the processor or cluster is stopped and restarted. A multipart upload consists of three steps: 1) initiate the upload, 2) upload the parts, and 3) complete the upload. For multipart uploads, the processor saves state locally tracking the upload ID and parts uploaded, which must both be provided to complete the upload. The AWS libraries select an endpoint URL based on the AWS region, but this can be overridden with the ‘Endpoint Override URL’ property for use with other S3-compatible endpoints. The S3 API specifies that the maximum file size for a PutS3Object upload is 5GB. It also requires that parts in a multipart upload be at least 5MB in size, except for the last part. These limits establish the bounds for the Multipart Upload Threshold and Part Size properties.
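The size limits above can be illustrated with a small Python sketch. This is not NiFi code: plan_upload is a hypothetical helper, the limits are assumed to be binary units (5 × 1024³ and 5 × 1024² bytes), and the rule that a file exactly at the threshold still uses a single call is an assumption.

```python
import math

MIB = 1024 * 1024
GIB = 1024 * MIB
MAX_SINGLE_PUT = 5 * GIB  # upper bound for a single PutS3Object call
MIN_PART_SIZE = 5 * MIB   # lower bound for every part except the last


def plan_upload(file_size, threshold, part_size):
    """Hypothetical planner: return ("single", 1) or ("multipart", part_count)."""
    if threshold > MAX_SINGLE_PUT:
        raise ValueError("threshold cannot exceed the 5GB single-call limit")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size must be at least 5MB")
    if file_size <= threshold:  # assumption: the threshold itself is inclusive
        return ("single", 1)
    # The last part may be smaller than part_size, hence the ceiling.
    return ("multipart", math.ceil(file_size / part_size))
```

For example, plan_upload(6 * GIB, 5 * GIB, 1 * GIB) plans a multipart upload of six parts.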
PutSFTP
Sends FlowFiles to an SFTP Server
PutSlack
Sends a message to your team on slack.com
PutSNS
Sends the content of a FlowFile as a notification to the Amazon Simple Notification Service
PutSolrContentStream
Sends the contents of a FlowFile as a ContentStream to Solr
PutSplunk
Sends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP. If a Message Delimiter is provided, then this processor will read messages from the incoming FlowFile based on the delimiter, and send each message to Splunk. If a Message Delimiter is not provided then the content of the FlowFile will be sent directly to Splunk as if it were a single message.
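The effect of the Message Delimiter can be sketched in Python. This is only an illustration of the described behavior, not the processor's implementation; split_messages is a hypothetical name, and dropping empty messages is an assumption.

```python
from typing import List, Optional


def split_messages(content: bytes, delimiter: Optional[bytes]) -> List[bytes]:
    """Hypothetical helper: return the messages that would be sent."""
    if not delimiter:
        # No delimiter: the whole content is treated as a single message.
        return [content]
    # Assumption: empty pieces (e.g. after a trailing delimiter) are skipped.
    return [piece for piece in content.split(delimiter) if piece]
```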
PutSQL
Executes a SQL UPDATE or INSERT command. The content of an incoming FlowFile is expected to be the SQL command to execute. The SQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.
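As a rough illustration of the attribute naming convention, the Python sketch below builds the sql.args.N.type and sql.args.N.value pairs for a two-parameter statement. The helper build_sql_args, the table and the values are hypothetical; the type codes are the standard java.sql.Types constants (INTEGER = 4, VARCHAR = 12).

```python
# JDBC type codes as defined in java.sql.Types
JDBC_INTEGER = 4
JDBC_VARCHAR = 12


def build_sql_args(params):
    """Hypothetical helper: map (jdbc_type, value) pairs to FlowFile attributes.

    N starts at 1 and follows the order of the ? placeholders.
    """
    attributes = {}
    for n, (jdbc_type, value) in enumerate(params, start=1):
        attributes[f"sql.args.{n}.type"] = str(jdbc_type)
        attributes[f"sql.args.{n}.value"] = str(value)
    return attributes


# FlowFile content (the statement) and the attributes that accompany it.
statement = "INSERT INTO users (id, name) VALUES (?, ?)"
attributes = build_sql_args([(JDBC_INTEGER, 42), (JDBC_VARCHAR, "alice")])
```

The same pattern would apply to the cql.args.N.* and hiveql.args.N.* conventions, with the type values those processors expect.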
PutSQS
Publishes a message to an Amazon Simple Queuing Service Queue
PutSyslog
Sends Syslog messages to a given host and port over TCP or UDP. Messages are constructed from the “Message ___” properties of the processor which can use expression language to generate messages from incoming FlowFiles. The properties are used to construct messages of the form: (
PutTCP
The PutTCP processor receives a FlowFile and transmits the FlowFile content over a TCP connection to the configured TCP server. By default, the FlowFiles are transmitted over the same TCP connection (or pool of TCP connections if multiple input threads are configured). To assist the TCP server with determining message boundaries, an optional “Outgoing Message Delimiter” string can be configured which is appended to the end of each FlowFile’s content when it is transmitted over the TCP connection. An optional “Connection Per FlowFile” parameter can be specified to change the behaviour so that each FlowFile’s content is transmitted over a single TCP connection which is opened when the FlowFile is received and closed after the FlowFile has been sent. This option should only be used for low message volume scenarios, otherwise the platform may run out of TCP sockets.
PutUDP
The PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet which is then transmitted to the configured UDP server. The user must ensure that the FlowFile content being fed to this processor is not larger than the maximum size for the underlying UDP transport. The maximum transport size will vary based on the platform setup but is generally just under 64KB. FlowFiles will be marked as failed if their content is larger than the maximum transport size.
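A pre-check for the size limit might look like the following Python sketch. The figure 65507 is the theoretical maximum UDP payload over IPv4 (65535 minus an 8-byte UDP header and a 20-byte IP header); the real limit depends on the platform, and route_for is a hypothetical helper rather than part of NiFi.

```python
# Theoretical IPv4 maximum: 65535 - 8 (UDP header) - 20 (IPv4 header).
MAX_UDP_PAYLOAD = 65507


def route_for(content: bytes, max_size: int = MAX_UDP_PAYLOAD) -> str:
    """Hypothetical check: name the relationship a FlowFile would end up in."""
    return "success" if len(content) <= max_size else "failure"
```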
QueryCassandra
Execute provided Cassandra Query Language (CQL) select query on a Cassandra 1.x, 2.x, or 3.0.x cluster. Query result may be converted to Avro or JSON format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute ‘executecql.row.count’ indicates how many rows were selected.
QueryDatabaseTable
Execute provided SQL select query. Query result will be converted to Avro format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute ‘querydbtable.row.count’ indicates how many rows were selected.
ReplaceText
Updates the content of a FlowFile by evaluating a Regular Expression (regex) against it and replacing the section of the content that matches the Regular Expression with some alternate value.
ReplaceTextWithMapping
Updates the content of a FlowFile by evaluating a Regular Expression against it and replacing the section of the content that matches the Regular Expression with some alternate value provided in a mapping file.
ResizeImage
Resizes an image to user-specified dimensions. This Processor uses the image codecs registered with the environment that NiFi is running in. By default, this includes JPEG, PNG, BMP, WBMP, and GIF images.
RouteHL7
Routes incoming HL7 data according to user-defined queries. To add a query, add a new property to the processor. The name of the property will become a new relationship for the processor, and the value is an HL7 Query Language query. If a FlowFile matches the query, a copy of the FlowFile will be routed to the associated relationship.
RouteOnAttribute
Routes FlowFiles based on their Attributes using the Attribute Expression Language
RouteOnContent
Applies Regular Expressions to the content of a FlowFile and routes a copy of the FlowFile to each destination whose Regular Expression matches. Regular Expressions are added as User-Defined Properties where the name of the property is the name of the relationship and the value is a Regular Expression to match against the FlowFile content. User-Defined properties do support the Attribute Expression Language, but the results are interpreted as literal values, not Regular Expressions
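The routing rule can be sketched in Python as follows. This is an illustration only: route_on_content is a hypothetical function, the ‘unmatched’ fallback name is an assumption, and a pattern is treated as matching if it is found anywhere in the content.

```python
import re


def route_on_content(content, rules):
    """Hypothetical sketch: rules maps relationship name -> regular expression.

    Returns every relationship whose pattern is found in the content.
    """
    matched = [name for name, pattern in rules.items()
               if re.search(pattern, content)]
    # Assumption: content that matches nothing goes to an 'unmatched' route.
    return matched or ["unmatched"]
```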
RouteText
Routes textual data based on a set of user-defined rules. Each line in an incoming FlowFile is compared against the values specified by user-defined Properties. The mechanism by which the text is compared to these user-defined properties is defined by the ‘Matching Strategy’. The data is then routed according to these rules, routing each line of the text individually.
ScanAttribute
Scans the specified attributes of FlowFiles, checking to see if any of their values are present within the specified dictionary of terms
ScanContent
Scans the content of FlowFiles for terms that are found in a user-supplied dictionary. If a term is matched, the UTF-8 encoded version of the term will be added to the FlowFile using the ‘matching.term’ attribute
SegmentContent
Segments a FlowFile into multiple smaller segments on byte boundaries. Each segment is given the following attributes: fragment.identifier, fragment.index, fragment.count, segment.original.filename; these attributes can then be used by the MergeContent processor in order to reconstitute the original FlowFile
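The segmenting and its attributes can be sketched in Python. The function below is hypothetical and not the processor's code; it assumes fragment.index starts at 1 and uses a random UUID as the shared fragment.identifier.

```python
import uuid


def segment(content: bytes, segment_size: int, filename: str):
    """Hypothetical sketch: return a list of (attributes, chunk) pairs."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    identifier = str(uuid.uuid4())  # shared by all segments of one FlowFile
    chunks = [content[i:i + segment_size]
              for i in range(0, len(content), segment_size)]
    segments = []
    for index, chunk in enumerate(chunks, start=1):  # assumption: 1-based
        attributes = {
            "fragment.identifier": identifier,
            "fragment.index": str(index),
            "fragment.count": str(len(chunks)),
            "segment.original.filename": filename,
        }
        segments.append((attributes, chunk))
    return segments
```

Concatenating the chunks in fragment.index order gives back the original content, which is what a merge step relies on.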
SelectHiveQL
Execute provided HiveQL SELECT query against a Hive database connection. Query result will be converted to Avro or CSV format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute ‘selecthiveql.row.count’ indicates how many rows were selected.
SetSNMP
Based on incoming FlowFile attributes, the processor will execute SNMP Set requests. When it finds attributes with a name like snmp$
SplitAvro
Splits a binary encoded Avro datafile into smaller files based on the configured Output Size. The Output Strategy determines if the smaller files will be Avro datafiles, or bare Avro records with metadata in the FlowFile attributes. The output will always be binary encoded.
SplitContent
Splits incoming FlowFiles by a specified byte sequence
SplitJson
Splits a JSON File into multiple, separate FlowFiles for an array element specified by a JsonPath expression. Each generated FlowFile is comprised of an element of the specified array and transferred to relationship ‘split,’ with the original file transferred to the ‘original’ relationship. If the specified JsonPath is not found or does not evaluate to an array element, the original file is routed to ‘failure’ and no files are generated.
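The routing described above can be sketched in Python. To stay within the standard library, a plain top-level key lookup stands in for the JsonPath expression, so this is a simplification and not the processor's behavior in full; split_json and array_key are hypothetical names.

```python
import json


def split_json(content: str, array_key: str):
    """Hypothetical sketch: return a dict of relationship -> list of contents."""
    try:
        document = json.loads(content)
    except json.JSONDecodeError:
        return {"failure": [content]}
    # Stand-in for JsonPath evaluation: look up one top-level key.
    array = document.get(array_key) if isinstance(document, dict) else None
    if not isinstance(array, list):
        # Path not found, or it does not point at an array.
        return {"failure": [content]}
    return {
        "split": [json.dumps(element) for element in array],
        "original": [content],
    }
```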
SplitText
Splits a text file into multiple smaller text files on line boundaries limited by maximum number of lines or total size of fragment. Each output split file will contain no more than the configured number of lines or bytes. If both Line Split Count and Maximum Fragment Size are specified, the split occurs at whichever limit is reached first. If the first line of a fragment exceeds the Maximum Fragment Size, that line will be output in a single split file which exceeds the configured maximum size limit.
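The interaction of the two limits can be sketched in Python. This is a simplified illustration, not the processor's implementation: split_text is a hypothetical function, sizes are counted in UTF-8 bytes, and a line count of None or 0 is assumed to mean no line limit.

```python
def split_text(text, line_split_count=None, max_fragment_size=None):
    """Hypothetical sketch: split on line boundaries, whichever limit hits first."""
    fragments, current, current_size = [], [], 0
    for line in text.splitlines(keepends=True):
        line_size = len(line.encode("utf-8"))
        too_many_lines = bool(line_split_count) and len(current) >= line_split_count
        # A line that alone exceeds the size limit still starts a fragment,
        # so the size check only applies once the fragment has content.
        too_big = (max_fragment_size is not None and bool(current)
                   and current_size + line_size > max_fragment_size)
        if too_many_lines or too_big:
            fragments.append("".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += line_size
    if current:
        fragments.append("".join(current))
    return fragments
```

With max_fragment_size=4, the input "longline\nx\n" produces an oversized first fragment containing only the long line, followed by a second fragment with the short line.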
SplitXml
Splits an XML File into multiple separate FlowFiles, each comprising a child or descendant of the original root element
SpringContextProcessor
A Processor that supports sending data to and receiving data from an application defined in a Spring Application Context via predefined in/out MessageChannels.
StoreInKiteDataset
Stores Avro records in a Kite dataset
TailFile
“Tails” a file, ingesting data from the file as it is written to the file. The file is expected to be textual. Data is ingested only when a new line is encountered (carriage return or new-line character or combination). If the file to tail is periodically “rolled over”, as is generally the case with log files, an optional Rolling Filename Pattern can be used to retrieve data from files that have rolled over, even if the rollover occurred while NiFi was not running (provided that the data still exists upon restart of NiFi). It is generally advisable to set the Run Schedule to a few seconds, rather than running with the default value of 0 secs, as this Processor will consume a lot of resources if scheduled very aggressively. At this time, this Processor does not support ingesting files that have been compressed when ‘rolled over’.
TransformXml
Applies the provided XSLT file to the flowfile XML payload. A new FlowFile is created with transformed content and is routed to the ‘success’ relationship. If the XSL transform fails, the original FlowFile is routed to the ‘failure’ relationship
UnpackContent
Unpacks the content of FlowFiles that have been packaged with one of several different Packaging Formats, emitting one to many FlowFiles for each input FlowFile
UpdateAttribute
Updates the Attributes for a FlowFile by using the Attribute Expression Language and/or deletes the attributes based on a regular expression
ValidateXml
Validates the contents of FlowFiles against a user-specified XML Schema file
YandexTranslate
Translates content and attributes from one language to another