Apache Nifi Processors in version 0.7.4

Includes all processors through release

0.7.4

For other nifi versions, please reference our default processors post. Check the Apache nifi site for downloads or any nifi version or for current version docs

List of Processors

AttributesToJSON
Base64EncodeContent
CompressContent
ConsumeAMQP
ConsumeJMS
ConsumeKafka
ConsumeMQTT
ControlRate
ConvertAvroSchema
ConvertAvroToJSON
ConvertCharacterSet
ConvertCSVToAvro
ConvertJSONToAvro
ConvertJSONToSQL
CreateHadoopSequenceFile
DebugFlow
DeleteDynamoDB
DeleteS3Object
DeleteSQS
DetectDuplicate
DistributeLoad
DuplicateFlowFile
EncryptContent
EvaluateJsonPath
EvaluateRegularExpression
EvaluateXPath
EvaluateXQuery
ExecuteFlumeSink
ExecuteFlumeSource
ExecuteProcess
ExecuteScript
ExecuteSQL
ExecuteStreamCommand
ExtractAvroMetadata
ExtractHL7Attributes
ExtractImageMetadata
ExtractMediaMetadata
ExtractText
FetchDistributedMapCache
FetchElasticsearch
FetchFile
FetchHDFS
FetchS3Object
FetchSFTP
GenerateFlowFile
GeoEnrichIP
GetAzureEventHub
GetCouchbaseKey
GetDynamoDB
GetFile
GetFTP
GetHBase
GetHDFS
GetHDFSEvents
GetHDFSSequenceFile
GetHTMLElement
GetHTTP
GetJMSQueue
GetJMSTopic
GetKafka
GetMongo
GetSFTP
GetSNMP
GetSolr
GetSplunk
GetSQS
GetTwitter
HandleHttpRequest
HandleHttpResponse
HashAttribute
HashContent
IdentifyMimeType
InferAvroSchema
InvokeHTTP
InvokeScriptedProcessor
JoltTransformJSON
ListenHTTP
ListenLumberjack
ListenRELP
ListenSyslog
ListenTCP
ListenUDP
ListFile
ListHDFS
ListS3
ListSFTP
LogAttribute
MergeContent
ModifyBytes
ModifyHTMLElement
MonitorActivity
ParseSyslog
PostHTTP
PublishAMQP
PublishJMS
PublishKafka
PublishMQTT
PutAzureEventHub
PutCassandraQL
PutCouchbaseKey
PutDistributedMapCache
PutDynamoDB
PutElasticsearch
PutEmail
PutFile
PutFTP
PutHBaseCell
PutHBaseJSON
PutHDFS
PutHiveQL
PutHTMLElement
PutJMS
PutKafka
PutKinesisFirehose
PutLambda
PutMongo
PutRiemann
PutS3Object
PutSFTP
PutSlack
PutSNS
PutSolrContentStream
PutSplunk
PutSQL
PutSQS
PutSyslog
PutTCP
PutUDP
QueryCassandra
QueryDatabaseTable
ReplaceText
ReplaceTextWithMapping
ResizeImage
RouteHL7
RouteOnAttribute
RouteOnContent
RouteText
ScanAttribute
ScanContent
SegmentContent
SelectHiveQL
SetSNMP
SplitAvro
SplitContent
SplitJson
SplitText
SplitXml
SpringContextProcessor
StoreInKiteDataset
TailFile
TransformXml
UnpackContent
UpdateAttribute
ValidateXml
YandexTranslate

AttributesToJSON

Generates a JSON representation of the input FlowFile Attributes. The resulting JSON can be written to either a new Attribute ‘JSONAttributes’ or written to the FlowFile as content.

Base64EncodeContent

Encodes or decodes content to and from base64

CompressContent

Compresses or decompresses the contents of FlowFiles using a user-specified compression algorithm and updates the mime.type attribute as appropriate

ConsumeAMQP

Consumes AMQP Message transforming its content to a FlowFile and transitioning it to ‘success’ relationship

ConsumeJMS

Consumes JMS Message of type BytesMessage or TextMessage transforming its content to a FlowFile and transitioning it to ‘success’ relationship. JMS attributes such as headers and properties will be copied as FlowFile attributes.

ConsumeKafka

Consumes messages from Apache Kafka,specifically built against the Kafka 0.9.x Consumer API. The complementary NiFi processor for sending messages is PublishKafka.

ConsumeMQTT

Subscribes to a topic and receives messages from an MQTT broker

ControlRate

Controls the rate at which data is transferred to follow-on processors. If you configure a very small Time Duration, then the accuracy of the throttle gets worse. You can improve this accuracy by decreasing the Yield Duration, at the expense of more Tasks given to the processor.

ConvertAvroSchema

Convert records from one Avro schema to another, including support for flattening and simple type conversions

ConvertAvroToJSON

Converts a Binary Avro record into a JSON object. This processor provides a direct mapping of an Avro field to a JSON field, such that the resulting JSON will have the same hierarchical structure as the Avro document. Note that the Avro schema information will be lost, as this is not a translation from binary Avro to JSON formatted Avro. The output JSON is encoded the UTF-8 encoding. If an incoming FlowFile contains a stream of multiple Avro records, the resultant FlowFile will contain a JSON Array containing all of the Avro records or a sequence of JSON Objects. If an incoming FlowFile does not contain any records, an empty JSON object is the output. Empty/Single Avro record FlowFile inputs are optionally wrapped in a container as dictated by ‘Wrap Single Record’

ConvertCharacterSet

Converts a FlowFile’s content from one character set to another

ConvertCSVToAvro

Converts CSV files to Avro according to an Avro Schema

ConvertJSONToAvro

Converts JSON files to Avro according to an Avro Schema

ConvertJSONToSQL

Converts a JSON-formatted FlowFile into an UPDATE or INSERT SQL statement. The incoming FlowFile is expected to be “flat” JSON message, meaning that it consists of a single JSON element and each field maps to a simple type. If a field maps to a JSON object, that JSON object will be interpreted as Text. If the input is an array of JSON elements, each element in the array is output as a separate FlowFile to the ‘sql’ relationship. Upon successful conversion, the original FlowFile is routed to the ‘original’ relationship and the SQL is routed to the ‘sql’ relationship.

CreateHadoopSequenceFile

Creates Hadoop Sequence Files from incoming flow files

DebugFlow

The DebugFlow processor aids testing and debugging the FlowFile framework by allowing various responses to be explicitly triggered in response to the receipt of a FlowFile or a timer event without a FlowFile if using timer or cron based scheduling. It can force responses needed to exercise or test various failure modes that can occur when a processor runs.

DeleteDynamoDB

Deletes a document from DynamoDB based on hash and range key. The key can be string or number. The request requires all the primary keys for the operation (hash or hash and range key)

DeleteS3Object

Deletes FlowFiles on an Amazon S3 Bucket. If attempting to delete a file that does not exist, FlowFile is routed to success.

DeleteSQS

Deletes a message from an Amazon Simple Queuing Service Queue

DetectDuplicate

Caches a value, computed from FlowFile attributes, for each incoming FlowFile and determines if the cached value has already been seen. If so, routes the FlowFile to ‘duplicate’ with an attribute named ‘original.identifier’ that specifies the original FlowFile’s “description”, which is specified in the property. If the FlowFile is not determined to be a duplicate, the Processor routes the FlowFile to 'non-duplicate'

DistributeLoad

Distributes FlowFiles to downstream processors based on a Distribution Strategy. If using the Round Robin strategy, the default is to assign each destination a weighting of 1 (evenly distributed). However, optional propertiescan be added to the change this; adding a property with the name ‘5’ and value ‘10’ means that the relationship with name ‘5’ will be receive 10 FlowFiles in each iteration instead of 1.

DuplicateFlowFile

Intended for load testing, this processor will create the configured number of copies of each incoming FlowFile

EncryptContent

Encrypts or Decrypts a FlowFile using either symmetric encryption with a password and randomly generated salt, or asymmetric encryption using a public and secret key.

EvaluateJsonPath

Evaluates one or more JsonPath expressions against the content of a FlowFile. The results of those expressions are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. JsonPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is flowfile-attribute; otherwise, the property name is ignored). The value of the property must be a valid JsonPath expression. A Return Type of ‘auto-detect’ will make a determination based off the configured destination. When ‘Destination’ is set to ‘flowfile-attribute,’ a return type of ‘scalar’ will be used. When ‘Destination’ is set to ‘flowfile-content,’ a return type of ‘JSON’ will be used.If the JsonPath evaluates to a JSON array or JSON object and the Return Type is set to ‘scalar’ the FlowFile will be unmodified and will be routed to failure. A Return Type of JSON can return scalar values if the provided JsonPath evaluates to the specified value and will be routed as a match.If Destination is ‘flowfile-content’ and the JsonPath does not evaluate to a defined path, the FlowFile will be routed to ‘unmatched’ without having its contents modified. If Destination is flowfile-attribute and the expression matches nothing, attributes will be created with empty strings as the value, and the FlowFile will always be routed to ‘matched.’

EvaluateRegularExpression

WARNING: This has been deprecated and will be removed in 0.2.0.

Use ExtractText instead.

EvaluateXPath

Evaluates one or more XPaths against the content of a FlowFile. The results of those XPaths are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. XPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is flowfile-attribute; otherwise, the property name is ignored). The value of the property must be a valid XPath expression. If the XPath evaluates to more than one node and the Return Type is set to ‘nodeset’ (either directly, or via ‘auto-detect’ with a Destination of ‘flowfile-content’), the FlowFile will be unmodified and will be routed to failure. If the XPath does not evaluate to a Node, the FlowFile will be routed to ‘unmatched’ without having its contents modified. If Destination is flowfile-attribute and the expression matches nothing, attributes will be created with empty strings as the value, and the FlowFile will always be routed to ‘matched’

EvaluateXQuery

Evaluates one or more XQueries against the content of a FlowFile. The results of those XQueries are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. XQueries are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is ‘flowfile-attribute’; otherwise, the property name is ignored). The value of the property must be a valid XQuery. If the XQuery returns more than one result, new attributes or FlowFiles (for Destinations of ‘flowfile-attribute’ or ‘flowfile-content’ respectively) will be created for each result (attributes will have a ‘.n’ one-up number appended to the specified attribute name). If any provided XQuery returns a result, the FlowFile(s) will be routed to ‘matched’. If no provided XQuery returns a result, the FlowFile will be routed to ‘unmatched’. If the Destination is ‘flowfile-attribute’ and the XQueries matche nothing, no attributes will be applied to the FlowFile.

ExecuteFlumeSink

Execute a Flume sink. Each input FlowFile is converted into a Flume Event for processing by the sink.

ExecuteFlumeSource

Execute a Flume source. Each Flume Event is sent to the success relationship as a FlowFile

ExecuteProcess

Runs an operating system command specified by the user and writes the output of that command to a FlowFile. If the command is expected to be long-running, the Processor can output the partial data on a specified interval. When this option is used, the output is expected to be in textual format, as it typically does not make sense to split binary data on arbitrary time-based intervals.

ExecuteScript

Experimental - Executes a script given the flow file and a process session. The script is responsible for handling the incoming flow file (transfer to SUCCESS or remove, e.g.) as well as any flow files created by the script. If the handling is incomplete or incorrect, the session will be rolled back. Experimental: Impact of sustained usage not yet verified.

ExecuteSQL

Execute provided SQL select query. Query result will be converted to Avro format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute ‘executesql.row.count’ indicates how many rows were selected.

ExecuteStreamCommand

Executes an external command on the contents of a flow file, and creates a new flow file with the results of the command.

ExtractAvroMetadata

Extracts metadata from the header of an Avro datafile.

ExtractHL7Attributes

Extracts information from an HL7 (Health Level 7) formatted FlowFile and adds the information as FlowFile Attributes. The attributes are named as . If the segment is repeating, the naming will be . For example, we may have an attribute named "MHS.12" with a value of "2.1" and an attribute named "OBX_11.3" with a value of "93000^CPT4".

ExtractImageMetadata

Extract the image metadata from flowfiles containing images. This processor relies on this metadata extractor library https://github.com/drewnoakes/metadata-extractor. It extracts a long list of metadata types including but not limited to EXIF, IPTC, XMP and Photoshop fields. For the full list visit the library’s website.NOTE: The library being used loads the images into memory so extremely large images may cause problems.

ExtractMediaMetadata

Extract the content metadata from flowfiles containing audio, video, image, and other file types. This processor relies on the Apache Tika project for file format detection and parsing. It extracts a long list of metadata types for media files including audio, video, and print media formats.NOTE: the attribute names and content extracted may vary across upgrades because parsing is performed by the external Tika tools which in turn depend on other projects for metadata extraction. For the more details and the list of supported file types, visit the library’s website at http://tika.apache.org/.

ExtractText

Evaluates one or more Regular Expressions against the content of a FlowFile. The results of those Regular Expressions are assigned to FlowFile Attributes. Regular Expressions are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed. The first capture group, if any found, will be placed into that attribute name.But all capture groups, including the matching string sequence itself will also be provided at that attribute name with an index value provided, with the exception of a capturing group that is optional and does not match - for example, given the attribute name “regex” and expression “abc(def)?(g)” we would add an attribute “regex.1” with a value of “def” if the “def” matched. If the “def” did not match, no attribute named “regex.1” would be added but an attribute named “regex.2” with a value of “g” will be added regardless.The value of the property must be a valid Regular Expressions with one or more capturing groups. If the Regular Expression matches more than once, only the first match will be used. If any provided Regular Expression matches, the FlowFile(s) will be routed to ‘matched’. If no provided Regular Expression matches, the FlowFile will be routed to ‘unmatched’ and no attributes will be applied to the FlowFile.

FetchDistributedMapCache

Computes a cache key from FlowFile attributes, for each incoming FlowFile, and fetches the value from the Distributed Map Cache associated with that key. The incoming FlowFile’s content is replaced with the binary data received by the Distributed Map Cache. If there is no value stored under that key then the flow file will be routed to ‘not-found’. Note that the processor will always attempt to read the entire cached value into memory before placing it in it’s destination. This could be potentially problematic if the cached value is very large.

FetchElasticsearch

Retrieves a document from Elasticsearch using the specified connection properties and the identifier of the document to retrieve. If the cluster has been configured for authorization and/or secure transport (SSL/TLS) and the Shield plugin is available, secure connections can be made. This processor supports Elasticsearch 2.x clusters.

FetchFile

Reads the contents of a file from disk and streams it into the contents of an incoming FlowFile. Once this is done, the file is optionally moved elsewhere or deleted to help keep the file system organized.

FetchHDFS

Retrieves a file from HDFS. The content of the incoming FlowFile is replaced by the content of the file in HDFS. The file in HDFS is left intact without any changes being made to it.

FetchS3Object

Retrieves the contents of an S3 Object and writes it to the content of a FlowFile

FetchSFTP

Fetches the content of a file from a remote SFTP server and overwrites the contents of an incoming FlowFile with the content of the remote file.

GenerateFlowFile

This processor creates FlowFiles of random data and is used for load testing

GeoEnrichIP

Looks up geolocation information for an IP address and adds the geo information to FlowFile attributes. The geo data is provided as a MaxMind database. The attribute that contains the IP address to lookup is provided by the ‘IP Address Attribute’ property. If the name of the attribute provided is ‘X’, then the the attributes added by enrichment will take the form X.geo.

GetAzureEventHub

Receives messages from a Microsoft Azure Event Hub, writing the contents of the Azure message to the content of the FlowFile

GetCouchbaseKey

Get a document from Couchbase Server via Key/Value access. The ID of the document to fetch may be supplied by setting the property. NOTE: if the Document Id property is not set, the contents of the FlowFile will be read to determine the Document Id, which means that the contents of the entire FlowFile will be buffered in memory.

GetDynamoDB

Retrieves a document from DynamoDB based on hash and range key. The key can be string or number.For any get request all the primary keys are required (hash or hash and range based on the table keys).A Json Document (‘Map’) attribute of the DynamoDB item is read into the content of the FlowFile.

GetFile

Creates FlowFiles from files in a directory. NiFi will ignore files it doesn’t have at least read permissions for.

GetFTP

Fetches files from an FTP Server and creates FlowFiles from them

GetHBase

This Processor polls HBase for any records in the specified table. The processor keeps track of the timestamp of the cells that it receives, so that as new records are pushed to HBase, they will automatically be pulled. Each record is output in JSON format, as {“row”: “", "cells": { "<column 1 family>:<column 1 qualifier>": "<cell 1 value>", "<column 2 family>:<column 2 qualifier>": "<cell 2 value>", ... }}. For each record received, a Provenance RECEIVE event is emitted with the format hbase://<table name>/, where is the UTF-8 encoded value of the row's key.

GetHDFS

Fetch files from Hadoop Distributed File System (HDFS) into FlowFiles. This Processor will delete the file from HDFS after fetching it.

GetHDFSEvents

This processor polls the notification events provided by the HdfsAdmin API. Since this uses the HdfsAdmin APIs it is required to run as an HDFS super user. Currently there are six types of events (append, close, create, metadata, rename, and unlink). Please see org.apache.hadoop.hdfs.inotify.Event documentation for full explanations of each event. This processor will poll for new events based on a defined duration. For each event received a new flow file will be created with the expected attributes and the event itself serialized to JSON and written to the flow file’s content. For example, if event.type is APPEND then the content of the flow file will contain a JSON file containing the information about the append event. If successful the flow files are sent to the ‘success’ relationship. Be careful of where the generated flow files are stored. If the flow files are stored in one of processor’s watch directories there will be a never ending flow of events. It is also important to be aware that this processor must consume all events. The filtering must happen within the processor. This is because the HDFS admin’s event notifications API does not have filtering.

GetHDFSSequenceFile

Fetch sequence files from Hadoop Distributed File System (HDFS) into FlowFiles

GetHTMLElement

Extracts HTML element values from the incoming flowfile’s content using a CSS selector. The incoming HTML is first converted into a HTML Document Object Model so that HTML elements may be selected in the similar manner that CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then “queried” using the user defined CSS selector string. The result of “querying” the HTML DOM may produce 0-N results. If no results are found the flowfile will be transferred to the “element not found” relationship to indicate so to the end user. If N results are found a new flowfile will be created and emitted for each result. The query result will either be placed in the content of the new flowfile or as an attribute of the new flowfile. By default the result is written to an attribute. This can be controlled by the “Destination” property. Resulting query values may also have data prepended or appended to them by setting the value of property “Prepend Element Value” or “Append Element Value”. Prepended and appended values are treated as string values and concatenated to the result retrieved from the HTML DOM query operation. A more thorough reference for the CSS selector syntax can be found at “http://jsoup.org/apidocs/org/jsoup/select/Selector.html”

GetHTTP

Fetches data from an HTTP or HTTPS URL and writes the data to the content of a FlowFile. Once the content has been fetched, the ETag and Last Modified dates are remembered (if the web server supports these concepts). This allows the Processor to fetch new data only if the remote data has changed or until the state is cleared. That is, once the content has been fetched from the given URL, it will not be fetched again until the content on the remote server changes. Note that due to limitations on state management, stored “last modified” and etag fields never expire. If the URL in GetHttp uses Expression Language that is unbounded, there is the potential for Out of Memory Errors to occur.

GetJMSQueue

Pulls messages from a JMS Queue, creating a FlowFile for each JMS Message or bundle of messages, as configured

GetJMSTopic

Pulls messages from a JMS Topic, creating a FlowFile for each JMS Message or bundle of messages, as configured

GetKafka

Fetches messages from Apache Kafka, specifically for 0.8.x versions. The complementary NiFi processor for sending messages is PutKafka.

GetMongo

Creates FlowFiles from documents in MongoDB

GetSFTP

Fetches files from an SFTP Server and creates FlowFiles from them

GetSNMP

Retrieves information from SNMP Agent and outputs a FlowFile with information in attributes and without any content

GetSolr

Queries Solr and outputs the results as a FlowFile

GetSplunk

Retrieves data from Splunk Enterprise.

GetSQS

Fetches messages from an Amazon Simple Queuing Service Queue

GetTwitter

Pulls status changes from Twitter’s streaming API

HandleHttpRequest

Starts an HTTP Server and listens for HTTP Requests. For each request, creates a FlowFile and transfers to ‘success’. This Processor is designed to be used in conjunction with the HandleHttpResponse Processor in order to create a Web Service

HandleHttpResponse

Sends an HTTP Response to the Requestor that generated a FlowFile. This Processor is designed to be used in conjunction with the HandleHttpRequest in order to create a web service.

HashAttribute

Hashes together the key/value pairs of several FlowFile Attributes and adds the hash as a new attribute. Optional properties are to be added such that the name of the property is the name of a FlowFile Attribute to consider and the value of the property is a regular expression that, if matched by the attribute value, will cause that attribute to be used as part of the hash. If the regular expression contains a capturing group, only the value of the capturing group will be used.

HashContent

Calculates a hash value for the Content of a FlowFile and puts that hash value on the FlowFile as an attribute whose name is determined by the property

IdentifyMimeType

Attempts to identify the MIME Type used for a FlowFile. If the MIME Type can be identified, an attribute with the name ‘mime.type’ is added with the value being the MIME Type. If the MIME Type cannot be determined, the value will be set to ‘application/octet-stream’. In addition, the attribute mime.extension will be set if a common file extension for the MIME Type is known.

InferAvroSchema

Examines the contents of the incoming FlowFile to infer an Avro schema. The processor will use the Kite SDK to make an attempt to automatically generate an Avro schema from the incoming content. When inferring the schema from JSON data the key names will be used in the resulting Avro schema definition. When inferring from CSV data a “header definition” must be present either as the first line of the incoming data or the “header definition” must be explicitly set in the property “CSV Header Definition”. A “header definition” is simply a single comma separated line defining the names of each column. The “header definition” is required in order to determine the names that should be given to each field in the resulting Avro definition. When inferring data types the higher order data type is always used if there is ambiguity. For example when examining numerical values the type may be set to “long” instead of “integer” since a long can safely hold the value of any “integer”. Only CSV and JSON content is currently supported for automatically inferring an Avro schema. The type of content present in the incoming FlowFile is set by using the property “Input Content Type”. The property can either be explicitly set to CSV, JSON, or “use mime.type value” which will examine the value of the mime.type attribute on the incoming FlowFile to determine the type of content present.

InvokeHTTP

An HTTP client processor which can interact with a configurable HTTP Endpoint. The destination URL and HTTP Method are configurable. FlowFile attributes are converted to HTTP headers and the FlowFile contents are included as the body of the request (if the HTTP Method is PUT or POST).

InvokeScriptedProcessor

Experimental - Invokes a script engine for a Processor defined in the given script. The script must define a valid class that implements the Processor interface, and it must set a variable ‘processor’ to an instance of the class. Processor methods such as onTrigger() will be delegated to the scripted Processor instance. Also any Relationships or PropertyDescriptors defined by the scripted processor will be added to the configuration dialog. Experimental: Impact of sustained usage not yet verified.

JoltTransformJSON

Applies a list of Jolt specifications to the flowfile JSON payload. A new FlowFile is created with transformed content and is routed to the ‘success’ relationship. If the JSON transform fails, the original FlowFile is routed to the ‘failure’ relationship.

ListenHTTP

Starts an HTTP Server that is used to receive FlowFiles from remote sources. The default URI of the Service will be http://{hostname}:{port}/contentListener

ListenLumberjack

Listens for Lumberjack messages being sent to a given port over TCP. Each message will be acknowledged after successfully writing the message to a FlowFile. Each FlowFile will contain data portion of one or more Lumberjack frames. In the case where the Lumberjack frames contain syslog messages, the output of this processor can be sent to a ParseSyslog processor for further processing.

ListenRELP

Listens for RELP messages being sent to a given port over TCP. Each message will be acknowledged after successfully writing the message to a FlowFile. Each FlowFile will contain data portion of one or more RELP frames. In the case where the RELP frames contain syslog messages, the output of this processor can be sent to a ParseSyslog processor for further processing.

ListenSyslog

Listens for Syslog messages being sent to a given port over TCP or UDP. Incoming messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The format of each message is: ()(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) where version is optional. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd'T'HH:mm:ss.SZ" or "yyyy-MM-dd'T'HH:mm:ss.S+hh:mm", or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss". If an incoming messages matches one of these patterns, the message will be parsed and the individual pieces will be placed in FlowFile attributes, with the original message in the content of the FlowFile. If an incoming message does not match one of these patterns it will not be parsed and the syslog.valid attribute will be set to false with the original message in the content of the FlowFile. Valid messages will be transferred on the success relationship, and invalid messages will be transferred on the invalid relationship.

ListenTCP

Listens for incoming TCP connections and reads data from each connection using a line separator as the message demarcator. The default behavior is for each message to produce a single FlowFile, however this can be controlled by increasing the Batch Size to a larger value for higher throughput. The Receive Buffer Size must be set as large as the largest messages expected to be received, meaning if every 100kb there is a line separator, then the Receive Buffer Size must be greater than 100kb.

ListenUDP

Listens for Datagram Packets on a given port. The default behavior produces a FlowFile per datagram, however for higher throughput the Max Batch Size property may be increased to specify the number of datagrams to batch together in a single FlowFile. This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties, otherwise it will listen for datagrams from all hosts and ports.

ListFile

Retrieves a listing of files from the local filesystem. For each file that is listed, creates a FlowFile that represents the file so that it can be fetched in conjunction with FetchFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Unlike GetFile, this Processor does not delete any data from the local filesystem.

ListHDFS

Retrieves a listing of files from HDFS. For each file that is listed in HDFS, creates a FlowFile that represents the HDFS file so that it can be fetched in conjunction with ListHDFS. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Unlike GetHDFS, this Processor does not delete any data from HDFS.

ListS3

Retrieves a listing of objects from an S3 bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchS3Object. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data.

ListSFTP

Performs a listing of the files residing on an SFTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchSFTP in order to fetch those files.

LogAttribute

No description provided.

MergeContent

Merges a Group of FlowFiles together based on a user-defined strategy and packages them into a single FlowFile. It is recommended that the Processor be configured with only a single incoming connection, as Group of FlowFiles will not be created from FlowFiles in different connections. This processor updates the mime.type attribute as appropriate.

ModifyBytes

Discard byte range at the start and end or all content of a binary file.

ModifyHTMLElement

Modifies the value of an existing HTML element. The desired element to be modified is located by using CSS selector syntax. The incoming HTML is first converted into a HTML Document Object Model so that HTML elements may be selected in the similar manner that CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then “queried” using the user defined CSS selector string to find the element the user desires to modify. If the HTML element is found the element’s value is updated in the DOM using the value specified “Modified Value” property. All DOM elements that match the CSS selector will be updated. Once all of the DOM elements have been updated the DOM is rendered to HTML and the result replaces the flowfile content with the updated HTML. A more thorough reference for the CSS selector syntax can be found at “http://jsoup.org/apidocs/org/jsoup/select/Selector.html”

MonitorActivity

Monitors the flow for activity and sends out an indicator when the flow has not had any data for some specified amount of time and again when the flow’s activity is restored

ParseSyslog

Parses the contents of a Syslog message and adds attributes to the FlowFile for each of the parts of the Syslog message

PostHTTP

Performs an HTTP Post with the content of the FlowFile

PublishAMQP

Creates a AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange.In a typical AMQP exchange model, the message that is sent to the AMQP Exchange will be routed based on the ‘Routing Key’ to its final destination in the queue (the binding). If due to some misconfiguration the binding between the Exchange, Routing Key and Queue is not set up, the message will have no final destination and will return (i.e., the data will not make it to the queue). If that happens you will see a log in both app-log and bulletin stating to that effect. Fixing the binding (normally done by AMQP administrator) will resolve the issue.

PublishJMS

Creates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as JMS BytesMessage. FlowFile attributes will be added as JMS headers and/or properties to the outgoing JMS message.

PublishKafka

Sends the contents of a FlowFile as a message to Apache Kafka, using the Kafka 0.9.x Producer. The messages to send may be individual FlowFiles or may be delimited, using a user-specified delimiter, such as a new-line. The complementary NiFi processor for fetching messages is ConsumeKafka.

PublishMQTT

Publishes a message to an MQTT topic

PutAzureEventHub

Sends the contents of a FlowFile to a Windows Azure Event Hub. Note: the content of the FlowFile will be buffered into memory before being sent, so care should be taken to avoid sending FlowFiles to this Processor that exceed the amount of Java Heap Space available.

PutCassandraQL

Execute provided Cassandra Query Language (CQL) statement on a Cassandra 1.x, 2.x, or 3.0.x cluster. The content of an incoming FlowFile is expected to be the CQL command to execute. The CQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention cql.args.N.type and cql.args.N.value, where N is a positive integer. The cql.args.N.type is expected to be a lowercase string indicating the Cassandra type.

PutCouchbaseKey

Put a document to Couchbase Server via Key/Value access.

PutDistributedMapCache

Gets the content of a FlowFile and puts it to a distributed map cache, using a cache key computed from FlowFile attributes. If the cache already contains the entry and the cache update strategy is ‘keep original’ the entry is not replaced.’

PutDynamoDB

Puts a document from DynamoDB based on hash and range key. The table can have either hash and range or hash key alone. Currently the keys supported are string and number and value can be json document. In case of hash and range keys both key are required for the operation. The FlowFile content must be JSON. FlowFile content is mapped to the specified Json Document attribute in the DynamoDB item.

PutElasticsearch

Writes the contents of a FlowFile to Elasticsearch, using the specified parameters such as the index to insert into and the type of the document. If the cluster has been configured for authorization and/or secure transport (SSL/TLS) and the Shield plugin is available, secure connections can be made. This processor supports Elasticsearch 2.x clusters.

PutEmail

Sends an e-mail to configured recipients for each incoming FlowFile

PutFile

Writes the contents of a FlowFile to the local file system

PutFTP

Sends FlowFiles to an FTP Server

PutHBaseCell

Adds the Contents of a FlowFile to HBase as the value of a single cell

PutHBaseJSON

Adds rows to HBase based on the contents of incoming JSON documents. Each FlowFile must contain a single UTF-8 encoded JSON document, and any FlowFiles where the root element is not a single document will be routed to failure. Each JSON field name and value will become a column qualifier and value of the HBase row. Any fields with a null value will be skipped, and fields with a complex value will be handled according to the Complex Field Strategy. The row id can be specified either directly on the processor through the Row Identifier property, or can be extracted from the JSON document by specifying the Row Identifier Field Name property. This processor will hold the contents of all FlowFiles for the given batch in memory at one time.

PutHDFS

Write FlowFile data to Hadoop Distributed File System (HDFS)

PutHiveQL

Executes a HiveQL DDL/DML command (UPDATE, INSERT, e.g.). The content of an incoming FlowFile is expected to be the HiveQL command to execute. The HiveQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention hiveql.args.N.type and hiveql.args.N.value, where N is a positive integer. The hiveql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.

PutHTMLElement

Places a new HTML element in the existing HTML DOM. The desired position for the new HTML element is specified by using CSS selector syntax. The incoming HTML is first converted into a HTML Document Object Model so that HTML DOM location may be located in a similar manner that CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then “queried” using the user defined CSS selector string to find the position where the user desires to add the new HTML element. Once the new HTML element is added to the DOM it is rendered to HTML and the result replaces the flowfile content with the updated HTML. A more thorough reference for the CSS selector syntax can be found at “http://jsoup.org/apidocs/org/jsoup/select/Selector.html”

PutJMS

Creates a JMS Message from the contents of a FlowFile and sends the message to a JMS Server

PutKafka

Sends the contents of a FlowFile as a message to Apache Kafka, specifically for 0.8.x versions. The messages to send may be individual FlowFiles or may be delimited, using a user-specified delimiter, such as a new-line. The complementary NiFi processor for fetching messages is GetKafka.

PutKinesisFirehose

Sends the contents to a specified Amazon Kinesis Firehose. In order to send data to firehose, the firehose delivery stream name has to be specified.

PutLambda

Sends the contents to a specified Amazon Lamba Function. The AWS credentials used for authentication must have permissions execute the Lambda function (lambda:InvokeFunction).The FlowFile content must be JSON.

PutMongo

Writes the contents of a FlowFile to MongoDB

PutRiemann

Send events to Riemann (http://riemann.io) when FlowFiles pass through this processor. You can use events to notify Riemann that a FlowFile passed through, or you can attach a more meaningful metric, such as, the time a FlowFile took to get to this processor. All attributes attached to events support the NiFi Expression Language.

PutS3Object

Puts FlowFiles to an Amazon S3 Bucket The upload uses either the PutS3Object method or PutS3MultipartUpload methods. The PutS3Object method send the file in a single synchronous call, but it has a 5GB size limit. Larger files are sent using the multipart upload methods that initiate, transfer the parts, and complete an upload. This multipart process saves state after each step so that a large upload can be resumed with minimal loss if the processor or cluster is stopped and restarted. A multipart upload consists of three steps 1) initiate upload, 2) upload the parts, and 3) complete the upload. For multipart uploads, the processor saves state locally tracking the upload ID and parts uploaded, which must both be provided to complete the upload. The AWS libraries select an endpoint URL based on the AWS region, but this can be overridden with the ‘Endpoint Override URL’ property for use with other S3-compatible endpoints. The S3 API specifies that the maximum file size for a PutS3Object upload is 5GB. It also requires that parts in a multipart upload must be at least 5MB in size, except for the last part. These limits are establish the bounds for the Multipart Upload Threshold and Part Size properties.

PutSFTP

Sends FlowFiles to an SFTP Server

PutSlack

Sends a message to your team on slack.com

PutSNS

Sends the content of a FlowFile as a notification to the Amazon Simple Notification Service

PutSolrContentStream

Sends the contents of a FlowFile as a ContentStream to Solr

PutSplunk

Sends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP. If a Message Delimiter is provided, then this processor will read messages from the incoming FlowFile based on the delimiter, and send each message to Splunk. If a Message Delimiter is not provided then the content of the FlowFile will be sent directly to Splunk as if it were a single message.

PutSQL

Executes a SQL UPDATE or INSERT command. The content of an incoming FlowFile is expected to be the SQL command to execute. The SQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.

PutSQS

Publishes a message to an Amazon Simple Queuing Service Queue

PutSyslog

Sends Syslog messages to a given host and port over TCP or UDP. Messages are constructed from the “Message ___” properties of the processor which can use expression language to generate messages from incoming FlowFiles. The properties are used to construct messages of the form: ()(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) where version is optional. The constructed messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd'T'HH:mm:ss.SZ" or "yyyy-MM-dd'T'HH:mm:ss.S+hh:mm", or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss". If a message is constructed that does not form a valid Syslog message according to the above description, then it is routed to the invalid relationship. Valid messages are sent to the Syslog server and successes are routed to the success relationship, failures routed to the failure relationship.

PutTCP

The PutTCP processor receives a FlowFile and transmits the FlowFile content over a TCP connection to the configured TCP server. By default, the FlowFiles are transmitted over the same TCP connection (or pool of TCP connections if multiple input threads are configured). To assist the TCP server with determining message boundaries, an optional “Outgoing Message Delimiter” string can be configured which is appended to the end of each FlowFiles content when it is transmitted over the TCP connection. An optional “Connection Per FlowFile” parameter can be specified to change the behaviour so that each FlowFiles content is transmitted over a single TCP connection which is opened when the FlowFile is received and closed after the FlowFile has been sent. This option should only be used for low message volume scenarios, otherwise the platform may run out of TCP sockets.

PutUDP

The PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet which is then transmitted to the configured UDP server. The user must ensure that the FlowFile content being fed to this processor is not larger than the maximum size for the underlying UDP transport. The maximum transport size will vary based on the platform setup but is generally just under 64KB. FlowFiles will be marked as failed if their content is larger than the maximum transport size.

QueryCassandra

Execute provided Cassandra Query Language (CQL) select query on a Cassandra 1.x, 2.x, or 3.0.x cluster. Query result may be converted to Avro or JSON format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute ‘executecql.row.count’ indicates how many rows were selected.

QueryDatabaseTable

Execute provided SQL select query. Query result will be converted to Avro format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute ‘querydbtable.row.count’ indicates how many rows were selected.

ReplaceText

Updates the content of a FlowFile by evaluating a Regular Expression (regex) against it and replacing the section of the content that matches the Regular Expression with some alternate value.

ReplaceTextWithMapping

Updates the content of a FlowFile by evaluating a Regular Expression against it and replacing the section of the content that matches the Regular Expression with some alternate value provided in a mapping file.

ResizeImage

Resizes an image to user-specified dimensions. This Processor uses the image codecs registered with the environment that NiFi is running in. By default, this includes JPEG, PNG, BMP, WBMP, and GIF images.

RouteHL7

Routes incoming HL7 data according to user-defined queries. To add a query, add a new property to the processor. The name of the property will become a new relationship for the processor, and the value is an HL7 Query Language query. If a FlowFile matches the query, a copy of the FlowFile will be routed to the associated relationship.

RouteOnAttribute

Routes FlowFiles based on their Attributes using the Attribute Expression Language

RouteOnContent

Applies Regular Expressions to the content of a FlowFile and routes a copy of the FlowFile to each destination whose Regular Expression matches. Regular Expressions are added as User-Defined Properties where the name of the property is the name of the relationship and the value is a Regular Expression to match against the FlowFile content. User-Defined properties do support the Attribute Expression Language, but the results are interpreted as literal values, not Regular Expressions

RouteText

Routes textual data based on a set of user-defined rules. Each line in an incoming FlowFile is compared against the values specified by user-defined Properties. The mechanism by which the text is compared to these user-defined properties is defined by the ‘Matching Strategy’. The data is then routed according to these rules, routing each line of the text individually.

ScanAttribute

Scans the specified attributes of FlowFiles, checking to see if any of their values are present within the specified dictionary of terms

ScanContent

Scans the content of FlowFiles for terms that are found in a user-supplied dictionary. If a term is matched, the UTF-8 encoded version of the term will be added to the FlowFile using the ‘matching.term’ attribute

SegmentContent

Segments a FlowFile into multiple smaller segments on byte boundaries. Each segment is given the following attributes: fragment.identifier, fragment.index, fragment.count, segment.original.filename; these attributes can then be used by the MergeContent processor in order to reconstitute the original FlowFile

SelectHiveQL

Execute provided HiveQL SELECT query against a Hive database connection. Query result will be converted to Avro or CSV format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute ‘selecthiveql.row.count’ indicates how many rows were selected.

SetSNMP

Based on incoming FlowFile attributes, the processor will execute SNMP Set requests. When founding attributes with name like snmp$, the processor will atempt to set the value of attribute to the corresponding OID given in the attribute name

SplitAvro

Splits a binary encoded Avro datafile into smaller files based on the configured Output Size. The Output Strategy determines if the smaller files will be Avro datafiles, or bare Avro records with metadata in the FlowFile attributes. The output will always be binary encoded.

SplitContent

Splits incoming FlowFiles by a specified byte sequence

SplitJson

Splits a JSON File into multiple, separate FlowFiles for an array element specified by a JsonPath expression. Each generated FlowFile is comprised of an element of the specified array and transferred to relationship ‘split,’ with the original file transferred to the ‘original’ relationship. If the specified JsonPath is not found or does not evaluate to an array element, the original file is routed to ‘failure’ and no files are generated.

SplitText

Splits a text file into multiple smaller text files on line boundaries limited by maximum number of lines or total size of fragment. Each output split file will contain no more than the configured number of lines or bytes. If both Line Split Count and Maximum Fragment Size are specified, the split occurs at whichever limit is reached first. If the first line of a fragment exceeds the Maximum Fragment Size, that line will be output in a single split file which exceeds the configured maximum size limit.

SplitXml

Splits an XML File into multiple separate FlowFiles, each comprising a child or descendant of the original root element

SpringContextProcessor

A Processor that supports sending and receiving data from application defined in Spring Application Context via predefined in/out MessageChannels.

StoreInKiteDataset

Stores Avro records in a Kite dataset

TailFile

“Tails” a file, ingesting data from the file as it is written to the file. The file is expected to be textual. Data is ingested only when a new line is encountered (carriage return or new-line character or combination). If the file to tail is periodically “rolled over”, as is generally the case with log files, an optional Rolling Filename Pattern can be used to retrieve data from files that have rolled over, even if the rollover occurred while NiFi was not running (provided that the data still exists upon restart of NiFi). It is generally advisable to set the Run Schedule to a few seconds, rather than running with the default value of 0 secs, as this Processor will consume a lot of resources if scheduled very aggressively. At this time, this Processor does not support ingesting files that have been compressed when ‘rolled over’.

TransformXml

Applies the provided XSLT file to the flowfile XML payload. A new FlowFile is created with transformed content and is routed to the ‘success’ relationship. If the XSL transform fails, the original FlowFile is routed to the ‘failure’ relationship

UnpackContent

Unpacks the content of FlowFiles that have been packaged with one of several different Packaging Formats, emitting one to many FlowFiles for each input FlowFile

UpdateAttribute

Updates the Attributes for a FlowFile by using the Attribute Expression Language and/or deletes the attributes based on a regular expression

ValidateXml

Validates the contents of FlowFiles against a user-specified XML Schema file

YandexTranslate

Translates content and attributes from one language to another