druiddb-ruby
This documentation is intended to be a quick-start guide, not a comprehensive
list of all available methods and configuration options. Please look through
the source for more information; a great place to start is the DruidDB::Client and DruidDB::Query modules, as they expose most of the methods on the client.
This guide assumes significant knowledge of Druid; for more information, see http://druid.io/docs/latest/design/index.html
Install
$ gem install druiddb
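Or, if you manage dependencies with Bundler, add it to your Gemfile:
gem 'druiddb'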
Usage
Creating a Client
client = DruidDB::Client.new()
Note: There are many configuration options; please take a look at DruidDB::Configuration for more details.
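For example, configuration can be passed to the constructor. The option names below are illustrative assumptions rather than confirmed keys; check DruidDB::Configuration for the actual option names and their defaults.
client = DruidDB::Client.new(
  zookeeper: 'zk1.example.com:2181', # assumed option: the ZooKeeper ensemble used for discovery
  log_level: :debug                  # assumed option: logger verbosity
)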
Writing Data
Kafka Indexing Service
This gem leverages the Kafka Indexing Service for ingesting data. The gem pushes datapoints onto Kafka topics (typically named after the datasource). You can also use the gem to upload an ingestion spec, which is needed for Druid to consume the Kafka topic.
This repo contains a docker-compose.yml that may help bootstrap development with Druid and the Kafka Indexing Service; it's what we use for integration testing.
Submitting an Ingestion Spec
path = 'path/to/spec.json'
client.submit_supervisor_spec(path)
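If you don't already have a spec file, the sketch below shows roughly what a minimal Kafka supervisor spec might contain, built as a Ruby hash and written to disk before submitting. The exact fields (parser, ioConfig, tuningConfig, and so on) vary by Druid version, so treat this as an illustration and consult the Druid Kafka Indexing Service docs for the authoritative format.
require 'json'

# Illustrative only: field names follow the Kafka supervisor spec format,
# but your Druid version may expect slightly different fields.
spec = {
  type: 'kafka',
  dataSchema: {
    dataSource: 'foo',
    parser: {
      type: 'string',
      parseSpec: {
        format: 'json',
        timestampSpec: { column: 'timestamp', format: 'iso' },
        dimensionsSpec: { dimensions: ['foo'] }
      }
    },
    metricsSpec: [{ type: 'longSum', name: 'units', fieldName: 'units' }],
    granularitySpec: { type: 'uniform', segmentGranularity: 'HOUR', queryGranularity: 'MINUTE' }
  },
  ioConfig: {
    topic: 'foo',
    consumerProperties: { 'bootstrap.servers' => 'localhost:9092' }
  },
  tuningConfig: { type: 'kafka' }
}

path = 'path/to/spec.json'
File.write(path, JSON.pretty_generate(spec))
client.submit_supervisor_spec(path)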
Writing Datapoints
topic_name = 'foo'
datapoint = {
timestamp: Time.now.utc.iso8601,
foo: 'bar',
units: 1
}
client.write_point(topic_name, datapoint)
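A minimal sketch of writing a few points in a loop; it assumes the topic matches the one in your supervisor spec and that the timestamp key matches the spec's timestampSpec column.
require 'time' # for Time#iso8601 outside of Rails/ActiveSupport

3.times do |i|
  client.write_point('foo', timestamp: Time.now.utc.iso8601, foo: 'bar', units: i)
end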
Reading Data
Querying
client.query(
queryType: 'timeseries',
dataSource: 'foo',
granularity: 'day',
intervals: Time.now.utc.advance(days: -30).iso8601 + '/' + Time.now.utc.iso8601, # advance comes from ActiveSupport
aggregations: [{ type: 'longSum', name: 'baz', fieldName: 'baz' }]
)
The query method POSTs the query to Druid; for information on querying Druid, see http://druid.io/docs/latest/querying/querying.html. This is intentionally simple so that the gem supports all current features, and hopefully all future features, of the Druid query language without needing updates.
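Because the hash is passed straight through, any valid Druid query type should work; for example, a topN query (the dimension and metric names below are placeholders):
client.query(
  queryType: 'topN',
  dataSource: 'foo',
  dimension: 'foo',
  metric: 'baz',
  threshold: 10,
  granularity: 'all',
  intervals: (Time.now.utc - 7 * 24 * 60 * 60).iso8601 + '/' + Time.now.utc.iso8601,
  aggregations: [{ type: 'longSum', name: 'baz', fieldName: 'baz' }]
)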
Fill Empty Intervals
Currently, Druid will not fill empty intervals for which there are no points. To
accommodate this need until it is handled more efficiently in Druid, use the
experimental fill_value
feature in your query. This ensures you get a result
for every interval in intervals.
This has only been tested with 'timeseries' and single-dimension 'groupBy' queries with simple granularities.
client.query(
queryType: 'timeseries',
dataSource: 'foo',
granularity: 'day',
intervals: Time.now.utc.advance(days: -30).iso8601 + '/' + Time.now.utc.iso8601,
aggregations: [{ type: 'longSum', name: 'baz', fieldName: 'baz' }],
fill_value: 0
)
Management
List datasources.
client.list_datasources
List supervisor tasks.
client.supervisor_tasks
Development
Docker Compose
This project uses docker-compose to provide a development environment.
- git clone the project
- cd into project
- docker-compose up; this will download the necessary images and run all dependencies in the foreground.
Then you can use docker build -t some_tag . to build the Docker image for this project after making changes, and docker run -it --network=druiddbruby_druiddb some_tag some_command to interact with it.
Metabase
Viewing data in the database can be a bit annoying; using a tool like Metabase makes this much easier and is what I personally do when developing.
Testing
Tests are run using the docker-compose environment.
docker-compose up
docker run -it --network=druiddbruby_druiddb druiddb-ruby bin/run_tests.sh
License
The gem is available as open source under the terms of the MIT License.