External reviews
External reviews are not included in the AWS star rating for the product.
Staging data for insights
What do you like best about the product?
It makes the power of Spark accessible, along with innovative solutions like Delta Lake.
What do you dislike about the product?
There are few options that aren't wholly or partially in the cloud.
What problems is the product solving and how is that benefiting you?
We are staging large datasets for reporting and multiple BI solutions.
Best tool for big data
What do you like best about the product?
It's easy to run commands in multiple languages in the same notebook, and there is a direct connection to Redshift.
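The review above mentions mixing languages in one notebook and a direct Redshift connection. A minimal sketch, assuming a Databricks notebook where `spark` is predefined; the hostname, table, and credentials are placeholders, not from the review:

```python
# A SQL cell and a Python cell can live side by side in the same notebook:
# %sql
# SELECT count(*) FROM sales;   -- SQL cell

# %python (default cell language): read a Redshift table over JDBC.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev")
      .option("dbtable", "public.sales")
      .option("user", "analyst")                              # placeholder credentials
      .option("password", "********")
      .option("driver", "com.amazon.redshift.jdbc42.Driver")  # Redshift JDBC driver
      .load())
df.groupBy("region").count().show()
```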
What do you dislike about the product?
Sometimes it takes a lot of time to load data. It should show better suggestions.
What problems is the product solving and how is that benefiting you?
We are using Databricks to analyse big data and get business insights.
One stop shop for all your data problems
What do you like best about the product?
It has everything in it: IDE, version control, scheduling, you name it.
What do you dislike about the product?
I haven't found anything that bothers me yet.
What problems is the product solving and how is that benefiting you?
Currently, I'm using it as an ETL tool. It's easy to use and connects with any data source—excellent documentation and help from the community.
Recommendations to others considering the product:
Just go for it. You can do many things you want to do with your data.
Very powerful yet easy to use distributed computing and data warehousing platform
What do you like best about the product?
Databricks has very powerful distributed computing built in, with easy-to-deploy, optimized clusters for Spark computations. The notebooks with MLflow integration make it easy to use for the Analytics and Data Science teams, yet the underlying APIs and CI/CD integrations make it very customizable for Data Engineers to create complex automated data pipelines. The ability to store, query, and manipulate massive Spark SQL tables with ACID guarantees in Delta Lake makes big data easily accessible to everyone in the organization.
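To illustrate the Delta Lake point above, a minimal sketch of an ACID table created and queried with Spark SQL in a Databricks notebook; the table and column names are illustrative, not from the review:

```python
# Create a managed Delta table; writes to it are ACID transactions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        event_id BIGINT,
        user_id  BIGINT,
        ts       TIMESTAMP
    ) USING DELTA
""")

# Concurrent readers see either the old or the new snapshot, never a partial write.
spark.sql("INSERT INTO events VALUES (1, 42, current_timestamp())")

# Anyone in the workspace can query the same table with Spark SQL.
spark.sql("SELECT user_id, count(*) AS n FROM events GROUP BY user_id").show()
```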
What do you dislike about the product?
It lacks built-in data backup features and the ability to restrict data access to specific users, so if anyone accidentally deletes data from a Delta table or DBFS, the lost data cannot be retrieved unless we set up our own customized backup solution.
What problems is the product solving and how is that benefiting you?
I have worked with big data with hundreds of millions of rows using Databricks. We do most of our ELT, data cleaning, and prep work on Databricks. The ease and speed of querying big data using Databricks Spark SQL is very useful. It is also very easy to prototype code against real-sized data using the available Python and R notebooks.
Reduced database network redistributions & run-time of key models by 99+%!
What do you like best about the product?
Incidentally, the thing I like most about Databricks isn't a product feature at all; I love Databricks's proactive and customer-centric service, always willing to make an exception or create a unique feature, all the while minimizing costs for the customer - as @Heather Akuiyibo & Shelby Ferson et al. have done for me and my former teams!
What do you dislike about the product?
Broadening programming logic and syntax.
What problems is the product solving and how is that benefiting you?
To name seven (7):
(1) User segmentation using a proprietary variation of a hierarchical DBSCAN clustering algorithm of high-dimensional data with novel distance [quasi] metric, based on hubness analysis;
(2) Leveraging the above in email targeting and invoking multi-armed bandit testing methodologies for email timing, frequency, and content, using a decreasing-epsilon strategy (a minimal sketch of this strategy follows the list);
(3) Modeling predicted underwriting criteria with a binary approval odds classification algorithm;
(4) Using a dynamic panel data, fixed effects model to predict the effect of changes in credit reports on user credit score;
(5) Employing an Autoregressive Integrated Moving Average (ARIMA) model with order selection optimized by the Akaike Information Criterion to predict future revenue and growth (lagged results led to average error bounds of only 5 percent; cross-validation results were even stronger, though I was conservative in guaranteeing 7 percent error, on average);
(6) Refining a multiverse (context-aware) recommendation engine as an n-dimensional tensor (rather than the typical two-dimensional user-item matrix) for partner product recommendations, using High-Order Singular Value Decomposition to solve;
(7) Invoking a Convolutional Neural Network framework with a novel architecture and results of a Fourier Transform as input to classify dental x-rays and highlight to the dentist which teeth require fillings (after approximately two months, the model reached ~95 percent accuracy - in terms of actual agreement by dentists using the app - with F1 score in cross-validation performing on par).
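As referenced in item (2), here is a minimal sketch of a decreasing-epsilon multi-armed bandit for choosing among email variants; the payout rates and decay schedule are made up for illustration and are not the reviewer's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = [0.05, 0.08, 0.12]      # hypothetical click-through rate per email variant
counts = np.zeros(len(true_rates))   # pulls per arm
values = np.zeros(len(true_rates))   # running mean reward per arm

for t in range(1, 10_001):
    epsilon = 1.0 / np.sqrt(t)                    # exploration probability decreases over time
    if rng.random() < epsilon:
        arm = int(rng.integers(len(true_rates)))  # explore a random variant
    else:
        arm = int(np.argmax(values))              # exploit the best variant so far
    reward = float(rng.random() < true_rates[arm])        # simulated click (Bernoulli)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update

print("pulls per variant:", counts, "estimated rates:", values.round(3))
```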
Recommendations to others considering the product:
Be open to the pitch. You may think things are "going fine," or take the view that "if it ain't broke, don't fix it," but these are short-term thinking traps that quietly constrain how far you can scale. Databricks is for the forward-thinking businessperson.
How I experienced Databricks
What do you like best about the product?
It is great when you have a large amount of data, excellent for collaboration, perfect for use with visualisation tools, and it works with many programming languages.
What do you dislike about the product?
It is difficult to get a grasp on how many applications and functions it has.
What problems is the product solving and how is that benefiting you?
It's great for ELT of data to use with Power BI.
Recommendations to others considering the product:
Use it; it's the best available and it's great!
Excellent infrastructure, can scale clusters in no time
What do you like best about the product?
Interactive clusters, user friendly, excellent cluster management
What do you dislike about the product?
Clusters take some time to warm up on start, and it should support upserts without Delta, as the business needs pure upserts too.
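For context on the Delta-only upsert complaint above, a minimal sketch of how an upsert is typically expressed with Delta Lake's MERGE INTO; the table and column names are placeholders:

```python
# Upsert: update matching rows, insert new ones, in a single ACID transaction.
spark.sql("""
    MERGE INTO customers AS target
    USING staged_updates AS source
    ON target.customer_id = source.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```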
What problems is the product solving and how is that benefiting you?
We can seamlessly use PySpark and Python to build a robust pipeline.
Recommendations to others considering the product:
It's the best infrastructure for building pipelines if you are planning to use Spark in production.
Databricks- Big Data processing tool
What do you like best about the product?
Very easy to use; no need to install and set up Spark manually.
Provides a notebook environment to write code.
Supports various languages like Python, Spark SQL, R, Scala, etc.
Easy to set up and use.
You can choose the cluster according to your needs.
Supports machine learning flows and streaming data.
Automatically suspends a cluster if it is inactive for more than a given time (cost-cutting; see the sketch after this list).
Auto-scalable clusters.
Optimized use of cluster resources.
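To make the auto-suspend and auto-scaling points concrete, a minimal sketch of creating such a cluster through the Databricks Clusters REST API; the workspace host, token, runtime version, and instance type are placeholders, not values from the review:

```python
import requests

cluster_spec = {
    "cluster_name": "etl-autoscale",
    "spark_version": "13.3.x-scala2.12",                 # example runtime
    "node_type_id": "i3.xlarge",                         # example instance type
    "autoscale": {"min_workers": 2, "max_workers": 8},   # auto-scalable cluster
    "autotermination_minutes": 30,                       # suspend after 30 idle minutes
}

resp = requests.post(
    "https://<workspace-host>/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=cluster_spec,
)
print(resp.json())
```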
What do you dislike about the product?
No CI/CD features provided by default.
Costly for small enterprises.
Certification cost is high.
What problems is the product solving and how is that benefiting you?
We have to develop pipelines. We get data from different sources like AWS S3 and Redshift, process that large amount of data on Databricks, and put it back into our data warehouse.
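A minimal sketch of that kind of pipeline, assuming a Databricks notebook where `spark` is predefined; the bucket, paths, and column names are illustrative only:

```python
from pyspark.sql import functions as F

# Read raw data landed in S3.
raw = spark.read.parquet("s3://example-bucket/raw/orders/")

# Clean and aggregate on Databricks.
cleaned = (raw.dropDuplicates(["order_id"])
              .filter(F.col("amount") > 0)
              .withColumn("order_date", F.to_date("order_ts")))
daily = cleaned.groupBy("order_date").agg(F.sum("amount").alias("revenue"))

# Persist as a Delta table that downstream warehouse/BI jobs can read.
daily.write.format("delta").mode("overwrite").saveAsTable("analytics.daily_revenue")
```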
Recommendations to others considering the product:
Databricks is one of the best tools when it comes to big data processing; it is easy to use and set up.
MLflow: one-stop solution for data science model tracking, versioning, and deployment
What do you like best about the product?
1) A single format to support all major ML libraries such as scikit-learn, TensorFlow, MXNet, Spark MLlib, PySpark, etc.
2) Capability to deploy on Amazon SageMaker with just one API call.
3) Flexibility to log model parameters and metrics such as accuracy and recall, along with hyperparameter tuning support (see the sketch after this list).
4) A good GUI to compare and select the best models.
5) Model registry to track Staging, Production, and Archived models.
6) A great Python API.
7) REST APIs supported.
8) Available out of the box in Microsoft Azure.
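As referenced in item 3, a minimal sketch of logging parameters, metrics, and a scikit-learn model to MLflow so runs can be compared in the GUI; the dataset and hyperparameters are illustrative:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 5}
    model = RandomForestClassifier(**params).fit(X_train, y_train)
    preds = model.predict(X_test)

    mlflow.log_params(params)                                     # hyperparameters
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))  # metrics shown in the UI
    mlflow.log_metric("recall", recall_score(y_test, preds))
    mlflow.sklearn.log_model(model, "model")                      # model artifact for comparison
```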
What do you dislike about the product?
1) The CI/CD pipeline is not supported in the open-source version.
2) It is a recent framework, so the community is not very large yet.
3) It depends on many Python libraries, which can be a problem when resolving dependencies in your existing setup.
What problems is the product solving and how is that benefiting you?
I have used it for managing the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
The same thing can be done in Amazon SageMaker, GCP AI Platform, Microsoft Azure, etc., but those would require monthly expenses. It can be a good fit for a startup's initial data science team.
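A minimal sketch of the central model registry workflow mentioned above: registering a logged model and promoting it to a stage. The run ID and model name are placeholders:

```python
import mlflow
from mlflow.tracking import MlflowClient

# Register the model artifact from a finished run under a registry name.
result = mlflow.register_model("runs:/<run-id>/model", "churn-classifier")

# Promote the new version; registry stages include Staging, Production, Archived.
client = MlflowClient()
client.transition_model_version_stage(
    name="churn-classifier",
    version=result.version,
    stage="Production",
)
```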
Recommendations to others considering the product:
It can't be a complete solution for the data science/ML engineering flow, but it is essential in the pipeline. It may be used with Apache Airflow to build an end-to-end MLOps solution. Also, it works best with Amazon SageMaker and Microsoft Azure; however, GCP AI Platform support is still in the development phase.
You would also need to take care of the CI/CD pipeline for ML models on your own.
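A minimal sketch of the Airflow pairing suggested above: a daily DAG whose task runs a training function that logs to MLflow. The DAG id, schedule, and train() body are illustrative, not a complete MLOps setup:

```python
from datetime import datetime

import mlflow
from airflow import DAG
from airflow.operators.python import PythonOperator

def train():
    # Placeholder training step: in practice this would fit and log a real model.
    with mlflow.start_run():
        mlflow.log_param("example_param", 1)
        mlflow.log_metric("example_metric", 0.9)

with DAG(dag_id="ml_training", start_date=datetime(2024, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    PythonOperator(task_id="train_model", python_callable=train)
```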
Lightning-Speed Analytics
What do you like best about the product?
Databricks is a great analytics tool which provides lightning-speed analytics and has given new abilities to data scientists. Additionally, our advanced analytics at scale has gone up 100 times.
What do you dislike about the product?
The learning curve is steep and people would need coding knowledge to work with Databricks. It can also be costly at times.
What problems is the product solving and how is that benefiting you?
Problems - Analytics problems
Benefits - Scale and Speed