A Brief Exploration of Exploratory Data Analysis (EDA)

You probably wouldn’t buy a new car before checking out some online reviews, reading up on its specs, and taking it for a test run. In a similar manner, it would be unwise to make critical business decisions on the basis of information and assumptions that haven’t been screened or tested in some way. That’s what exploratory data analysis is all about.

What Is Exploratory Data Analysis?

In essence, exploratory data analysis, or EDA, is a way of getting an overview of the quality and nature of the information available before you begin studying it in more detail. In the context of business intelligence (BI), EDA involves conducting initial investigations on data to discover existing patterns, spot anomalies, test theories or hypotheses concerning the information, and check the validity of any assumptions made about the data prior to analysis.

The concept isn’t really a new one. Cautious investigators have been making use of the principle for decades, if not centuries. But in 1977, John W. Tukey coined the phrase in his book, Exploratory Data Analysis, and went on to develop the theory as it moved into formalized use.

EDA helps analysts to make sense of the data that they have, then figure out what questions to ask about it and how to frame them, as well as the best ways to manipulate available data sources to get the answers they need.

EDA Tools and Techniques

Using quantitative techniques and visual methods of representation, EDA looks to tell a story about your existing data based on a broad examination of patterns, trends, outliers, and unexpected results. Observations recorded during exploratory data analysis should suggest the next steps you logically take, the questions you’ll ask, and your possible areas of research.

At a higher level, data scientists use the visual and quantitative methods of EDA to understand and summarize a dataset, without making any assumptions about what it contains. This analysis may be a precursor to lines of investigation deploying more sophisticated statistical modeling or machine learning techniques. Exploratory data analysis typically involves a combination of mathematical/statistical methods with visual models used to represent the results as appropriate.

Univariate analysis may be used to describe types of data which consist of observations on only a single characteristic or attribute. Scientists may conduct this analysis on each field in the raw dataset with summary statistics. The output could resemble the figure below:

(Image source: svds.com)

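To make this concrete, here is a minimal univariate sketch in Python with pandas, assuming a hypothetical customers.csv file and an age column:

```python
import pandas as pd

# Hypothetical raw dataset; any tabular file would do.
df = pd.read_csv("customers.csv")

# Per-field summary statistics: counts, means, quartiles and
# min/max for numeric fields; counts and frequencies for the rest.
print(df.describe(include="all"))

# Distribution of a single field, examined one variable at a time.
print(df["age"].value_counts(bins=10, sort=False))
```
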
Bivariate analysis looks at two variables (often denoted as X, Y), for the purpose of determining the empirical relationship between them. Bivariate visualizations and summary statistics enable data scientists to assess the relationship between each variable in a dataset and the target variables they’re currently looking at. A typical plot would look like this:

(Image source: svds.com)

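A comparable bivariate sketch, again with pandas and matplotlib and the same hypothetical dataset and column names:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("customers.csv")  # hypothetical dataset

# Empirical relationship between two variables (X, Y):
# a correlation coefficient plus a scatter plot.
print(df["age"].corr(df["annual_spend"]))

df.plot.scatter(x="age", y="annual_spend", alpha=0.5)
plt.title("Age vs. annual spend")
plt.show()
```
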
When two or more variable quantities are relevant to an investigation, multivariate visualizations and analysis can map the interactions between different fields in the data, typically yielding graphical results like the figure below:

(Image source: svds.com)

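One hedged way to sketch such a multivariate view is a correlation heatmap across every numeric field (dataset and columns hypothetical, as above):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("customers.csv")  # hypothetical dataset

# Pairwise correlations across all numeric fields at once.
corr = df.corr(numeric_only=True)

# Render the matrix as a heatmap to surface interactions.
plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.xticks(range(len(corr)), corr.columns, rotation=90)
plt.yticks(range(len(corr)), corr.columns)
plt.colorbar()
plt.show()
```
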
Dimensionality reduction enables analysts to understand the fields in their data which account for the most variance between observations and allows for the processing of a reduced overall volume of data.
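
For instance, principal component analysis (PCA) is one widely used dimensionality reduction technique; a minimal scikit-learn sketch, with synthetic data standing in for real observations:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a numeric feature matrix (rows = observations).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Standardize, then project onto the two directions of greatest variance.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Share of the variance between observations each component accounts for.
print(pca.explained_variance_ratio_)
```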

Similar observations in a dataset may be assigned to distinct groupings through a process known as clustering, which summarizes the data around a few representative points so that patterns of behavior can be more easily identified. For example, K-Means clustering creates a “center” (centroid) for each cluster and assigns each observation to the cluster whose center is nearest. A clustered distribution of data might look like this:

(Image source: svds.com)

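A minimal K-Means sketch with scikit-learn, again with synthetic observations standing in for real data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic observations standing in for a real dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (0, 4, 8)])

# K-Means places a "center" (centroid) at the mean of each cluster's
# points and assigns every observation to its nearest center.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)  # one center per grouping
print(kmeans.labels_[:10])      # cluster assignment per observation
```
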
Technology and Software

Since its “formal” introduction in the 1970s, EDA has given birth to its own generation of statistical programming tools and software. S-Plus and R are among the most commonly used statistical programming packages for conducting exploratory data analysis. R, in particular, is a powerful and versatile open-source programming language that can be integrated with many business intelligence platforms.

With the appropriate data connectors, you can incorporate EDA output directly into your BI software, which then acts as a two-way analysis “bridge”. Besides performing initial analysis, statistical models built and run from an EDA package can tap into existing business intelligence data and automatically update as new information flows into a model. As an example, you might use EDA technology to map the lead-to-cash process across your full range of transactions and departments as an aid in streamlining and facilitating the conversion of prospects into actual buyers.

Putting EDA into Context

Exploratory data analysis is primarily about getting to know and understand your data before you make any assumptions about it. It’s an important step in avoiding the risk of building business models on inaccurate information, or of following up on strategies founded on faulty assumptions.

During EDA, various technical assumptions are usually assessed to help select the best model for the data and the work ahead. EDA technology helps during the feature engineering stage by suggesting relationships and how they might be efficiently incorporated into a model. A model based on EDA also guards against poor predictions and incorrect conclusions that could have negative consequences for an organization.

Assumptions based on flawed business logic are typically harder to detect – they are often deeply embedded in the problem and how it’s initially presented. As a best practice, a data scientist will systematically assess the contents of each data field and its interactions with other variables, especially those key metrics which represent behaviors that the business wants to understand or predict.

Ultimately, exploratory data analysis gives the analyst an opportunity to get acquainted with the available data and develop an intuition for what it contains. EDA technologies and techniques allow for the easier identification of glaring errors and more subtle discrepancies that could have unfortunate results later on. This empowers data scientists to ensure that the results they produce are valid, applicable to the desired business objectives, and correctly interpreted.


February 14, 2020

Azure Arc: Cross-Cloud Platform Management Made Easy(er)

Azure Arc is an exciting new cloud management service from Microsoft. It paves the way for organizations to simplify the management and governance of hybrid cloud and microservice architectures. For example, it can enable an organization to enforce security best practices across resources that are housed on-premises, on Azure, or even on GCP and AWS. Azure Arc is currently in preview, and the pricing structure of the service has yet to be released.

 

A look at competing AWS and GCP services reveals a key opportunity for Microsoft. As the current cloud leader, AWS has less incentive to enable cloud customers to use other vendors’ services, and its AWS Outposts service offers a comparatively limited range of capabilities, along with a starting price of ~$80K per year. And, with pricing starting at ~$120K per year, GCP’s Anthos service is similarly underwhelming for most small and midsize businesses. If Azure Arc is priced right, it could reduce switching costs and enable organizations to more easily transition and scale services between various cloud-service providers, further accelerating Azure’s 60%+ annual growth rate.

 

Stay tuned for more updates.

 


Keith Bocian is a data and computer science geek employed as a full-time Data Scientist with Expeed Software. He holds the AWS Machine Learning – Specialty, Azure Data Scientist Associate, and AWS Solutions Architect Associate certifications. Keith is a decorated military veteran.

January 21, 2020

Test Automation Vs. Automated Testing: What’s the Difference and Why Does It Matter?

You say to-mah-to, I say to-may-to. Can we call the whole thing off? Of course we can. Because no matter how we each pronounce the word, both you and I are referring to the same thing – a plump red fruit that tastes good in salads and on sandwiches. There’s no real dispute here – we’ve just got different accents, and so we enunciate certain words differently. Where we might genuinely disagree as we tuck into our lunch, however, is on the differences between test automation and automated testing. In some software development circles, these two terms get bandied about as if they can be used interchangeably. They can’t. Like tomatoes and apples, there are differences between the two – and differences are important. A tomato is a round red fruit with seeds inside. So is an apple. But no chef worth her salt would use them interchangeably when making soup or ketchup.

 

Continuous Testing

Before we get into the distinctions between test automation and automated testing, we first need some context – and that context is continuous testing. Continuous testing is a DevOps practice of evaluating quality at every stage of the software development and delivery process. It is an overarching software testing strategy that involves testing early, testing often, testing everywhere, and – importantly – automation.

The primary purpose of continuous testing in the software development and delivery pipeline is to obtain feedback on the business risks associated with a software release candidate (i.e., a software change or update) as quickly as possible. Continuous testing is crucial to modern business practices for the simple reason that most companies are under constant pressure to continuously adapt software in line with user feedback, shifts in the market, and changes to business strategy. This has given rise to the continuous delivery model – i.e., where software is constantly in development and must always be ready for deployment. However, in the digital economy where websites, apps, and other software have become the primary interface between business and customer, an application failure is a business failure. Even a seemingly minor glitch can have severe repercussions if it impacts the user/customer experience (customer experience, of course, is fast becoming the number one brand differentiator). As such, continuous testing is required to discover problems as early as possible in the software development process, and thereby mitigate business risks when releasing software to production.

 

(Image source: tricentis.com)

Continuous testing differs from traditional methods of software testing precisely in that it is continuous – i.e., undisrupted and carried out on a persistent basis. In traditional environments, testing gets completed in one big bang at the end of a software development cycle. However, as continuous delivery models have gained prominence in the rush to ensure releases go to market as quickly as possible, running tests late in the software development process exposes teams to the risk of only discovering problems at a very late stage. This can upend an entire project – the team may be forced to go back to the drawing board at the last minute, causing a massive roadblock in the release of new developments and defeating the very purpose of the continuous delivery model.

Continuous testing is the answer. By making testing continuous and integrating it into every stage of the build process, issues are identified almost immediately and can, therefore, be rectified much sooner – closing the circle to ensure fast and successful continuous delivery.

In addition, it must be remembered that as each new release adds new features to the software (which must be continuously tested), it also introduces changes that can break the software’s existing functionality. So, testing existing functionality (i.e., regression testing) has to be continuous, too.

However, many companies are working with limited resources and limited time to execute all of these tests – and at some point, they are bound to fall behind. And this is where both automated testing and test automation come in.

 

Automated Testing Vs. Test Automation

For testing to be continuous with limited resources, some tests (as many as possible) need to be automated – and there also needs to be a way to track and manage all the different tests that are being performed (both manually and via automation).

And here is where we get the distinction between automated testing and test automation.

Automated testing is the process of conducting specific tests – such as regression tests or unit tests – using automation tools rather than doing them manually. Test automation, on the other hand, refers to automating the process of tracking, managing, and initiating the different tests. In this way, test automation doesn’t always refer to the tests themselves (be they automated or otherwise), but rather how software development professionals manage these tests throughout the continuous delivery pipeline.

 

Automated Testing

The main benefit of automated testing is that it simplifies – or even removes – as much of the manual effort of testing as possible, while also speeding up the process. Unit testing and, in particular, regression testing are often considered good candidates for automated testing because they tend to consume large amounts of a software development team’s time and resources. In fact, some software professionals think of automated testing as “automated regression testing” because they don’t consider regression testing real testing at all – rather, the laborious and repetitive process of rechecking existing functionality.

Automated testing can boost a software development team’s efficiency. Automated tests can run repeatedly – even continuously – at any time of day and result in higher accuracy (less chance of human error – particularly as testing the same functionality over and over again can lead to routine-blindness), greater coverage and earlier bug detection. These tests can also be performed in a faster timeframe than a human tester, enabling staff to focus on other project priorities.
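
As a simple illustration, here is what an automated test might look like in practice with pytest (the discount function here is hypothetical, not from any real codebase):

```python
# test_pricing.py -- collected and run automatically by `pytest`.
import pytest

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical production function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_standard_discount():
    assert apply_discount(100.0, 20) == 80.0

def test_no_discount_leaves_price_unchanged():
    assert apply_discount(59.99, 0) == 59.99

def test_invalid_discount_is_rejected():
    with pytest.raises(ValueError):
        apply_discount(100.0, 150)
```

A suite like this can run on every commit, overnight, or continuously – exactly the kind of repetitive regression checking described above.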

 

Test Automation

Naturally, managing all of the testing needs in a continuous testing environment is no easy undertaking. It requires extensive efforts in terms of communication between stakeholders to keep track of where new code has been deployed, when each piece needs testing, and how it all feeds back into the constantly moving process of continuous delivery.

Test automation is the answer. By automating the tracking and managing of all testing requirements – including the extent of existing testing coverage and what other types of testing might still be required to increase that coverage – test automation ensures that testing is managed effectively and that software development teams maintain a high standard of quality at all stages of the continuous delivery pipeline.

In essence, test automation is about generating test cases – i.e., what to test and when to test it – automatically, and then scheduling test runs to execute those tests. Test automation tools are designed to flag a list of items for which test cases need to be created – bringing new needs to software testers’ attention automatically – and can also initiate an automated test. In addition, these tools track the progress of each test case to completion, and automatically separate and categorize test cases to make it easy for users to ensure there is proper coverage across each stage of the development lifecycle.
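
To make the management side concrete, here is a hedged sketch using Python's built-in unittest discovery, which finds every test case under a directory (a hypothetical tests/ folder here), initiates a run, and reports results without anyone tracking individual tests by hand:

```python
import unittest

# Discover every test case under ./tests automatically -- new test
# files are picked up with no manual registration or tracking.
suite = unittest.TestLoader().discover(start_dir="tests")

# Initiate the run and collect a structured result object.
result = unittest.TextTestRunner(verbosity=2).run(suite)

# The result can feed a pipeline gate: progress, failures, gaps in coverage.
print(f"ran={result.testsRun} failures={len(result.failures)} errors={len(result.errors)}")
```

Dedicated test automation tools go well beyond this, of course, but the principle is the same: the tooling, not a person, decides what runs and when.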

 

Final Thoughts

To summarize, with continuous testing, software development teams aim to reduce business risk by testing early, testing faster, testing often, and automating. To be continuous, tests must integrate seamlessly into the continuous software delivery pipeline to provide actionable feedback which is appropriate for each stage of development. Test automation tools are designed to automatically trigger code testing at each stage. As such, the term doesn’t refer to the tests themselves – rather, the process of managing, tracking, and initiating the various tests. Actual tests can be either manual or automated – and when they are automated, the process is referred to as automated testing.

October 23, 2019

6 Ways Data Science and IT Developers Can Work in Tandem

Data science empowers software engineers and IT developers to extract meaningful insights from the processes and information they encounter during application development. But it also involves communicating the results of data analysis to other stakeholders in a software project – many of whom won’t have the technical level of understanding or expertise of an IT administrator or professional programmer.

So, at all stages of the software lifecycle, it’s necessary for data scientists and developers to be able to work together, understand each other’s points of view, and communicate effectively.

Data Science in Software Development

When developers are incubating their initial ideas for a new piece of software or an IT system, data science can be there to explore the ramifications and likely outcomes – outcomes like incorporating a particular feature, how one piece of functionality plays off against another, or even what data will be produced or can be leveraged.

During programming and testing, the work of data scientists helps in collating results and making sense of the figures of merit. Coupled with appropriate visualization techniques, data science can mold these results and insights from streams of numbers into stories that can be leveraged by financiers, marketers, sales personnel, and other stakeholders in the software ecosystem who hail from non-technical or non-computer-related backgrounds. The user base for a particular system or product will likely also include people from a range of cultural and educational backgrounds.

For stakeholders, data scientists can provide tangible evidence of how revenue and business value are being generated, and insight into how and where actions are necessary to sustain good levels of performance or to make improvements.

Clearly, there’s a role for data science throughout the software development lifecycle – so it makes operational and economic sense to have data scientists and developers working together amicably at all stages of the process.

Finding A Common Language or Environment

Harmony and collaboration between software engineers and data scientists may be desirable, but it’s not necessarily that easy to achieve. In part, this derives from actual differences in the way the two disciplines operate, and perceptions that the two groups may have about the way their counterparts think and work. This graphic from CodeMentor sums it up neatly:

(Image source: codementor.io)

You’ll notice that big data frameworks are a tool common to both disciplines – and in nearly all industries, data-driven intelligence is now an essential part of day-to-day operations, whether it’s used for supply chain management or personalized marketing. It therefore makes sense to establish and use a shared set of tools and languages for developers and data scientists.

Setting Up A Data Lake

The construction of a data lake is one way of making production data from the development process readily available to data scientists and software engineers alike. This lake is a common pool of information, set up in an environment separate from the production platform. Because it will be a repository for information generated throughout the lifecycle, the data lake must have the potential to store vast quantities of records – so a dedicated data center or cloud environment is best.

The data scientists will decide on the best way for information to be stored and optimized for the queries they expect to run in the near term, and as future ideas develop. Since much of the data will come from the main application that the developers are working on, the teams will need to collaborate on finding the best ways for data to flow into the lake in its raw form.

This design process should take into account factors like the data schema, the level of data compression (if any), and whether information flows will stream in real time or enter via scheduled dumps. Each team’s level of responsibility for monitoring data flows should also be established.
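
As a rough sketch of the “raw data flowing into the lake” piece, assuming an S3-based lake with boto3 (the bucket name and path layout are hypothetical):

```python
import datetime
import json

import boto3

s3 = boto3.client("s3")

def land_raw_event(event: dict) -> None:
    """Write one raw, uncompressed event into a date-partitioned lake path."""
    now = datetime.datetime.utcnow()
    key = (
        f"raw/app-events/year={now:%Y}/month={now:%m}/day={now:%d}/"
        f"{now:%H%M%S%f}.json"
    )
    # Hypothetical bucket; the raw form is preserved so the data
    # scientists can decide on schema and compression downstream.
    s3.put_object(Bucket="example-data-lake", Key=key,
                  Body=json.dumps(event).encode("utf-8"))

land_raw_event({"user_id": 42, "action": "checkout", "value": 19.99})
```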

Making the Right Tools Available

Creating a common environment for developers and data scientists requires tools that enable them to work on the same data sets simultaneously, writing and sharing code or rich text. Notebooks make this possible.

For online operations, open-source platforms like PixieDust (a helper library for Jupyter notebooks) enable developers to explore data analysis models without having to learn or code in the statistical languages favored by data science. Jupyter notebooks, which often begin life as data scientists’ one-off scripts, also allow for the offline analysis of data sets and algorithms.
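
A minimal notebook cell might look like the following, assuming PixieDust’s documented display() helper is available in the kernel (the DataFrame is a stand-in):

```python
# In a Jupyter notebook cell:
import pandas as pd
import pixiedust  # assumption: PixieDust is installed in this kernel

df = pd.DataFrame({"region": ["east", "west", "east"], "sales": [120, 95, 143]})

# PixieDust's display() opens an interactive table/chart explorer,
# so a developer can slice the data without writing plotting code.
display(df)
```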

Monitoring and Evaluation

Throughout the software development lifecycle, data science algorithms must trace the path from raw data to interpreted information to some kind of value. The work of both the data scientists and the developers has to be assessed and observed at all stages. And this observation and evaluation need to be built into the development environment from the beginning.

The very process of setting up this scenario creates opportunities for collaboration in and of itself. On the one hand, the software engineers get a chance to build a framework that embeds the work of the data scientists in a pipeline combining various datasets and algorithms. On the other, data scientists play an integral part in its construction by setting parameters and framing the right kinds of questions.

Using Data Scientists and Developers in Cross-Functional Teams

The final piece of the collaboration puzzle comes with the formation of cross-functional teams consisting of representatives from both camps.

For one thing, having data scientists embedded in a development team (or developers attached to a data science unit) fosters debate and active communication between the members. It also promotes understanding, allowing software engineers and data scientists to better appreciate the needs of each other. In addition, having mixed groups of professionals within the same unit enables those practitioners to step in immediately with their particular skill sets if issues or opportunities arise.

Business units with experience of working with cross-functional teams also stress the importance of allowing a degree of flexibility for the data science members (who may occasionally need to branch out and explore particular topics in isolation for a while), and of creating a forum where various teams can meet to share ideas and knowledge.

At the end of the day, the aim is to enable data science professionals and software developers to use their unique skills to the best advantage of the team – in an environment that promotes creativity, and where trust and respect can build up between the team members as new knowledge and insights are acquired and invested back into the product.

August 15, 2019

Storytelling with Data: Using Visualizations to Help Stakeholders Understand Data Findings

The job of the data scientist is to acquire data, clean it, analyze it, make sense of it, and most crucially of all, communicate its meaning to an audience of (usually) non-data scientists. Effective communication is critical in data science.

Data scientists are highly trained, highly skilled individuals – and they need to be. The language of data science is complex and esoteric. Data modeling and analysis is complicated, and datasets are difficult to understand – especially for non-technical people. Furthermore, as a data scientist works with data, that data is sometimes stored as comma-separated values (CSV) files, Excel files, or otherwise in SQL or NoSQL databases, the Hadoop Distributed File System (HDFS), and the like. As with all data science work, it’s not how the data is stored – nor the actual data itself – that is inherently valuable. Rather, the value is found in the insights that can be drawn from it.

Data isn’t easy to decipher – especially when dealing with large volumes of the stuff. This is where data visualization comes in. Data visualization is the process of presenting data in a visual context – using pictures to understand data, in other words. This is vital even for the data scientist’s own comprehension, let alone for effective communication between the data science team and relevant stakeholders. Indeed, for data scientists to produce truly actionable insights, the findings and observations have to be made available to the stakeholders tasked with acting on them.

Data visualization enables different stakeholders and decision-makers to understand the significance of data by presenting it not as volumes and volumes of records, but in easy-to-interpret graphs, charts, maps, dashboards, and other visualizations. Visualizations provide a consumable way to see and understand trends, patterns, correlations and outliers in data. In this way, data visualizations don’t only reveal insights – they help make those insights actionable.

 

Why Is Data Visualization So Effective in Data Science?

The simple answer to this question is: because of the way the human brain processes information. Since data science business projects usually involve a lot of information to process, the human brain is often unable to handle the volume. According to Dell EMC, organizations managed an average of 9.7 petabytes of data in 2018, a 569% increase compared with the 1.45 petabytes they handled in 2016. With so much data, it’s practically impossible to wade through it all line by line and pick out patterns and trends – but with data visualization tools and techniques, insights are much easier to see and grasp. The reason is that our brains process visual information much faster than text-based information – 60,000 times faster, in fact, according to estimates. Here’s a visual to help that data sink in quicker…

(Image source: killervisualstratergies.com)

 

…And here’s an example exercise from Study.com to prove the point.

Question: Looking at the following table, what month recorded the highest sales?

[Table: monthly sales figures, January through December]

Obviously, it’s December – but it took you a few seconds to read through the figures to find the answer. By comparison, look at this simple visual representation of the same data and ask yourself the same question…

(Image source: study.com)

 

… You got the answer almost instantaneously, right? And you can also see the peaks and troughs throughout the year – the larger story the data tells.
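
Recreating that exercise takes only a few lines; a minimal matplotlib sketch with made-up monthly figures (not the original data):

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sales = [32, 28, 35, 40, 38, 45, 50, 47, 41, 44, 52, 68]  # made-up figures

# The tallest bar answers "which month?" at a glance, and the
# peaks and troughs tell the larger story the data has to offer.
plt.bar(months, sales)
plt.ylabel("Sales (hypothetical units)")
plt.title("Monthly sales")
plt.show()
```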

We all know that time is money in business. Organizations that can make better sense of their data quicker are more competitive in the marketplace. Why? Because they can see trends, patterns, and make informed, evidence-based decisions sooner than their rivals. Data visualization helps this to happen.

Consider a marketing team working across 20 or more ad and social media platforms. The team needs to know the effectiveness of its various campaigns so it can optimize spend and targeting – and it needs this information quickly to remain competitive. The process could be completed manually by going into each system, pulling out the various reports, combining the data, and then analyzing on a spreadsheet – but it would take an age to pore through all the metrics and draw any meaningful conclusions. However, utilizing data visualization tools, all sources of data can be automatically connected, and visualizations immediately produced to be presented to the team, allowing its members to draw on-the-spot comparisons and conclusions about each campaign’s performance.

 

It’s All About Fast and Clear Communication

There are many different types of visualizations – line plots, scatter plots, histograms, box plots, bar charts, the list goes on. They may seem simple – but it is precisely this simplicity that makes them so valuable when presenting data science findings to stakeholders.

(Image source: blog.qlik.com)

 

In a recent interview with Dataquest, Kristen Sosulski – Clinical Associate Professor of Information, Operations, and Management Sciences at New York University Stern School of Business, and author of Data Visualization Made Simple – makes the point that while very few people can look at a spreadsheet and draw quick and accurate conclusions about what the data says, anyone can compare the size of bars on a bar chart, or follow the trend on a line graph.

Sosulski explains that while data visualization is a key skill at every stage of the data science process, it becomes critical at the point of communication. “There are a lot of angles that you can take with visualization, and ways to look at it,” she says. “I think about data visualization as something that we have in the toolkit to help people better understand our insights and our data. Just on a human level, visualizations allow us to perceive information a lot more clearly when they’re well designed.”

 

Conclusion

Using visual representations, data scientists can open the eyes of key stakeholders and decision-makers, allowing them to understand clearly and quickly what a dataset is revealing, how a model will help solve a business problem, and what impact the scientist’s proposals and discoveries will have on the organization. Emerging trends – both in the business and in the market – can be pinpointed quickly, outliers spotted, relationships and patterns identified, and the whole data story communicated engagingly in a way that gets the message across without unnecessary delay.

Without visualizations, all you have is data on a spreadsheet. All insights remain buried. The beauty of data science is that it reveals the true value of all those petabytes of data organizations are now managing. But without using data visualizations to communicate the important insights a data science project discovers, that value will be forever lost.

August 2, 2019

Disrupting the Cloud with Serverless

Serverless may indeed be the new black. It made Gartner’s 2019 list for top 10 trends in Infrastructure and Operations. An annual growth rate of 75 percent qualified serverless as the fastest-growing cloud service model in RightScale’s 2018 State of the Cloud report. And deployment of serverless computing technologies among global enterprises is expected to hit 20 percent by 2020, up from the current 5 percent.

Despite the nomenclature, computing will continue to involve servers. Developers, however, will no longer have to be involved with provisioning, deploying, and monitoring them. Those administrative tasks will now be handled by a new breed of services such as AWS Lambda, Google Cloud Functions, Microsoft Azure Functions, and IBM OpenWhisk.

 

Serverless, FaaS & BaaS

Serverless broadly refers to applications that circumvent the need for an always-on server component by using third-party hosted services to manage server-side logic and state. There are currently two types of serverless deployments: Functions-as-a-Service (FaaS), provided by the new breed of services mentioned above, and Backend-as-a-Service (BaaS), offered by providers like Google Firebase, AWS DynamoDB, BaaSBox, and Backendless.

BaaS provides a complete online backend service, such as storage, social networking integration, or location services, so that developers don’t have to build a separate backend for each service their applications use or access.
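
For example, persisting a record against a BaaS-style hosted datastore might look like this hedged boto3/DynamoDB sketch (the table is hypothetical and AWS credentials are assumed):

```python
import boto3

# Hypothetical hosted table -- storage and scaling are the provider's problem.
table = boto3.resource("dynamodb").Table("user-profiles")

# One API call replaces a self-built storage backend.
table.put_item(Item={"user_id": "42", "plan": "pro", "signup_year": 2019})
print(table.get_item(Key={"user_id": "42"})["Item"])
```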

FaaS, on the other hand, is a serverless approach to application development that enables developers to write and deploy code without having to manage their own infrastructure. The developer defines events and sets triggers, and the service provider ensures that the right amount of infrastructure is delivered to execute the code. The code is executed only when backend functions are invoked by events triggered by users, is torn down once the task is completed, and the customer is charged only for the duration of the execution.

SOURCE: https://www2.deloitte.com/content/dam/Deloitte/tr/Documents/technology-media-telecommunications/Serverless%20Computing.pdf
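
A minimal FaaS sketch, following AWS Lambda's Python handler convention (the event shape here is hypothetical):

```python
# handler.py -- deployed to AWS Lambda; no server to provision or monitor.
def lambda_handler(event, context):
    """Runs only when an event triggers it; billed for execution time only."""
    # Hypothetical event shape: an order placed by a user.
    order_total = sum(item["price"] * item["qty"] for item in event["items"])
    return {"statusCode": 200, "body": {"order_total": round(order_total, 2)}}
```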

 

The serverless model represents a significant shift even from the traditional cloud service model where customers had to reserve and pay for a predetermined amount of bandwidth or server space, irrespective of usage. It is therefore easy to see the value in a model that delivers on-demand execution, dynamic scaling based on application loads and pay-per-use pricing.

As a fully-managed service, serverless ensures that all infrastructure related issues are shifted from the developer to the vendor. Let’s now take a look at some key benefits of the serverless model.

 

Benefits of serverless

Efficient utilization: The serverless model ensures infinite scalability and built-in high availability, and the pay-per-use feature eliminates idle time, as customers only pay when functions are executed. Dynamic provisioning ensures that the infrastructure scales automatically based on application loads, making it especially attractive in the case of applications with wildly inconsistent usage patterns.

 

Enhanced developer productivity: Freed from the demands of infrastructure management, developers can focus fully on the functions they are expected to deliver without having to worry about server capacity. Every function that makes up an application can be updated independently, or all at one go, without having to change the entire application. This makes it much quicker and easier to update, patch, fix, or add new features to an application. Serverless also accelerates go-to-market times by drastically simplifying the process of writing, testing, and deploying code.

 

Reduced latency: A serverless architecture makes it possible for developers to reduce latency by running code closer to the end user. Since application functions are not tied to a centralized origin server, they can now be shifted to servers closer to the end user to decrease latency.

 

Improved security: It is also argued that constraining the developer to using only code constructs that work within the serverless context can produce code that is aligned with security, governance, and best practice protocols.

In addition to all this, there is also the idea, floated by Gartner, that serverless computing can improve the productivity of infrastructure and operations (I&O) administrators. According to the analyst firm, the event-driven architecture of FaaS is perfectly suited to automating cloud infrastructure operations.

 

SOURCE: https://www2.deloitte.com/content/dam/Deloitte/tr/Documents/technology-media-telecommunications/Serverless%20Computing.pdf

 

Notwithstanding all these advantages, there are still some key challenges associated with serverless that may not make it the ideal choice for all applications and situations.

Key serverless challenges

Development challenges: The programming approach for serverless applications can be quite different even from that of traditional cloud services. Moreover, there is currently a lot of variation in the programming languages and frameworks supported by different serverless service providers. Then there are limitations imposed by some vendors on code size, memory, etc., all of which seriously limit the types of applications that can be built for this environment.

Testing and debugging can also be challenging in this model. The primary reason for this is the dearth of testing tools for developers that can exactly emulate cloud events in a local development environment. Even debugging gets complicated because of the limited visibility into backend processes.

 

Performance challenges: A traditional always-on server can process user queries instantly. Since serverless code is shut down after a function has been invoked and the task is completed, it has to be booted up again the next time it is triggered. Functions that have been inactive for a while have to be cold-started, which can slow down response times and increase latency. Hence, this model may not be suitable for applications that are used rarely yet still need to respond extremely quickly to events. This ephemerality of the service, combined with the resource limitations imposed by vendors, can render serverless unsuitable for applications with larger processing requirements.

 

Security challenges: Several critical security risks have been identified for serverless applications, primary among them vendor security and multi-tenancy. With vendors taking complete control of the backend where the actual execution occurs, developers have little visibility into the security protocols in place. This can be of particular concern for applications that use sensitive data. Multi-tenancy raises similar concerns, as serverless applications in all likelihood run on shared infrastructure, which can result in data being exposed.

Finally, there is the challenge of vendor lock-in. With every vendor offering their own combination of programming languages, workflows and features, it can become extremely cumbersome to migrate applications from one platform to another.

Serverless represents the latest in a long trend of abstractions that have transitioned enterprise IT from physical servers to virtual machines and containers. Now, infrastructure itself is being abstracted away.

Conclusion

Today, serverless is the hottest cloud computing trend, with the Microsoft CEO declaring it the core of the future of distributed computing. But serverless is still a nascent technology, its promise of cost transparency, agility, and productivity counterbalanced by several significant challenges that have yet to be addressed. Adoption predictions are nevertheless through the roof – with the rider that, as extraordinary as serverless may be, it may just not be appropriate for every use case.

June 6, 2019
