WHITE PAPER – The What, Why, And How Of Service Meshes

 

Service meshes are a nascent technological framework with real potential to transform the adoption and deployment of microservices and containers. Neither the framework nor, for that matter, the market is currently ready for prime time. But as the concept gradually moves into the mainstream, here is a primer on the what, why, and how of service meshes.

 

March 4, 2020

A Brief Exploration of Exploratory Data Analysis (EDA)

You probably wouldn’t buy a new car before checking out some online reviews, reading up on its specs, and taking it for a test run. In a similar manner, it would be unwise to make critical business decisions on the basis of information and assumptions that haven’t been screened or tested in some way. That’s what exploratory data analysis is all about.

What Is Exploratory Data Analysis?

In essence, exploratory data analysis, or EDA, is a way of getting an overview of the quality and nature of the information available before you begin studying it in more detail. In the context of business intelligence (BI), EDA involves conducting initial investigations on data to discover existing patterns, spot anomalies, test theories or hypotheses about the information, and check the validity of any assumptions made about the data prior to analysis.

The concept isn’t really a new one. Cautious investigators have been applying the principle for decades, if not centuries. But in 1977, John W. Tukey coined the phrase in his book, Exploratory Data Analysis, and went on to develop the theory as it moved into formalized use.

EDA helps analysts to make sense of the data that they have, then figure out what questions to ask about it and how to frame them, as well as the best ways to manipulate available data sources to get the answers they need.

EDA Tools and Techniques

Using quantitative techniques and visual methods of representation, EDA looks to tell a story about your existing data based on a broad examination of patterns, trends, outliers, and unexpected results. Observations recorded during exploratory data analysis should suggest the next steps you logically take, the questions you’ll ask, and your possible areas of research.

At a higher level, data scientists use the visual and quantitative methods of EDA to understand and summarize a dataset, without making any assumptions about what it contains. This analysis may be a precursor to lines of investigation deploying more sophisticated statistical modeling or machine learning techniques. Exploratory data analysis typically involves a combination of mathematical/statistical methods with visual models used to represent the results as appropriate.

Univariate analysis may be used to describe types of data which consist of observations on only a single characteristic or attribute. Scientists may conduct this analysis on each field in the raw dataset with summary statistics. The output could resemble the figure below:

(Image source: svds.com)
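To make the idea concrete, here is a minimal sketch of a univariate pass in Python, using only the standard library’s statistics module on a made-up sample of order values (the field name and numbers are illustrative, not taken from the figure):

```python
import statistics

# Hypothetical sample of a single field (order values) -- made up for illustration
order_values = [12.0, 15.5, 14.0, 9.5, 22.0, 13.5, 11.0, 18.5]

# Per-field summary statistics: the core of a univariate pass over raw data
summary = {
    "count": len(order_values),
    "mean": statistics.mean(order_values),
    "median": statistics.median(order_values),
    "stdev": statistics.stdev(order_values),
    "min": min(order_values),
    "max": max(order_values),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

In practice, a data scientist would run a pass like this over every field in the raw dataset, then plot histograms of the distributions that look interesting or suspicious.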

 

Bivariate analysis looks at two variables (often denoted as X, Y), for the purpose of determining the empirical relationship between them. Bivariate visualizations and summary statistics enable data scientists to assess the relationship between each variable in a dataset and the target variables they’re currently looking at. A typical plot would look like this:

(Image source: svds.com)
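As a minimal sketch of the quantitative side of bivariate analysis, the snippet below computes a Pearson correlation coefficient by hand for two made-up variables (the ad-spend/sales pairing is purely illustrative):

```python
import math

# Hypothetical paired observations (X = ad spend, Y = weekly sales) -- illustrative only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

r = pearson(x, y)
print(f"r = {r:.3f}")
```

A coefficient near +1 or -1 signals a strong linear relationship worth plotting and investigating further; a value near 0 suggests little linear association between the pair.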

 

When two or more variable quantities are relevant to an investigation, multivariate visualizations and analysis can map the interactions between different fields in the data, typically yielding graphical results like the figure below:

(Image source: svds.com)
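One simple quantitative counterpart to a multivariate visualization is a pairwise correlation matrix. The sketch below builds one for three hypothetical fields using only the standard library (the field names and values are made up):

```python
import math

# Three hypothetical fields observed together -- values are illustrative only
data = {
    "price":    [10.0, 12.0, 11.0, 14.0, 13.0],
    "units":    [95.0, 88.0, 92.0, 80.0, 84.0],
    "ad_spend": [1.0,  1.5,  1.2,  2.0,  1.8],
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

fields = list(data)
# Pairwise correlation matrix: the numeric core behind a pair-plot style figure
matrix = {f: {g: pearson(data[f], data[g]) for g in fields} for f in fields}

for f in fields:
    row = "  ".join(f"{matrix[f][g]:+.2f}" for g in fields)
    print(f"{f:>8}: {row}")
```

Reading across a row shows how one field moves with each of the others, which is often the first hint of interactions worth modeling.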

 

Dimensionality reduction enables analysts to understand the fields in their data which account for the most variance between observations and allows for the processing of a reduced overall volume of data.
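As an illustration of the idea, the sketch below performs a tiny principal-component-style analysis on two perfectly correlated fields, using the closed-form eigenvalues of a 2x2 covariance matrix (the data is contrived so that a single direction captures all of the variance):

```python
import math

# Two strongly correlated hypothetical fields; values are illustrative only
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]  # exactly y = 2x, so one direction carries all the variance

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# Sample covariance matrix [[a, b], [b, c]]
a = sum((x - mx) ** 2 for x in xs) / (n - 1)
c = sum((y - my) ** 2 for y in ys) / (n - 1)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Closed-form eigenvalues of a symmetric 2x2 matrix
half_trace = (a + c) / 2
root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
lam1, lam2 = half_trace + root, half_trace - root  # lam1 >= lam2

explained = lam1 / (lam1 + lam2)
print(f"first component explains {explained:.0%} of the variance")
```

When one component explains nearly all the variance, the analyst can often work with that single derived field instead of both originals, shrinking the data that downstream steps must process.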

Similar observations in a dataset may be assigned to distinct groupings through a process known as clustering, which summarizes the data with a small number of representative points so that patterns of behavior can be more easily identified. For example, K-Means clustering assigns each observation to the nearest of k cluster “centers,” then recomputes each center as the mean of its assigned observations, repeating until the assignments stabilize. A clustered distribution of data might look like this:

(Image source: svds.com)
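For a concrete picture of the assign-and-update loop behind K-Means, here is a minimal one-dimensional sketch with two clusters (the values are contrived and well separated, and the sketch assumes no cluster ever ends up empty, which a production implementation would have to guard against):

```python
# Minimal one-dimensional K-Means sketch (illustrative values, k = 2)
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers = [points[0], points[-1]]  # naive initialization: first and last point

for _ in range(10):  # a few iterations are enough for this tiny example
    # Assignment step: attach each point to its nearest center
    clusters = [[], []]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    # Update step: move each center to the mean of its assigned points
    new_centers = [sum(c) / len(c) for c in clusters]
    if new_centers == centers:  # converged: assignments no longer change
        break
    centers = new_centers

print(sorted(centers))
```

Real datasets are multi-dimensional and noisier, but the loop is the same: assign, average, repeat.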

 

Technology and Software

Since its “formal” introduction in the 1970s, EDA has given birth to its own generation of statistical programming tools and software. S-Plus and R are among the most commonly used statistical programming packages for conducting exploratory data analysis. R, in particular, is a powerful and versatile open-source programming language that can be integrated with many business intelligence platforms.

With the appropriate data connectors, you can incorporate EDA output directly into your BI software, with the connection acting as a two-way analysis “bridge”. Besides performing initial analysis, statistical models built and run from an EDA package can tap into existing business intelligence data and automatically update as new information flows into a model. As an example, you might use EDA technology to map the lead-to-cash process across your full range of transactions and departments as an aid in streamlining and facilitating the conversion of prospects into actual buyers.

Putting EDA into Context

Exploratory data analysis is primarily about getting to know and understand your data before you make any assumptions about it. It’s an important step in avoiding the risk of building business models on inaccurate information, or of pursuing strategies founded on faulty assumptions.

During EDA, various technical assumptions are usually assessed to help select the best model for the data and the work ahead. EDA technology helps during the feature engineering stage by suggesting relationships and how they might be efficiently incorporated into a model. A model based on EDA also guards against poor predictions and incorrect conclusions that could have negative consequences for an organization.

Assumptions based on flawed business logic are typically harder to detect – and are often deeply ingrained with the problem and how it’s initially presented. As a best practice, a data scientist will systematically assess the contents of each data field and its interactions with other variables, especially those key metrics which represent behaviors that the business wants to understand or predict.

Ultimately, exploratory data analysis gives the analyst an opportunity to get acquainted with the available data and develop an intuition for what it contains. EDA technologies and techniques allow for the easier identification of glaring errors and more subtle discrepancies that could have unfortunate results later on. This empowers data scientists to ensure that the results they produce are valid, applicable to the desired business objectives, and correctly interpreted.

 

 

February 14, 2020

Azure Arc: Cross-Cloud Platform Management Made Easy(er)

Azure Arc is an exciting new cloud management service from Microsoft. It paves the way for organizations to simplify the management and governance of hybrid cloud and microservice architectures. For example, it can enable an organization to enforce security best practices across resources that are housed on-premises, on Azure, or even on GCP and AWS. Azure Arc is currently in preview, and the pricing structure of the service has yet to be released.

 

A look at competing AWS and GCP services reveals a key opportunity for Microsoft. As the current cloud leader, AWS has less incentive to enable cloud customers to use other vendors’ services, and its AWS Outposts offering covers a comparatively limited range of capabilities, with a starting price of ~$80K per year. And with pricing starting at ~$120K per year, GCP’s Anthos service is similarly underwhelming for most small and midsize businesses. If Azure Arc is priced right, it could reduce switching costs and enable organizations to more easily transition and scale services between cloud providers, further accelerating Azure’s 60%+ annual growth rate.

 

Stay tuned for more updates.

 


Keith Bocian is a data and computer science geek employed as a full-time Data Scientist with Expeed Software. He holds the AWS Machine Learning – Specialty, Azure Data Scientist Associate, and AWS Solutions Architect – Associate certifications. Keith is a decorated military veteran.

January 21, 2020

Test Automation Vs. Automated Testing: What’s the Difference and Why Does It Matter?

You say to-mah-to, I say to-may-to. Can we call the whole thing off? Of course we can. Because no matter how we each pronounce the word, both you and I are referring to the same thing – a plump red fruit that tastes good in salads and on sandwiches. There’s no real dispute here – we’ve just got different accents, and so we enunciate certain words differently.

Where we might genuinely disagree as we tuck into our lunch, however, is on the differences between test automation and automated testing. In some software development circles, these two terms get bandied about as if they can be used interchangeably. They can’t.

Like tomatoes and apples, there are differences between the two – and differences are important. A tomato is a round red fruit with seeds inside. So is an apple. But no chef worth her salt would use them interchangeably when making soup or ketchup.

 

Continuous Testing

Before we get into the distinctions between test automation and automated testing, we first need some context – and that context is continuous testing. Continuous testing is a DevOps practice of evaluating quality at every stage of the software development and delivery process. It is an overarching software testing strategy that involves testing early, testing often, testing everywhere, and – importantly – automation.

The primary purpose of continuous testing in the software development and delivery pipeline is to obtain feedback on the business risks associated with a software release candidate (i.e., a software change or update) as quickly as possible. Continuous testing is crucial to modern business practices for the simple reason that most companies are under constant pressure to continuously adapt software in line with user feedback, shifts in the market, and changes to business strategy. This has given rise to the continuous delivery model – i.e., where software is constantly in development and must always be ready for deployment. However, in the digital economy where websites, apps, and other software have become the primary interface between business and customer, an application failure is a business failure. Even a seemingly minor glitch can have severe repercussions if it impacts the user/customer experience (customer experience, of course, is fast becoming the number one brand differentiator). As such, continuous testing is required to discover problems as early as possible in the software development process, and thereby mitigate business risks when releasing software to production.

 

(Image source: tricentis.com)

 

Continuous testing differs from traditional software testing in the most literal sense: it is continuous – i.e., undisrupted and carried out on a persistent basis. In traditional environments, testing is completed in one big bang at the end of a software development cycle. However, as continuous delivery models have gained prominence in the rush to get releases to market as quickly as possible, running tests late in the development process exposes teams to the risk of discovering problems only at a very late stage. The consequences can be severe – the team may be forced back to the drawing board at the last minute, creating a massive roadblock in the release of new developments and defeating the very purpose of the continuous delivery model.

Continuous testing is the answer. By making testing continuous and integrating it into every stage of the build process, issues are identified almost immediately and can, therefore, be rectified much sooner – closing the circle to ensure fast and successful continuous delivery.

In addition, it must also be remembered that as each new release adds new features to the software (which must be continuously tested), they also cause changes that can break the software’s existing functionality. So, testing existing functionality (i.e., regression testing) has to be continuous, too.

However, many companies are working with limited resources and limited time to execute all of these tests – and at some point, they are bound to fall behind. And this is where both automated testing and test automation come in.

 

Automated Testing Vs. Test Automation

For testing to be continuous with limited resources, some tests (as many as possible) need to be automated – and there also needs to be a way to track and manage all the different tests that are being performed (both manually and via automation).

And here is where we get the distinction between automated testing and test automation.

Automated testing is the process of conducting specific tests – such as regression tests or unit tests – using automation tools rather than doing them manually. Test automation, on the other hand, refers to automating the process of tracking, managing, and initiating the different tests. In this way, test automation doesn’t always refer to the tests themselves (be they automated or otherwise), but rather how software development professionals manage these tests throughout the continuous delivery pipeline.

 

Automated Testing

The main benefit of automated testing is that it simplifies – or even removes – as much of the manual effort of testing as possible, while also speeding up the process. Unit testing and, in particular, regression testing are often considered good candidates for automated testing because they tend to consume large amounts of a software development team’s time and resources. In fact, some software professionals think of automated testing as “automated regression testing” because they don’t consider regression testing real testing at all – rather, the laborious and repetitive process of rechecking existing functionality.

Automated testing can boost a software development team’s efficiency. Automated tests can run repeatedly – even continuously – at any time of day and result in higher accuracy (less chance of human error – particularly as testing the same functionality over and over again can lead to routine-blindness), greater coverage and earlier bug detection. These tests can also be performed in a faster timeframe than a human tester, enabling staff to focus on other project priorities.
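As a small illustration, the snippet below pairs a hypothetical discount function with an automated regression suite written with Python’s unittest module (the function, its rules, and the test names are all made up):

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical business rule under test: percentage discount with validated input."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class TestApplyDiscount(unittest.TestCase):
    """A tiny automated regression suite: re-run on every change to lock in behavior."""

    def test_typical_discount(self):
        self.assertEqual(apply_discount(100.0, 25), 75.0)

    def test_zero_discount_is_identity(self):
        self.assertEqual(apply_discount(80.0, 0), 80.0)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(50.0, 150)

# Load and run the suite explicitly so it works outside a dedicated test runner
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestApplyDiscount)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all passed:", result.wasSuccessful())
```

Once written, a suite like this can run on every commit, at any hour, with no manual effort – which is exactly the repeatability the paragraph above describes.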

 

Test Automation

Naturally, managing all of the testing needs in a continuous testing environment is no easy undertaking. It requires extensive efforts in terms of communication between stakeholders to keep track of where new code has been deployed, when each piece needs testing, and how it all feeds back into the constantly moving process of continuous delivery.

Test automation is the answer. By automating the tracking and managing of all testing requirements – including the extent of existing testing coverage and what other types of testing might still be required to increase that coverage – test automation ensures that testing is managed effectively and that software development teams maintain a high standard of quality at all stages of the continuous delivery pipeline.

In essence, test automation is about generating test cases – i.e., what to test and when to test it – automatically, and then scheduling test runs to execute those tests. Test automation tools are designed to flag up a list of items for which test cases need to be created – bringing new needs to software testers’ attention automatically – and can also actually initiate an automated test. In addition, these tools track the progress of each test case to completion, and automatically separate and categorize test cases to make it easy for users to ensure there is proper coverage across each stage of the development lifecycle.

 

Final Thoughts

To summarize, with continuous testing, software development teams aim to reduce business risk by testing early, testing faster, testing often, and automating. To be continuous, tests must integrate seamlessly into the continuous software delivery pipeline to provide actionable feedback which is appropriate for each stage of development. Test automation tools are designed to automatically trigger code testing at each stage. As such, the term doesn’t refer to the tests themselves – rather, the process of managing, tracking, and initiating the various tests. Actual tests can be either manual or automated – and when they are automated, the process is referred to as automated testing.

October 23, 2019

6 Ways Data Science and IT Developers Can Work in Tandem

Data science empowers software engineers and IT developers to extract meaningful insights from the processes and information they encounter during application development. But it also involves communicating the results of data analysis to other stakeholders in a software project – many of whom won’t have the technical level of understanding or expertise of an IT administrator or professional programmer.

So, at all stages of the software lifecycle, it’s necessary for data scientists and developers to be able to work together, understand each other’s points of view, and communicate effectively.

Data Science in Software Development

When developers are incubating their initial ideas for a new piece of software or an IT system, data science can be there to explore the ramifications and likely outcomes – of incorporating a particular feature, of how one piece of functionality plays off against another, or even of what data will be produced or can be leveraged.

During programming and testing, the work of data scientists helps in collating results and making sense out of the figures of merit. Coupled with appropriate visualization techniques, data science can mold these results and insights from streams of numbers into stories that can be leveraged by financiers, marketers, sales personnel, and other stakeholders in the software ecosystem that hail from non-technical or non-computer related backgrounds. The user base for a particular system or product will likely also include people from a range of cultural and educational backgrounds.

For stakeholders, data scientists can provide tangible evidence of how revenue and business value are being generated, and insight into how and where actions are necessary to sustain good levels of performance or to make improvements.

Clearly, there’s a role for data science throughout the software development lifecycle – so it makes operational and economic sense to have data scientists and developers working together amicably at all stages of the process.

Finding A Common Language or Environment

Harmony and collaboration between software engineers and data scientists may be desirable, but it’s not necessarily that easy to achieve. In part, this derives from actual differences in the way the two disciplines operate, and perceptions that the two groups may have about the way their counterparts think and work. This graphic from CodeMentor sums it up neatly:

(Image source: codementor.io)

You’ll notice that big data frameworks are a tool common to both disciplines – and in nearly all industries, data-driven intelligence is now an essential part of day-to-day operations, whether it’s used for supply chain management or personalized marketing. It therefore makes sense to establish and use a shared set of tools and languages for developers and data scientists.

Setting Up A Data Lake

The construction of a data lake is one way of making production data from the development process readily available to data scientists and software engineers alike. This lake is a common pool of information, set up in an environment separate from the production platform. Because it will be a repository for information generated throughout the lifecycle, the data lake must have the potential to store vast quantities of records – so a dedicated data center or cloud environment is best.

The data scientists will decide on the best way for information to be stored and optimized for the queries they expect to run in the near term, and as future ideas develop. Since much of the data will come from the main application that the developers are working on, the teams will need to collaborate on finding the best ways for data to flow into the lake in its raw form.

This design process should take into account factors like the data schema, the level of data compression (if any), and whether information will stream in in real time or arrive via scheduled dumps. Each team’s level of responsibility for monitoring data flows should also be established.
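As one possible shape for the raw-ingest side of such a design, here is a Python sketch that appends untransformed events into a date-partitioned folder of JSON-lines files, in the style of a scheduled dump (the lake layout, file names, and event fields are all assumptions for illustration):

```python
import json
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical lake layout: a raw zone with one folder per source and date partitions
lake_root = Path(tempfile.mkdtemp()) / "lake" / "raw" / "app_events"

def ingest(events, event_date):
    """Append raw events, untransformed, under a date partition (scheduled-dump style)."""
    partition = lake_root / f"dt={event_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / "part-0000.jsonl"
    with path.open("a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
    return path

events = [
    {"user": "u1", "action": "click", "ts": "2020-03-01T12:00:00Z"},
    {"user": "u2", "action": "purchase", "ts": "2020-03-01T12:05:00Z"},
]
written = ingest(events, date(2020, 3, 1))
print(written, "lines:", sum(1 for _ in written.open()))
```

Keeping the events raw preserves every question the data scientists might later want to ask, while the date partitioning keeps queries over a narrow time window cheap.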

Making the Right Tools Available

Creating a common environment for developers and data scientists requires tools that enable them to work on the same data sets simultaneously, writing and sharing code or rich text. Notebooks make this possible.

For online operations, open source platforms like PixieDust (a helper library for Jupyter notebooks) enable developers to explore data analysis models without having to learn or code in the statistical languages favored by data science. Jupyter notebooks – which grew out of the one-off scripts data scientists originally wrote for themselves – also allow for the offline analysis of data sets and algorithms.

Monitoring and Evaluation

Throughout the software development lifecycle, data science algorithms must trace the path from raw data to interpreted information to some kind of value. The work of both the data scientists and the developers has to be assessed and observed at all stages. And this observation and evaluation need to be built into the development environment from the beginning.

The very process of setting up this scenario creates opportunities for collaboration in and of itself. On the one hand, the software engineers get a chance to build a framework that embeds the work of the data scientists in a pipeline combining various datasets and algorithms. On the other, data scientists play an integral part in its construction by setting parameters and framing the right kinds of questions.

Using Data Scientists and Developers in Cross-Functional Teams

The final piece of the collaboration puzzle comes with the formation of cross-functional teams consisting of representatives from both camps.

For one thing, having data scientists embedded in a development team (or developers attached to a data science unit) fosters debate and active communication between the members. It also promotes understanding, allowing software engineers and data scientists to better appreciate the needs of each other. In addition, having mixed groups of professionals within the same unit enables those practitioners to step in immediately with their particular skill sets if issues or opportunities arise.

Business units with experience of working with cross-functional teams also stress the importance of allowing a degree of flexibility for the data science members (who may occasionally need to branch out and explore particular topics in isolation for a while), and of creating a forum where various teams can meet to share ideas and knowledge.

At the end of the day, the aim is to enable data science professionals and software developers to use their unique skills to the best advantage of the team – in an environment that promotes creativity, and where trust and respect can build up between the team members as new knowledge and insights are acquired and invested back into the product.

August 15, 2019

© 2020 Expeed Software