Blogs
How AI Can Only Work When Humans Collaborate

Machines are more intelligent than humans; is this accurate? It depends. Researchers are currently developing algorithms and machine learning (ML) models within the narrow artificial intelligence (NAI) framework, and these outperform humans in specific, well-defined cases (technically called targets).

An NAI candidate is a collection of ML models created and trained on historical behavioral data to learn patterns and causalities, with the aim of solving a specific business problem. When trained and implemented in an automated pipeline, those models predict patterns and potentially impact decision-making, becoming an AI process.

 

The learning 

Babies see events and objects and process them to build history and connectivity in their minds. Our memory stores events, allowing logical connections to form between neurons as we learn. For machines, historical data replaces events and objects, and this is where machines offer substantial value: with today's processing power, they can read and analyze years of data, whereas humans need far more time to live through the same events and objects. Moreover, a machine learning model can remember historical data patterns and make informed decisions based on what it has learned, covering the touch points in a business process that robust statistical learning requires.

 

Challenges & Developments 

Data availability is one of the biggest challenges in ML model training. For CSPs, good data governance initiatives therefore add strategic value to the adoption of AI initiatives.

With a valid dataset, data scientists, supported by subject matter experts (SMEs) in the relevant industry, develop and implement the appropriate model to solve the business problem at hand, which system developers later integrate into business decision-making.

How can we recognize the "appropriate model" and its characteristics? Start by defining a model: it is a mix of mathematical and statistical formulas defined by a type and parameters. Data scientists define the type, while the parameters are learned from the data. A model becomes appropriate when it solves the business problem using the right data and can actually be implemented, given its complexity and the data's availability.
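To make the type/parameter distinction concrete, here is a minimal sketch in Python (illustrative only, using scikit-learn on synthetic data; the feature meanings are hypothetical): the model type, logistic regression here, is the data scientist's choice, while the coefficients are the parameters learned from the data.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical historical data: two engineered features per event;
# label = 1 if the event led to a dropped session, 0 otherwise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = LogisticRegression()   # the model "type" chosen by the data scientist
model.fit(X, y)                # the parameters are learned from the data
print("learned parameters:", model.coef_, model.intercept_)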

 

Most models in use fall under one of three types: supervised, unsupervised, and reinforcement learning. All of them are designed to predict unseen events using seen/captured data. Thus, the data scientist's second challenge is to decide which type to use for each business problem.

Without zooming into the statistical and mathematical assumptions: supervised models learn and memorize historical patterns and their co-occurrence with the outcome (also called a label), deriving the optimal statistical function that links the outcome to those patterns. Unsupervised learning is the approach of learning the variations in the data and classifying patterns into a measurable space. For example, grouping network failures with similar root causes is a task where an unsupervised approach helps. In contrast, a supervised approach learns the behavior of an outcome through the patterns and variations in the captured data, such as predicting the type of a network failure event based on what happened before that event. Reinforcement learning, an innovative approach in the data science space that is heavily used in certain industries, including telecom, consists of integrating a human into model tuning and adaptation as the data flowing into the model changes. In this framing, the human is the "agent", with a specific gain-and-loss utility function, directing the learning towards maximizing profit.
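As an illustration of the first two types, the sketch below (hypothetical features and labels, scikit-learn on synthetic data, not an actual network dataset) uses the same feature matrix two ways: clustering to group events with similar signatures, and a classifier to predict a failure type from the patterns preceding each event.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Hypothetical features describing what happened before each network event
# (e.g., error counts, latency deltas) -- names and shapes are illustrative.
X = rng.normal(size=(300, 4))
failure_type = (X[:, 0] > 0).astype(int)   # pretend label: failure type 0 or 1

# Unsupervised: group events with similar signatures (e.g., similar root causes).
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Supervised: learn to predict the failure type from the preceding patterns.
clf = RandomForestClassifier(random_state=0).fit(X, failure_type)
print("cluster sizes:", np.bincount(clusters))
print("predicted type of a new event:", clf.predict(X[:1]))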

 

Optimal way 

The optimal model is the one with the highest statistical accuracy (the "appropriate model" defined above) that also minimizes training effort and the time needed to implement and integrate it with the environment generating the data. This tradeoff is essential and should be a joint discussion between data scientists and SMEs from the early stages of project design and roadmap. We have seen a high percentage of AI initiatives fail within organizations due to a lack of communication from the start: data scientists tend to reach for challenging mathematical and statistical approaches, while CSPs are looking for a final output with clear objectives that solves a specific business problem.
 

B-Yond 

A practical example is B-Yond's Agility products, which are based on advanced supervised learning on top of a reinforcement approach, integrating an SME (the agent) in the loop to continuously feed relevant information from the network to the model and maximize the accuracy of root cause detection. B-Yond's configurable black boxes of pre-trained models are also optimized along both dimensions: fully automated, highly intelligent models that still expose the necessary configuration parameters for SMEs to control the models' operational aspects.

In the next blogs in this series, we will go into more technical detail on Agility's configurable black-box AI models. We will also discuss pre-trained anomaly detection robots that sit in CSPs' environments 24/7, learning from data, reporting any issue, and sometimes fixing it automatically.

Telcos Stance on Innovation; a Matter of Identity

Over the last two decades, I have co-founded several companies serving the telecoms industry, and in the process, have met with a majority of the executives from the top telcos around the world.

When you ask how a telco perceives itself, you never get the same answer. It varies from one to another, sometimes even from one country's operating business to another within the same group (depending on the CEO's leadership style).

Take the Spanish market, for example. Earlier this year I met with the CEO of Orange Spain who told me: “We are not a telco. We are a software company.” Orange Spain claims to have a large team of engineers and data scientists. It also has data science use cases developed in-house. Orange as a group has its own private cloud. It even has its own video conferencing system.

Contrast that to MasMovil, the fourth largest carrier in Spain, growing steadily through financial engineering and marketing as a beachhead. MasMovil started as an MVNO (with the acquisition of Yoigo) and grew through further acquisitions. MasMovil has a 35-year roaming agreement with Orange. Based on my multiple meetings with CEO Meinrad Spenger, I am not surprised. He is a lawyer, ex-McKinsey, IE MBA graduate and a strong negotiator.

Let's look at Telefonica, whose CEO is a firm believer in artificial intelligence (AI) and continues to educate himself on the topic. Telefonica is the incumbent operator in Spain. It has invested in its own telco cloud architecture, known as Unica, has not given up on the edge cloud against the webscale companies, and claims to have developed over 400 AI applications to support its business (with no confirmation of how many are operational). Its chief network officer is chairing an industry-wide edge cloud forum called GSMA Telco Edge Cloud, with representation from 22 international telcos. Its group CTIO recently signed an open RAN collaboration with Rakuten.

Take two of the US Tier 1 carriers as other examples:

  • AT&T has a long track record of innovation, starting with AT&T Labs (not everyone is old enough to remember Daytona). A few years ago, AT&T evolved its innovation strategy and initiated an open source strategy, launching initiatives such as ONAP, Airship, and Akraino, which are more collaborative innovation strategies leveraging contributions from open source communities like the Linux Foundation and OpenStack. Today, AT&T is also leading initiatives to open up the network and, for instance, is heavily committed to O-RAN, along with other key carriers like Deutsche Telekom, Telefonica, and Rakuten.

  • Sprint outsourced its entire network to Ericsson at some point in time, probably the largest managed services contract ever awarded in the US. A few years later, Sprint reversed that decision and I was one of some 40 telco execs invited to discuss plans forward after the Softbank acquisition. There were hints of massive investments in the network. These plans were later scrapped, and we all know the rest of the story: the T-Mobile acquisition.

Finally, there are many examples around the world of telcos innovating to create new revenue sources. For example, Telus stepped into data centers and healthcare; these businesses have contributed upwards of a billion dollars in new revenue streams to its top line, transforming operational models along the way. I am especially impressed by Deutsche Telekom's recent push to transform its next generation IMS — telco cloud automation which has been recognized by the World Communication Awards for "Best Network Transformation Initiative". There are also carriers such as the Finnish company Elisa, whose ambition goes way beyond its home market, and which acquired the Swedish company Polystar, which focuses on network analytics.

So, different telco executives have varying views about their company’s identity, and the role of innovation within their organization. That view evolves, too, with time.

Then there are outsiders trying to bring innovation to the telco space.

“Different telco executives have varying views about their company’s identity and the role of innovation within their organization. That view evolves, too, with time.”

My company was one of seven co-founders of Facebook's Telecom Infra Project (TIP). When we attended the first meeting, I wondered if Facebook was planning to take over the telco business in a few years, based on the ambitions laid out. Later, it became clear Facebook was trying to create a forum to advance innovation within telcos, which would not only benefit the telcos but also, indirectly, its own business (i.e., connecting the unconnected and, hence, selling more ads). Today, TIP is highly focused on O-RAN (disaggregated RAN networks) as an alternative presented to carriers looking to break free of an extremely consolidated market of traditional RAN vendors in their networks (especially in light of the Huawei rip-out).

Why Are You Putting The Cart Before The Horse?

The telecom industry has touted the level of mobile product enablement that will come with 5G. Low latency, fast data speeds, and highly reliable mobile connectivity will open the door to applications ranging from enterprise-grade fixed wireless connectivity and augmented reality to autonomous cars and even remote surgery. More importantly, 5G is about the cloudification of the telco infrastructure and opening the edge to innovation and developer communities. All these applications have one key hurdle in common: will wireless connectivity provide the level of reliability needed to support these performance-sensitive applications?

We go about our daily routines using products where our trust in their reliability means that we are betting our lives on them. Planes, cars, surgical equipment, and the like have arrived at this level of trust through rigorous testing. Producers of these products and services have perfected testing in two ways: first, the ability to simulate the real-life application of the product; second, the use of extensive testing platforms that allow for replicating and testing every scenario.

Telecom testing processes today are, unfortunately, not at that “bet-my-life” level. Labs are a limited replica of production. Test validation processes are highly manual, costly, and slow. Network slicing promises to address the former to some degree. However, the processes that achieve full automation of network functions and service orchestration while maintaining true network resource isolation have yet to attain enough maturity to address the challenge in the short term. 

“Labs are a limited replica of production.
Test validation processes are highly manual, costly, and slow.”

But there is hope. The technology already exists to solve much of this. The first step is replicating production services in test beds automatically, on demand, including decommissioning them once the service is verified. This depends, first and foremost, on CI/CD pipelines that not only automate DevOps processes but also integrate detailed testing pipelines through what we call Continuous Testing and Continuous Validation (CT/CV) pipelines. The CT/CV pipeline is an end-to-end test automation framework that classifies service call flow patterns using machine learning, closes the loop on test execution, and allows for an exponential increase in the number of detectable call flow variations, virtually independent of any human intervention.
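To make the idea more tangible, here is a hedged sketch of what one CT/CV gate step might look like; the model, feature names, and pass-rate threshold are hypothetical stand-ins, not B-Yond's actual pipeline. A classifier trained on historical, labeled call flows scores the flows captured in the test bed, and the pipeline fails when too many flows deviate from expected patterns.

import sys
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_toy_model():
    # Stand-in for a model trained on historical, labeled call flows.
    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 3))            # e.g., message counts, timing gaps
    y = (X[:, 0] - X[:, 2] > 0).astype(int)  # 1 = expected flow, 0 = deviant
    return GradientBoostingClassifier(random_state=0).fit(X, y)

def ct_cv_gate(model, captured_flows, min_pass_rate=0.98):
    # Close the loop: pass only if enough flows match expected patterns.
    preds = model.predict(np.asarray(captured_flows))
    pass_rate = preds.mean()
    print(f"expected-pattern rate: {pass_rate:.2%}")
    return pass_rate >= min_pass_rate

if __name__ == "__main__":
    model = train_toy_model()
    flows = np.random.default_rng(3).normal(size=(50, 3))   # flows from the test bed
    sys.exit(0 if ct_cv_gate(model, flows) else 1)           # non-zero exit fails CI

In a real pipeline, this step would run after test execution and load the pre-trained model from a registry rather than training it inline.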

What we know from the millions of call flows we have processed is that lab and production traffic can be similar, meaning that the resulting insights are highly portable. By creating the right test batteries, combined with the automated network replication processes mentioned earlier, including CT/CV, our pre-production certification process not only produced much more reliable network functions and services, but also allowed our machine learning (ML) models trained to recognize call flows in pre-production to be reused in production.

The business results are impressive. Applying preventive measures before production rollout means that the quality of the software is at a much higher level than before, reducing the number of consumer-impacting events down the road.

If we plan to open up our networks to innovations from third-party developers and to performance-sensitive applications, we have to begin our journey towards uncompromised quality before services reach consumers. Consumers cannot be guinea pigs; we have machines for that.

 
Let The Androids Be Androids

We often discuss similarities between Machine Learning (ML) and Human Intelligence. After all, it is “Artificial” intelligence, right? So, why wouldn’t you? To better understand the value of applied ML, you could instead take the opposite approach and look at how ML differs from us humans.

The nature of ML is that it is always learning, always sharing, never forgets, and never leaves for another job. Contrast this with an emotional human. We are not always in the mood to learn. We keep our knowledge close to the vest for fear of becoming obsolete, we are forgetful, and, since the grass is always greener on the other side of the fence, we leave for another job. Eventually. Think about it. The cost of poor knowledge sharing alone is huge: it slows things down, reduces the quality of output, and makes processes hard to replicate and repeat. Then the person leaves and takes all that knowledge with them. This churn adds further costs in the form of recruiting efforts and even slower execution (after all, you just lost a team member). Fortunately for us, the current state of AI cannot even begin to compete with human intelligence. There are still suitable applications for AI/ML, though. Take technology testing, for example. Here we deal with repetitive (some would say boring) tasks, which accelerates churn further. It is better to delegate them to a machine and free us humans up to work on the fun stuff, like research and development!

 

COLLECT AND KEEP THE INTELLIGENCE

The “always learning” part of ML can create a perception that the ML-based solution is not fully baked. But the solution is fully baked. It is the world around us that is “baking” (changing). Everything evolves, changes, transforms. Even a system under test. To allow ML models to adapt, we allow users to provide feedback to the system. In AI-lingo, it is called reinforcement training. In the past, with rules-based testing software, you would have to modify or add a rule. If you are lucky, that is. Often, you would have to develop and deploy a software upgrade. With an ML-powered solution like B-Yond Agility, the user can provide feedback, which further improves the fidelity of the system. In other words, the tribal knowledge that normally would reside with one single engineer (the user in this example) is now part of the system and shared with everyone. It becomes part of the collective IQ of the ML-supported processes.
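As a simplified illustration of that feedback loop (not B-Yond's actual implementation; the model and features below are synthetic stand-ins), a user-supplied correction can be folded back into the model incrementally, so one engineer's knowledge becomes part of the shared system:

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(4)
X_hist = rng.normal(size=(400, 5))             # features of past test runs
y_hist = (X_hist.sum(axis=1) > 0).astype(int)  # historical labels

model = SGDClassifier(random_state=0)
model.partial_fit(X_hist, y_hist, classes=[0, 1])   # initial model from history

def apply_user_feedback(model, features, corrected_label):
    # Fold a single user correction back into the shared model.
    model.partial_fit(np.asarray([features]), [corrected_label])
    return model

# An engineer flags one prediction as wrong and supplies the correct label.
model = apply_user_feedback(model, X_hist[0], 1 - y_hist[0])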

Early on in the B-Yond journey, when we applied Artificial Intelligence (AI) to the analysis and root cause phases of testing, we had a hunch that machine learning (ML) was a suitable approach because of two distinct properties: (1) ML can process massive amounts of data and, (2) ML provides predictions based on data patterns instead of binary conclusions. While both properties are important, the latter is unique to ML and the driver behind why we have been able to solve the “Two-Third Dilemma” of the technology life cycle. This dilemma is that, while test execution can be automated (the first third), the results analysis and root cause determination (the other two thirds) require a lot of human-like processing. Things are not black-and-white, one test run may not be identical to the next, what we concluded last time only applies partially to the next, and so on. ML deals with patterns and offers a prediction with an assessment of a pattern’s resemblance to something that has been seen before. In that regard, ML resembles human thinking.

“ ML deals with patterns and offers a prediction with an assessment of a pattern’s resemblance to something that has been seen before. In that regard, ML resembles human thinking. ”

 

GO FASTER

We believe that solutions like B-Yond Agility will ultimately make life better for people by filling a complementary role that was never performed well by humans in the first place. Let's look at this in more detail, specifically at how Agility applies ML to continuous integration, deployment, testing, and validation (CI/CD, CT/CV – check out the white paper here).

Agility reduces test result validation, failure analysis, and root cause determination from days to minutes. This reduction in the technology life cycle has profound implications on its own: it eliminates the long pole in bringing new services and applications to market. The business benefits are many: reduced test costs, improved quality, first-mover advantage, accelerated time to revenue, faster feedback on the success of a new feature. The list goes on. OK, so an ML-powered solution like Agility is faster than a human because of the ways AI differs from humans. What else?

 

REDUCING RELIANCE AND CLOSING THE LOOP

With ML applied to CT/CV in production environments, there is tremendous value beyond just accelerating issue triage. You can also reduce your reliance on support from equipment vendors because the ML is now handling the issue triage. Once you gain confidence in the predictions, you can begin to automate the operation towards a closed-loop, self-healing system. It sounds like a pipedream, perhaps? It is closer than you think.

If you are interested in learning more, one great way is to schedule a demo. It is easy. Contact us here!

 
Zero Tolerance For Service Interruption

Over the last number of years, CSPs have been on a transformational journey to become more customer focused in order to remain relevant and competitive in their markets. The need for a customer-centric approach is heightened by decreasing profitability, subscriber base saturation, and competition from OTT players. This has been further compounded by the Coronavirus pandemic as working, schooling, and entertaining from home have become common. The internet connection into our homes or onto our mobile devices has transitioned rapidly from being a nice-to-have to being a necessity. All of this means that in general, but especially in this Covid-19 world, customers have zero tolerance for any service interruptions or outages.

Even after all the effort and investments made, CSPs still face a fundamental problem. Network-centric information, such as performance counters and network alarms, is still predominantly the main source for detecting, analyzing, and troubleshooting customer- and service-impacting issues.

Monitoring network-centric information to troubleshoot customer- and service-centric problems leads to several fundamental issues:

  • In many situations, the network KPIs say that everything is working fine, but customers are still complaining via customer care and social media channels. Even worse, a percentage of them are churning because of their poor quality of experience.

  • Network Operations Centers (NOCs) and Service Operations Centers (SOCs) are flooded with alarms. There are simply too many to effectively manage, leading to delayed reactions or completely missed incidents.

  • There is no way to determine the customer impact of network incidents in real time, resulting in time and effort being invested in fixing problems that have limited or no impact on customers. In parallel, actual customer-impacting issues go unnoticed and unresolved.

In addition, the current model of monitoring network-centric information to troubleshoot customer and service impacting issues is a break/fix model and therefore completely reactive. 

Often, customers have already been impacted by the time a problem is discovered. Maintaining a lead in the market is contingent on how fast a CSP can respond to its customers' needs and meet their expectations. This is where an AI-driven predictive maintenance solution that correlates data points from traditionally siloed data sources can bring tangible benefits. Leveraging AI, CSPs can transform the operation of the network from a network-centric operating paradigm to a customer- and service-centric one, which puts customer and service impact at the center of every technical and business decision.

The ultimate goal of an AI-driven predictive maintenance solution is to ensure that end customers are happy with their experience of using the network and the services delivered across it. If customers are happy, they will remain loyal and become promoters of the CSP's brand. To achieve this goal, a CSP needs to:

  • Understand customers' real experience and real perspective of using services, monitor it continuously, and raise alerts when there are service quality degradations

  • Correlate service degradation against network alarms, performance counters, configuration management and other relevant sources of information in order to understand where the problem is occurring

  • Learn patterns from past incidents and apply those learnings to future incidents

  • Understand the actual and predicted customer impact of network problems and, when there are multiple problems, prioritize those that truly impact or have the potential to impact customers and revenues

  • Automate the root cause analysis of customer-impacting issues, using AI to provide recommendations on how those issues can be resolved and, where possible, automate the resolution of those issues back into the network before customers even realize they have an issue

  • Leverage AI and ML to predict issues before they occur, based on learnings from past events, which enables a CSP to become truly proactive

CSPs need a solution, tailored to their specific requirements and operating environment, that can continuously monitor end-to-end service quality and performance and raise alerts in real time to NOC and SOC users whenever a service degradation occurs. By using reinforcement learning, the AI/ML algorithms can learn which alerts are important and which are noise, ensuring that NOC and SOC users only see the service degradations that are important to them and that are really impacting the customer experience.

Once a service degradation is detected, the next step is to quickly understand which customers, services, and network elements or locations are impacted. In addition, understanding the potential revenue impact allows the CSP to prioritize which service degradations should be actioned first. More importantly, AI/ML really excels at performing automated root cause analysis and determining a recommended course of action.

Automated root cause analysis and determination of the recommended action require the correlation of customer and performance information with clear codes, network performance counters, network alarms, etc., in order to decipher the relationships and patterns between these traditionally disparate data sets. Once a recommended action has been determined and prioritized, remedial actions can be automated back into the network to fix problems before customers even know they have a problem.

Finally, the AI/ML algorithms embedded in the AI-driven predictive maintenance solution can be used to predict future service degradations and future customer experience impacts based on historical data signatures. This enables a CSP to get ahead of the curve and become truly proactive, resolving service and customer impacting issues before they even occur or become pervasive.
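A minimal sketch of that idea, assuming a single hypothetical KPI series and an arbitrary degradation threshold (illustrative only, trained on synthetic data, not a production model): learn from historical KPI windows whether a degradation followed, then score the most recent window.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
kpi = rng.normal(size=2000)     # e.g., an hourly packet-loss metric (synthetic)
window = 24

# Training pairs: last 24 hours of KPI values -> did a spike follow in the next hour?
X = np.array([kpi[i:i + window] for i in range(len(kpi) - window - 1)])
y = (kpi[window:-1] > 2.0).astype(int)   # "degradation" = KPI above an arbitrary threshold

model = RandomForestClassifier(random_state=0).fit(X, y)
risk = model.predict_proba(kpi[-window:].reshape(1, -1))[0, 1]
print(f"probability of degradation in the next hour: {risk:.2f}")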

The business benefits of an AI-driven predictive maintenance solution are numerous. Improved customer experience means loyalty and brand equity, which leads to higher retention rates and customers becoming referral ambassadors. Improved operational efficiency leads to improved service reliability and results in a CSP being able to do more with less. Finally, automation of issue resolution through the integration of recommended actions into 3rd party systems, where feasible, means faster resolution of customer and service impacting issues, which ultimately leads to lower operating expenses. In these worrying times, all this will ultimately establish trust between you and your customers, dramatically increase the likelihood they will recommend you to their friends, and ensure their loyalty long into the future.

 
Come on! Break the Rules!

For almost two decades, I have been exploring and experiencing the phenomenon we call "Silicon Valley". It has been a privilege, it has been exciting, and it has been a roller coaster. The diversity of people here is unprecedented. There are incredibly successful companies, and there are spectacular failures. You will see extreme wealth and equally extreme poverty. The common theme is "extreme". However the sausage is made, many of the companies coming out of the valley go on to become icons. And, many more don't.

 

What is the recipe for this sausage then? How do we even define success? While it is a relative measure, I think success is anything that we can take personal pride in. I have noticed some recurring traits in successful people here. They tend to think "out of the box" (or "think different", as Steve Jobs would say). They break rules. Constantly. Anything ripe for disruption is fair game. Not always with the desired result, of course. We move fast and we break things. We celebrate success as well as failure. We are great storytellers (the "pitch" is so important!). We never ask for permission and sometimes have to ask for forgiveness later. Sometimes it produces unicorns. A lot of times, it produces new cool products or new business models. And many times it produces duds. There is this maverick attitude. Rules are meant to be broken. When success eludes us, it most often means time and money lost. Sometimes, though, it goes horribly wrong, with impact on the health and even the lives of people.

You may have heard the phrase "fake it until you make it". You may think it is a fancy way to justify lying, but it really isn't. "Faking it until you make it" can be OK when you deal with software, when you have a great idea, a viable path to implement it, and a wickedly smart and resilient team. Eventually, with enough money and grit, you will figure it out. Not all industries are suitable for this approach. Take Theranos, a healthcare startup based purely on an idea: a great pitch with no solution. This is where Silicon Valley goes wrong, substituting charisma-and-pitch for solution-and-execution. The bottom line: execution is key. But not even the greatest idea and impeccable execution can guarantee success. Timing is equally important. Unfortunately, we can never quite determine timing. Some people call it "luck" instead. Maybe that is a more accurate term, actually. So for Silicon Valley, it is really a numbers game after all.

With that, I am going to break some rules and cut a double black diamond by foot. Never did that before! Wish me luck!

 

 

 
Growing fangs: Immutable workloads and the transforming telco

FANG – Facebook, Amazon, Netflix and Google – is taking a bite out of traditional service providers’ business. By competing with telecommunication service providers (telcos) to create and optimize the next generation of networks – including the highly anticipated 5G – these web-scale disruptors are outpacing their predecessors with highly scalable infrastructure.

In my colleague Rikard Kjellberg’s article, he referenced the pets vs. cattle analogy to describe the key advantages of running a telco ranch – and while FANG is well on its way to being ranchers, operators are still struggling to break out of their local wireless pet stores.

Telecommunications is Big Business. More than four cents out of every dollar of economic activity is related to telecommunications according to GSMA. With all of its apparent inefficiencies, the industry continues to grow at the relatively healthy pace of around 2.4 percent per year across the world.

Compared to service providers in the web world, that growth is pretty slow. Just a quick look at FANG reveals what is going on. Google, the slowest among them, is still growing at a rate of over 15 percent year-over-year, or roughly six times the growth of telcos, according to NASDAQ.

Just what is the fundamental difference between so-called “web-scale” companies and the big players in the telecommunications world? That question generates many answers. Common reasons given include regulated markets, demanding service quality expectations, backwards compatibility requirements, size of the existing customer base, and more.

It is certainly true that these factors have a serious impact on the telcos’ ability to innovate. I submit, however, that there is one fundamental difference between the telcos and their web-scale challengers that is not so easily brushed off.

No need for drum-rolls. It’s pretty straightforward.

Since the gradual break-up of telecommunications monopolies across the world in 1980s and 1990s, the telecommunications industry has been split between network equipment manufacturers and network service providers. This split resulted in keeping both types of companies very dependent on the abilities of the other type.

A critical deficiency shared by network service providers is an inability to retain any technical edge over other service providers. Meanwhile, over on the web-scale side, Google’s PageRank, Amazon’s ShoppingCart, and Facebook’s subscriber database are just some examples of what Warren Buffett calls the insurmountable moats around their castles of business. These are technologies they have developed, nurtured, and continue to use to outrank, and ultimately dominate, the rest of the industry.

On the telecommunications side, network service providers have decided to outsource their most critical components of revenue generation to network equipment manufacturers. (I use this term in the most generic sense, to include hardware, software, firmware, and other manufacturers.)

Things have been this way for a long time. AT&T hired Lucent and Northern Telecom; British Telecom (BT) retained Marconi; Deutsche Telekom (DT) outsourced to Siemens, and there are many more examples. When times were good, in other words when network service providers held market dominance (near-monopoly) in their respective markets, this model worked extremely well and helped both the service provider and the equipment manufacturer to thrive. Once the de facto monopolies began to disintegrate, however, network service providers decided that open standards would be the solution to their problems. Open standards would allow them to substitute products from one network equipment manufacturer with those from another.

While open standards provided them with greater leverage over manufacturers, the decision also caused the now-fundamental weakness of the industry: everything is now expendable, and no company can dig a moat deep and wide enough to adequately protect its business.

Currently, the industry is trying to solve this problem through consolidation. This explains the multi-country alliances of service providers such as Vodafone, Orange, and DT. It also accounts for their desire to diversify business across various market segments, including wireless, enterprise, entertainment, and so on.

On the equipment manufacturer side, consolidation results in a decreasing number of large companies capable of providing solutions to service providers in multiple market segments across the globe. These attempts are certainly well-intentioned but they seem somehow limited in their success so far.

Just one of the unbreachable moats that web-scale companies construct around their businesses is building infrastructure capable of serving very large numbers of users and, more importantly, doing so with the smallest possible staff.

Let’s look at an extreme example. When WhatsApp was sold to Facebook in 2014, they proudly claimed to serve 40 million subscribers with just one engineer. This was comparable to AT&T and Verizon running their current networks with, give or take, four engineers each. The absurdity of this comparison nonetheless prompted Facebook to pay almost $20 billion for WhatsApp. Other success stories, though to a lesser extent, are told by FANG and other web-scale companies.

To put it as simply as possible, large web-scale companies overcome the challenge of massive scaling by distributing large problems over a very large number of identical components. In other words, they serve those billions of customers using a huge number of identical servers. This requires an amazing level of discipline when choosing the perfect server to do the work, using the right software to manage server lifecycles, and ultimately automating when, and how many, servers are deployed and deleted.

This is why Google invented container technology for their workloads over a decade ago, and very properly named the project “Borg.” This is also why, as of 2016, Google was deploying and deleting one billion containers every week. I doubt that many containers had been deployed in the entire history of the telecommunications industry up to that point.

The reason that Google and other web-scale companies can so easily perform so many deploy/delete functions is because they manage their (virtual) servers as immutable (or unchanging) components. They don’t service, repair or update their chosen servers while the servers are still running. If they find a problem with the server version, they quickly generate a newer version, deploy it on a limited scale for testing then deploy it massively.
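As a rough illustration of that immutable, canary-style rollout logic (no real orchestrator API is used here; the class name, slice size, and health check are hypothetical):

from dataclasses import dataclass

@dataclass(frozen=True)          # instances are immutable, like the servers themselves
class ServerImage:
    version: str

def rollout(new_image: ServerImage, fleet_size: int, health_check) -> list[ServerImage]:
    # Never patch running servers: deploy a small canary slice of the new
    # image, verify it, then replace the whole fleet or discard the version.
    canary = [new_image] * max(1, fleet_size // 100)   # roughly a 1% canary slice
    if all(health_check(srv) for srv in canary):
        return [new_image] * fleet_size                # replace everything with the new image
    return []                                          # bad version is never promoted

fleet = rollout(ServerImage("v2.1"), fleet_size=1000, health_check=lambda s: True)
print(len(fleet), "servers now run", fleet[0].version)

The point is the lifecycle discipline: a problematic version is never repaired in place, it simply never gets promoted beyond the canary stage.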

In keeping with the “pet-cattle paradigm,” these are the best types of cattle: the kind that, based on a single blueprint, emerge from a breeding farm at any given time. Why aren’t telecommunications operators tending similar herds of cattle? The answer is found in the fact that the telecommunications industry is built on common interface standards.

While common interface standards provide great comfort and flexibility to the consumer, they tend to make service differentiation very difficult. Operators end up being forced to outshine each other in terms of their financial statements (drive for cost reduction), while also having to focus on implementation differentiators. As a result, telecommunications networks are full of purpose-built pets needing rigorous management.

The times, however, are changing. We are now seeing operators experimenting with open hardware platforms and open system software solutions growing more popular through organizations like The Linux Foundation. In recent months, leading vendors have begun promoting cloud-native network functions.

It is quite likely that, given another few years, telecommunications operator networks will begin using a large number of similar immutable network functions and ditching their pets altogether. Built as software running on common hardware, these functions will provide specific network capabilities right up until they need to be replaced en masse. It will be a world very different from the one in which we now live, but it will be essential to fulfilling the promise of 5G. By adopting some of the practices of web-scale companies, telcos might once again transform networks into differentiating infrastructures and, with a bit of luck, enjoy some renewed days of glory.

 
