File: docs/modules/agent/architecture.md

---
title: Architecture
---

## 1 Overview of InLong-Agent

InLong-Agent is a collection tool that supports multiple types of data sources. It is committed to providing stable and efficient data collection across heterogeneous data sources such as files, SQL, binlog, and metrics.

### 1.1 Architecture diagram



### 1.2 Design concept

To handle the diversity of data sources, InLong-Agent abstracts data sources into a unified Source concept and abstracts the write side into Sinks. Accessing a new data source only requires configuring its format and reading parameters to achieve efficient reading.

### 1.3 Current status of use

InLong-Agent is widely used within Tencent, where it handles most of the data collection business, with online data volumes reaching tens of billions of records.

## 2 InLong-Agent architecture

The InLong-Agent task framework is built on a channel + plug-in architecture. Reading from and writing to data sources is implemented as reader/writer plug-ins, which are then assembled into the overall framework.

+ Reader: the data collection module; it collects data from the data source and sends it to the channel.
+ Writer: the data writing module; it continuously pulls data from the channel and writes it to the destination.
+ Channel: connects the reader and writer, serves as the data transmission pipeline between them, and provides data buffering and monitoring.
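
The following is a minimal, self-contained sketch of this channel + plug-in model. The interface and class names are illustrative only and do not match the actual InLong-Agent plug-in API; the real agent also runs the reader and writer on separate threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the channel + plug-in model; names do not match the real InLong-Agent API.
interface Reader {
    /** Pull one record from the data source, or null when the source is exhausted. */
    String read() throws InterruptedException;
}

interface Writer {
    /** Push one record to the destination (file sink, MQ, proxy, ...). */
    void write(String record) throws InterruptedException;
}

/** The channel decouples reader and writer and is a natural place to count and monitor records. */
class Channel {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
    void push(String record) throws InterruptedException { queue.put(record); }
    String pull() throws InterruptedException { return queue.take(); }
}

class Task {
    void run(Reader reader, Writer writer) throws InterruptedException {
        Channel channel = new Channel();
        // A single loop keeps the sketch short; the real agent pipelines reader and writer concurrently.
        for (String record = reader.read(); record != null; record = reader.read()) {
            channel.push(record);
            writer.write(channel.pull());
        }
    }
}
```

Because the reader and writer only ever see the channel, supporting a new data source or destination means implementing one more plug-in rather than touching the framework.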

File: docs/modules/agent/quick_start.md

---
title: Build && Deployment
---

## 1 Configuration

```bash
cd inlong-agent
```

The agent supports two modes of operation: local operation and online operation.

### 1.1 Agent configuration

Online operation needs to pull its configuration from inlong-manager; the configuration in conf/agent.properties is as follows:

```ini
agent.manager.vip.http.host=manager web host
agent.manager.vip.http.port=manager web port
```

## 2 Run

After decompression, run the following command:

```bash
sh agent.sh start
```

## 3 Add job configuration in real time

### 3.1 Modify the following two settings in agent.properties

```ini
# whether enable http service
agent.http.enable=true
# http default port
agent.http.port=Available ports
```

### 3.2 Execute the following command

```bash
curl --location --request POST 'http://localhost:8008/config/job' \
--header 'Content-Type: application/json' \
```

- proxy.streamId: the streamId used when writing to the proxy; streamId is the data stream id shown on the data stream page in inlong-manager.

## 4 Examples of directory configuration

E.g.:

/data/inlong-agent/test.log // read the newly written file test.log in the inlong-agent directory

/data/inlong-agent/^\\d+(\\.\\d+)? // match file names that start with one or more digits, optionally followed by a dot and one or more digits (the ? makes the fractional part optional); this matches, for example, "5", "1.5" and "2.21"
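
As a quick sanity check of the regular expression in the last example (a standalone snippet, not part of the agent configuration), the file-name pattern can be exercised with the standard java.util.regex API:

```java
import java.util.regex.Pattern;

public class PathPatternDemo {
    public static void main(String[] args) {
        // The file-name part of the /data/inlong-agent/^\\d+(\\.\\d+)? example above;
        // the doubled backslashes map directly onto the Java string literal.
        Pattern pattern = Pattern.compile("^\\d+(\\.\\d+)?");
        for (String name : new String[] {"5", "1.5", "2.21", "abc", "1."}) {
            // matches() requires the whole name to match, so "abc" and "1." are rejected.
            System.out.println(name + " -> " + pattern.matcher(name).matches());
        }
    }
}
```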

## 5 Getting the data time from the file name

The agent supports obtaining the data's production time from the file name. The configuration instructions are as follows:

File: docs/modules/dataproxy-sdk/architecture.md

---
title: Architecture
---

## 1 Intro

When a business accesses InLong through messages, it generally only needs to format its data in a proxy-recognizable format (such as the six-segment protocol, the digital protocol, etc.) and send it in packets to get the data into InLong. However, to guarantee data reliability, load balancing, dynamic updating of the proxy list, and other reliability features, the user program would have to take on much more logic, which ultimately makes it cumbersome and bloated.

The API was designed to simplify user access by taking over this reliability-related logic. After integrating the API into the delivery program, users can send data to the proxy without worrying about packet formats, load balancing, or other such details.

## 2 Functions

### 2.1 Overall functions

| function | description |
| ---- | ---- |
| proxy list persistence (new) | Persist the proxy list per business group id, so that data can still be sent at program startup even if the configuration center is unavailable |

### 2.2 Data transmission function description

#### Synchronous batch function

    public SendResult sendMessage(List<byte[]> bodyList, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)

Parameter description:

bodyList is a collection of multiple pieces of data to send; the total length is recommended to be less than 512 KB. groupId is the business group id and streamId is the interface (data stream) id. dt is the timestamp of the data, accurate to the millisecond; it can also be set to 0, in which case the api uses the current time as the timestamp. timeout and timeUnit set the timeout for sending data; 20s is generally recommended.
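
A usage sketch of the synchronous batch interface (not taken from the SDK itself): the stand-in `BatchSender` interface below reproduces only the documented signature, `SendResult` is a simplified placeholder for the SDK's result type, and how the real sender object is constructed is SDK-specific and omitted here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Stand-in for the SDK sender; only the documented synchronous batch signature is reproduced.
interface BatchSender {
    SendResult sendMessage(List<byte[]> bodyList, String groupId, String streamId,
                           long dt, long timeout, TimeUnit timeUnit);
}

// Simplified placeholder for the SDK's result type; the real type has more states.
enum SendResult { OK, TIMEOUT }

class SyncBatchSendExample {
    static SendResult sendTwoRecords(BatchSender sender) {
        List<byte[]> bodyList = Arrays.asList(
                "field1|field2|field3".getBytes(StandardCharsets.UTF_8),
                "field4|field5|field6".getBytes(StandardCharsets.UTF_8));
        return sender.sendMessage(
                bodyList,
                "test_group_id",      // groupId: the business group id
                "test_stream_id",     // streamId: the interface (data stream) id
                0L,                   // dt = 0: the api fills in the current time as the timestamp
                20, TimeUnit.SECONDS  // recommended timeout of about 20s
        );
    }
}
```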

#### Synchronous single-message function

    public SendResult sendMessage(byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)

body is the content of the single piece of data to send; the remaining parameters have the same meaning as in the batch interface.

#### Asynchronous batch function

    public void asyncSendMessage(SendMessageCallback callback, List<byte[]> bodyList, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)

SendMessageCallback is the callback that processes the send result. bodyList is a collection of multiple pieces of data to send; the total length is recommended to be less than 512 KB. groupId is the business group id and streamId is the interface (data stream) id. dt is the timestamp of the data, accurate to the millisecond; it can also be set to 0, in which case the api uses the current time as the timestamp. timeout and timeUnit set the timeout for sending data; 20s is generally recommended.
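
A corresponding sketch for the asynchronous batch interface. Only the `asyncSendMessage` signature and the `SendMessageCallback` name come from this document; the callback method names (`onMessageAck`, `onException`) and the `AsyncBatchSender` stand-in are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Assumed shape of the callback; the real SDK interface may differ.
interface SendMessageCallback {
    void onMessageAck(String result);
    void onException(Throwable ex);
}

// Stand-in carrying only the documented asynchronous batch signature.
interface AsyncBatchSender {
    void asyncSendMessage(SendMessageCallback callback, List<byte[]> bodyList, String groupId,
                          String streamId, long dt, long timeout, TimeUnit timeUnit);
}

class AsyncBatchSendExample {
    static void sendTwoRecords(AsyncBatchSender sender) {
        List<byte[]> bodyList = Arrays.asList(
                "field1|field2|field3".getBytes(StandardCharsets.UTF_8),
                "field4|field5|field6".getBytes(StandardCharsets.UTF_8));
        sender.asyncSendMessage(new SendMessageCallback() {
            @Override
            public void onMessageAck(String result) {
                System.out.println("send finished: " + result);  // ack is delivered asynchronously
            }
            @Override
            public void onException(Throwable ex) {
                ex.printStackTrace();                             // handle send failure / retry here
            }
        }, bodyList, "test_group_id", "test_stream_id", 0L, 20, TimeUnit.SECONDS);
    }
}
```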

#### Asynchronous single-message function

    public void asyncSendMessage(SendMessageCallback callback, byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)

File: docs/modules/dataproxy/architecture.md

---
title: Architecture
---

## 1 Intro

InLong-dataProxy belongs to the InLong proxy layer and is used for data collection, reception, and forwarding. Through format conversion, data is converted into the TDMsg1 format that the cache layer can buffer and process.

InLong-dataProxy acts as a bridge from the InLong collection side to the InLong buffer side. DataProxy pulls the mapping between business group ids and topic names from the manager module and internally manages producers for multiple topics.

The overall architecture of InLong-dataProxy is based on Apache Flume. Building on that project, inlong-bus extends the source layer and the sink layer and optimizes disaster-tolerant forwarding, which improves the stability of the system.

## 2 Architecture



1. The source layer opens port listening, implemented with a Netty server. Decoded data is sent to the channel layer.
2. The channel layer has a selector that chooses which type of channel the data goes to. If the memory channel eventually fills up, the data is processed accordingly.
3. Data in the channel layer is forwarded through the sink layer, whose main purpose is to convert the data to the TDMsg1 format and push it to the cache layer (TubeMQ is most commonly used here).

## 3 DataProxy configuration instructions

DataProxy supports configurable source-channel-sink pipelines, and the configuration file has the same structure as a Flume configuration file:

DataProxy provides monitoring indicators based on JMX; users can implement code that reads these metrics and reports them to their own monitoring systems.

The source and sink modules can add a monitoring metric class that is a subclass of org.apache.inlong.commons.config.metrics.MetricItemSet and register it with the MBeanServer. A user-defined plugin can then fetch the module metrics via JMX and report the metric data to different monitoring systems.
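
As a rough sketch of how such a user-defined reporter might read these metrics through JMX (using only the standard javax.management API; the `org.apache.inlong:*` ObjectName pattern is an assumption, adjust it to whatever the metric classes actually register under):

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxMetricReporter {
    public static void dumpMetrics() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // Assumed domain pattern for illustration; use the ObjectName the metric classes register with.
        Set<ObjectName> names = mbs.queryNames(new ObjectName("org.apache.inlong:*"), null);
        for (ObjectName name : names) {
            MBeanInfo info = mbs.getMBeanInfo(name);
            for (MBeanAttributeInfo attr : info.getAttributes()) {
                Object value = mbs.getAttribute(name, attr.getName());
                // Replace this with a push to the user-defined monitoring system.
                System.out.println(name + " " + attr.getName() + " = " + value);
            }
        }
    }
}
```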

File: docs/modules/manager/architecture.md

---
title: Architecture
---

## 1 Introduction to Apache InLong Manager

+ Target positioning: Apache InLong is positioned as a one-stop data access solution, providing technical capabilities that cover the full big data access scenario, from data collection and transmission to sorting.

+ Platform value: Users can complete task configuration, management, and metric monitoring through the platform's built-in management and configuration console. At the same time, the platform provides SPI extension points at the main stages of the process so that custom logic can be plugged in as needed, keeping functions stable and efficient while lowering the barrier to using the platform.

+ Apache InLong Manager is the user-facing unified UI of the entire data access platform. After logging in, users are granted function and data permissions according to their roles. The pages provide maintenance portals for the platform's base clusters (such as MQ and sorting), where basic maintenance information and capacity planning adjustments can be viewed at any time. Business users can also create, modify, and maintain data access tasks, view metrics, and perform reconciliation. When users create and start tasks, the corresponding backend services interact with the underlying modules and deliver the work each module needs to perform in a reasonable way, coordinating the execution of the serial backend flow.

File: docs/modules/sort/introduction.md

Inlong-sort is simply a Flink application, and it relies on inlong-manager to manage metadata (such as the source and storage information).

# Features

## 1 Multi-tenancy

Inlong-sort is a multi-tenant system, which means you can extract data from different sources (these sources must be of the same source type) and load data into different sinks (these sinks must be of the same storage type).
E.g. you can extract data from different topics of inlong-tubemq and load them into different Hive clusters.

## 2 Change metadata without restart

Inlong-sort uses ZooKeeper to manage its metadata; every time you change metadata on ZooKeeper, the inlong-sort application is informed immediately.
E.g. if you want to change the schema of your data, just change the metadata on ZooKeeper without restarting your inlong-sort application.
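
To illustrate the mechanism (a minimal sketch using the plain Apache ZooKeeper client API, not inlong-sort's actual code; the znode path and connect string are made up), a watch registered on the metadata node fires as soon as the data changes, so the application can reload its view without a restart:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class MetaWatcher {
    public static void main(String[] args) throws Exception {
        String path = "/inlong-sort/meta/demo";                 // hypothetical metadata znode
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });

        Watcher watcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getType() == Event.EventType.NodeDataChanged) {
                    try {
                        // Re-read the metadata and re-register the watch (watches are one-shot).
                        byte[] data = zk.getData(path, this, new Stat());
                        System.out.println("meta updated: " + new String(data));
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        };
        zk.getData(path, watcher, new Stat());                  // initial read registers the watch
        Thread.sleep(Long.MAX_VALUE);                           // keep the process alive to receive events
    }
}
```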

## 3 Supported sources

- inlong-tubemq
- pulsar

## 4 Supported storages

- clickhouse
- hive (currently only the Parquet file format is supported)

## 5 Limitations

Currently, we only support extracting specified fields in the **Transform** stage.